Intro
- definition: classification problem
- $(x,c) \in X \times C$, $X = R^d$, $C = Z_r$
- joint probability $P(x,c)$
- classifier
- training set $T = \{(x_1,c_1),\dots,(x_N,c_N)\}$
- test set
- error rate
- empirical error rate
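The empirical error rate above is just the fraction of misclassified points on a held-out set; a minimal sketch (labels and predictions are made-up illustration data):

```python
import numpy as np

# Hypothetical true labels and classifier outputs on a test set.
c_true = np.array([0, 1, 1, 2, 0, 2])
c_pred = np.array([0, 1, 2, 2, 0, 0])

# Empirical error rate: fraction of points with C(x) != c,
# an estimate of the true error rate under P(x, c).
error_rate = np.mean(c_pred != c_true)
print(error_rate)  # 2 of 6 wrong
```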
Bayes
- Bayes formula
- Bayes risk
- zero-one loss function, posterior
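The zero-one-loss claim can be spelled out: with $L(c,c') = 1 - \delta_{cc'}$, the conditional risk of deciding $c$ at $x$ is

```latex
R(c \mid x) = \sum_{c'} L(c,c')\, P(c' \mid x)
            = \sum_{c' \neq c} P(c' \mid x)
            = 1 - P(c \mid x),
```

so minimizing the risk is the same as maximizing the posterior $P(c \mid x)$, and the Bayes classifier is $c = \arg\max_c P(c \mid x)$.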
Classifiers
- classifier: $c = C(X)$
- discriminant functions: $c = \arg\max_c g_c(x)$
- posterior probabilities: $c = \arg\max_c P(c|x)$
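A toy illustration of classification via discriminant functions (all numbers hypothetical): two classes with 1-D Gaussian class-conditional densities and unequal priors, using $g_c(x) = \log p(x\mid c) + \log P(c)$, which is a monotone transform of the posterior:

```python
import numpy as np

# Hypothetical model: two classes, Gaussian class-conditionals
# p(x|c) = N(x; mean_c, sigma^2) and priors P(c).
means = np.array([0.0, 2.0])
sigma = 1.0
priors = np.array([0.7, 0.3])

def discriminants(x):
    # g_c(x) = log p(x|c) + log P(c); any strictly increasing
    # transform of the posterior gives the same argmax.
    log_lik = -0.5 * ((x - means) / sigma) ** 2 \
              - np.log(sigma * np.sqrt(2 * np.pi))
    return log_lik + np.log(priors)

def classify(x):
    return int(np.argmax(discriminants(x)))

print(classify(-0.5))  # near the class-0 mean -> 0
print(classify(2.5))   # near the class-1 mean -> 1
```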
Nearest Neighbor Classifier
- $C_{NN}(x) = c_k$ where $k = \arg\min_i ||x_i-x||$ for $(x_i,c_i)$ in training set
- $C_{NN}$ has at most twice the Bayes error rate (why?)
- k-NN classifier: use a vote among the $k$ nearest neighbors
- remember $\|x-y\|^2 = \|x\|^2+\|y\|^2-2x\cdot y$
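The norm expansion lets all pairwise squared distances be computed with a single matrix product; a minimal k-NN sketch using it (the 2-D toy data is made up for illustration):

```python
import numpy as np

def knn_predict(X_train, c_train, X_test, k=3):
    # Pairwise squared distances via
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y,
    # shape (n_test, n_train), one matrix product instead of a loop.
    d2 = (np.sum(X_test**2, axis=1)[:, None]
          + np.sum(X_train**2, axis=1)[None, :]
          - 2.0 * X_test @ X_train.T)
    # Indices of the k nearest training points per test point.
    nn = np.argsort(d2, axis=1)[:, :k]
    # Majority vote among the k neighbors' labels.
    return np.array([np.bincount(v).argmax() for v in c_train[nn]])

# Toy training set: class 0 near the origin, class 1 near (5, 5).
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
c_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.05, 0.05], [5.05, 5.0]])
pred = knn_predict(X_train, c_train, X_test)
print(pred)  # [0 1]
```

With k = 1 this reduces to $C_{NN}$ above; larger k trades variance for bias via the vote.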