We are concerned
with the following problem: we wish to label some observed
pattern x with some class category
.
Two possible
situations with respect to x and
may occur:
1) We may have
complete statistical knowledge of the distribution of observation
x and category
. In this case, a standard Bayes analysis
yields an optimal decision procedure.
2) We may have no
knowledge of the distribution of observation x and
category
aside from that provided by pre-classified samples. In this
case, a decision to classify x into category
will depend
only on a collection of correctly classified samples.
The nearest neighbor rule is concerned with the latter case. Such problems are classified in the domain of non-parametric statistics. No optimal classification procedure exists with respect to all underlying statistics under such conditions.
Nearest
Neighbor Rule: A Short Tutorial
March, 1999