Probability of Error

In this section, we give an overview of how the probability of error for the nearest neighbor rule may be bounded by the Bayes probability of error. We first provide a brief review of the Bayesian probability of error and then move on to discuss how the nearest neighbor rule relates to the latter. A more complete discussion of the proof may be obtained from [2].


Bayesian Probability of Error:

A Brief Review

The Bayesian selection criteria minimizes the probability of miss-classification. This is simply achieved by always classifying an object x into its most probable class.

Thus, for a set of c classes we could describe the maximum conditional probability as:

From this we denote the Bayesian conditional probability of miss-classification as:

And by averaging over the apriori distribution of x we get:


Nearest Neighbor Probability of Error:

The Random Variables

Let us begin by looking at all the random variables in the construction of an x,,, system.

We denote as the true class of x and as the labeled class of , where is the nearest neighbor of x.

It is clear that x and its are random input parameters to the problem. Noteworthy as well is that the underlying statistics of the labeled space are random too. Thus the , pair are also unkown and thus random inputs.

As a final point to this discussion, we should point out that the probability of x having true class and that of beign labeled are independent. Thus we can write the following expression:


Expressing the Probability of Error

We may denote the probability of the NN-Rule making a correct classification when and agree on a class type where 1<= i <="c" as :

Thus the conditional probabilty of error may be written as:


Convergence of Probablity of Error

Notice that as n approaches infinity the space of labeled items will become increasingly filled. Thus the nearest neighbor of x will become with probability 1.

So we can say that:


Bounds on the Conditional Probability of Error

From our above expression taking the limit on the conditional probability we get:

We can divide the sum up into two parts as follows:

Using the following three points

  • (Recall from our earlier review of Bayesian theory)

  • is minimized when all posteriori probabilities are equal.

  • we can write:

    And finally after some simplification,

    So we see that the upper bound on the probability of error for the NN-Rule is less than twice Baye's error rate.

    Thus we may bound the error rate of the NN-Rule as follows:

    where denotes the Bayes Error Rate.


    Nearest Neighbor Rule: A Short Tutorial
    March, 1999