It's easiest to think of this problem geometrically. We have one shared interval 0 <= **x** <= CR/(C-1) in which every class has density 1, and, for each class i, a private interval i <= **x** <= i + 1 - CR/(C-1) in which only class i has density 1. The private intervals are disjoint from one another and from the shared interval.

Thus, we have two cases when given a vector **x**.

*Case 1:* **x** lies in 0 <= **x** <= CR/(C-1). Since all densities are equal, a random guess as to which class **x** belongs to is as good a guess as any. So our Bayes chance of error, *given that* **x** *lies in this interval*, is (C-1)/C.

*Case 2:* **x** does not lie in 0 <= **x** <= CR/(C-1). Then **x** lies in an interval where only one class has nonzero density, so the Bayes chance of error **here** is zero.

P(error) = P(error | 0 <= **x** <= CR/(C-1)) * P(0 <= **x** <= CR/(C-1)) + P(error | **x** outside) * P(**x** outside)

P(error) = [(C-1) / C] * [CR / (C-1)] + 0

P(error) = R

(Each class has density 1 and prior 1/C on the shared interval, so the total density there is 1, and the probability of landing in the interval is just its length, CR/(C-1).)

If you feel more comfortable with the algebra, here is the same computation as an integral.

P(error) = ∫ P(error | **x**) p(**x**) d**x**

Since either p(**x**) = 0 or P(error | **x**) = 0 whenever **x** lies outside the shared interval, this reduces to

P(error) = ∫ [(C-1) / C] * 1 d**x**, taken over 0 <= **x** <= CR/(C-1),

which equals [CR / (C-1)] * [(C-1) / C] = R by the same math as above.
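As a sanity check, the Bayes computation can be simulated. This is a sketch, not part of the solution proper: the concrete values C = 4 and R = 0.1, and all function names, are my own choices.

```python
import random

C, R = 4, 0.1              # assumed example values; any 0 < R < (C-1)/C works
shared = C * R / (C - 1)   # length of the shared interval [0, CR/(C-1)]

def sample(rng):
    """Draw (x, true class) with equal priors 1/C from the problem's densities."""
    i = rng.randrange(1, C + 1)
    # Each piece of a class density has total mass equal to its length,
    # so the shared piece is chosen with probability CR/(C-1).
    if rng.random() < shared:
        return rng.uniform(0, shared), i
    return rng.uniform(i, i + 1 - shared), i

def bayes_classify(x, rng):
    """Bayes rule: guess uniformly on the shared interval, else read off the class."""
    if x <= shared:
        return rng.randrange(1, C + 1)
    return int(x)   # x lies in [i, i + 1 - shared], so its integer part is i

rng = random.Random(0)
trials = 100_000
errors = 0
for _ in range(trials):
    x, i = sample(rng)
    if bayes_classify(x, rng) != i:
        errors += 1
print(errors / trials)   # ≈ R = 0.1
```

The simulated error sits at (C-1)/C times the probability mass of the shared interval, which is exactly the R computed above.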

Suppose we have infinite data. Again, we have two cases.

*Case 1:* **x** lies in 0 <= **x** <= CR/(C-1). Since all densities are equal, the nearest neighbour to **x** is equally likely to be in any class. So our 1-NN chance of error, *given that* **x** *lies in this interval*, is (C-1)/C.

*Case 2:* **x** does not lie in 0 <= **x** <= CR/(C-1). Then **x** lies in an interval where only one class has nonzero density. Since the intervals i <= **x** <= i + 1 - CR/(C-1) do not touch (i.e., there is nonzero distance between them), the nearest neighbour of **x** will lie in this same interval and therefore be of the same class. The 1-NN error **here** is zero.

Every class has density 1 in the interval 0 <= **x** <= CR/(C-1), so exactly as before:

P(error) = P(error | 0 <= **x** <= CR/(C-1)) * P(0 <= **x** <= CR/(C-1)) + 0

P(error) = [(C-1) / C] * [CR / (C-1)]

P(error) = R

So with infinite data the 1-NN error rate equals the Bayes rate.
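The 1-NN half of the argument can be checked the same way: train a nearest-neighbour classifier on a large sample (standing in for infinite data) and measure its error. Again a sketch with assumed values C = 4, R = 0.1; all names are mine.

```python
import random

C, R = 4, 0.1              # assumed example values
shared = C * R / (C - 1)   # length of the shared interval [0, CR/(C-1)]

def sample(rng):
    """Draw (x, class) with equal priors from the problem's densities."""
    i = rng.randrange(1, C + 1)
    if rng.random() < shared:              # shared piece has mass CR/(C-1)
        return rng.uniform(0, shared), i
    return rng.uniform(i, i + 1 - shared), i

rng = random.Random(1)
train = [sample(rng) for _ in range(2000)]   # a large, finite stand-in for infinite data

def nn_classify(x):
    """Label of the single nearest training point."""
    return min(train, key=lambda t: abs(t[0] - x))[1]

trials = 4000
errors = sum(nn_classify(x) != i for x, i in (sample(rng) for _ in range(trials)))
print(errors / trials)   # ≈ R: the 1-NN rate matches the Bayes rate on this problem
```

Queries in a private interval always find a same-class neighbour (the intervals are far apart), while queries in the shared interval find a neighbour of uniformly random class, reproducing the two cases above.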

I assume that I have infinite data. I also need the following assumptions:

- The density functions for each class are continuous, and the data are drawn independently.
- The distribution is nice, in the sense that my infinite data is well-distributed -- it's not all concentrated at a single vector, for example. I can formalize this by stating the following: in any open set S where the density is nonzero throughout the entire set, I have infinite data.

Let **x** be a test vector with nonzero density p(**x**) > 0, and let **x**' be its nearest neighbour among the data. I claim that **x**' converges to **x**; by continuity of the densities, the label of **x**' is then governed by the same class probabilities as the label of **x**, which is what the case analysis above used.

Pick any ε > 0 and consider the open ball B of radius ε centred at **x**.

Since the densities are continuous and p(**x**) > 0, there is an open set S, contained in B and containing **x**, on which the density is nonzero throughout.

Since I chose S to be exactly the kind of open set named in my assumption, I have infinite data in S. In particular some data point lies in S, hence within ε of **x**, and the nearest neighbour can only be closer:

P(||**x**' - **x**|| < ε) = 1 for every ε > 0.

Now I must show that P(a data point lands in S) is genuinely positive, so that the statement above survives as the limit of growing finite samples.

Suppose that P(a single sample lands in S) = P_S. By construction P_S = ∫_S p(**x**) d**x** > 0, since the density is nonzero throughout S.

But we can show that P(all samples miss S) vanishes:

P(no sample in S after n draws) = (1 - P_S)^n

P(no sample in S after n draws) -> 0 as n -> ∞, since 0 < P_S <= 1.

Thus P(||**x**' - **x**|| >= ε) <= (1 - P_S)^n -> 0 for every ε > 0: the nearest neighbour of **x** converges to **x**, and the argument above is justified.
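The heart of the argument -- a region of positive probability mass p is missed by all n samples with probability (1 - p)^n, which vanishes -- is easy to see numerically. A toy sketch; the uniform density on [0, 1], the query point, and all concrete numbers are my own assumptions.

```python
import random

# A region of probability mass p is missed by all n i.i.d. samples
# with probability (1 - p)**n, which decays geometrically in n.
p = 0.01
for n in (100, 1000, 10000):
    print(n, (1 - p) ** n)

# Consequence: the nearest neighbour of a fixed query creeps toward it.
# Toy setup: uniform density on [0, 1], query at x = 0.375.
rng = random.Random(0)
x = 0.375
for n in (10, 100, 1000, 10000):
    d = min(abs(rng.uniform(0, 1) - x) for _ in range(n))
    print(n, d)   # nearest-neighbour distance shrinks as n grows
```

The first loop shows the geometric decay; the second shows its consequence, the shrinking nearest-neighbour distance that the proof needs.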

The two classes occur in unit hyperspheres ten units apart. Suppose we have m data of class A and n data of class B. If I give you a new element that falls in the hypersphere of class A, then we know it's a class A instance, because the space of A and the space of B are disjoint. So apply the nearest neighbour rule, as the question asks. The first m nearest neighbours (quite close, within unit distance) are of class A, and when those are exhausted, the next nearest data are of class B, ten units away. Thus error occurs under the k-NN rule only if m < k/2 and our query falls in the sphere with m elements. What is the chance, given equal a priori probabilities, that we picked such a training set? It's like flipping a fair coin once per training point and having fewer than k/2 come up heads -- and then being asked to classify a coin flip of heads.

It's exactly the equation in the problem.

Same reasoning: if k = 1, then an error requires m = 0, which is even more unlikely than m < k/2 is for k > 1.
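The coin-flip tail is easy to evaluate. A sketch: `knn_error_prob` is my name, and N (the total number of training points) is my notation; it computes the probability that fewer than k/2 of N fair flips come up heads, which the reasoning above identifies with the chance of an unlucky training set.

```python
from math import comb

def knn_error_prob(N, k):
    """P(fewer than k/2 heads in N fair coin flips), for odd k.

    In the problem's setup: the chance the query's own sphere holds
    too few training points for a k-NN majority.
    """
    return sum(comb(N, j) for j in range((k + 1) // 2)) / 2 ** N

print(knn_error_prob(10, 1))   # k = 1 needs m = 0: 1/1024
print(knn_error_prob(10, 3))   # m <= 1 now suffices for an error: 11/1024
```

As the k = 1 line shows, the single-neighbour rule is hurt only by the rarest training sets, which is the point made above.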

I'll let you be the artists here . . .

It's the same rule. Under the k-NN rule, I take a majority vote of the k nearest neighbours (k odd). Suppose class A wins the vote. Then certainly the (k + 1)/2-th nearest instance of A was closer than the (k + 1)/2-th nearest instance of B; otherwise there would have been more B's than A's among the k nearest neighbours! The converse logic is similar, so we have an if and only if.
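The equivalence can be brute-force checked: vote among the k nearest, or compare the distance to the (k + 1)/2-th nearest point of each class. A sketch in one dimension with two classes; all names and the random setup are mine.

```python
import random

def majority_vote(train, x, k):
    """Label of x by majority among the k nearest training points."""
    neighbours = sorted(train, key=lambda t: abs(t[0] - x))[:k]
    votes = sum(1 for _, label in neighbours if label == 'A')
    return 'A' if votes > k // 2 else 'B'

def median_distance_rule(train, x, k):
    """Label of x by whichever class's (k+1)/2-th nearest point is closer."""
    r = (k + 1) // 2
    d = lambda lab: sorted(abs(p - x) for p, l in train if l == lab)[r - 1]
    return 'A' if d('A') < d('B') else 'B'

rng = random.Random(0)
k = 5
for _ in range(1000):
    # Balanced training sets so each class has at least (k+1)/2 points.
    train = ([(rng.random(), 'A') for _ in range(10)] +
             [(rng.random(), 'B') for _ in range(10)])
    x = rng.random()
    assert majority_vote(train, x, k) == median_distance_rule(train, x, k)
print('rules agree')
```

With continuous data, ties have probability zero, so the strict-inequality comparison in `median_distance_rule` matches the strict-majority comparison in `majority_vote` exactly, as the if-and-only-if argument says.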