The Projection Method for Finding Nearest Neighbors

Pattern Recognition (308-644B)

School of Computer Science, McGill University, Winter 1999

Term Project prepared by Jean-Sébastien Perrier (perrier@ai.polymtl.ca)

and Benoit Anctil (benoit@maude.biomed.mcgill.ca)

Table of content :


Introduction

The essential purpose of a pattern recognition system consists of taking in raw data and yielding a category [1]. Even if this definition is very simple, the problem is usually more complex in real life. Once the raw data has been acquired, it is preprocessed and segmented to obtain the useful information. Then, the features or the properties of this pattern are evaluated by the feature extractor. Finally, these features are sent to a classifier, which determines the class membership.

If a training set of patterns with known class membership is available, we can use non-parametric decision rules to make decision on the membership of unknown patterns. One of these rules, called the k-nearest neighbors decision rule, classifies an unknown pattern to the class most heavily represented among its k-nearest neighbors (see figure below). This method is widely used because:

no prior knowledge of the distributions is required;

it is extremely simple to implement;

it gives a low probability of error.

On the other hand, this technique requires brute force computation of all distances in the entire data set and the storage of all these values into the computer's memory. To solve that problem, Friedman et al. [2] proposed a simple and efficient method for finding nearest neighbors which reduces significantly the amount of distance computation.

Figure 1. Decision based on the 3-nearest neighbors.

In this example, a decision based on the 3-nearest neighbors would classify the unknown pattern (green point) into class B.

How does it work?

The basic idea behind this method consists of using the projection of every point on an axis to limit the search for finding nearest neighbors. Let's look at the following example where we have a training set (prototype points) with known class membership and a test point to be classified:

Figure 2. Set of prototype points and a test point.

The following instructions describe the method suggested by Friedman to find the nearest neighbor of a test point in two dimensions:

Preprocessing
 

Step 0 Project all the points on one axis and sort these projection points. This step is done only once for the set of data. The sorting operation can be done in O(N log N) time using an efficient sorting algorithm (like "heapsort").

Finding Nearest Neighbor(s)
 

Step 1 Locate the test point's projection on this axis. Dichotomic search, complexity O(log N).
Step 2 Find the two closest projected points (one on each side) from the projected test point. Identified by A and B in figure 3.
Step 3 Measure the distance (in full dimensionality) between the two prototypes found in step 2 and the test point and identify which distance is the shortest (rd).

Figure 3. Illustration of steps 0 to 3.

Now we must ask ourselves if it's possible to have another point closer than the prototype A. The answer is yes, but we can also say that all the prototypes located at a distance greater than rd from the test point should not be considered. Why? Because any point closer than the prototype A can only be located within the circle defined by the radius rdand centered at the test point. This leads us to the next step:
 

Step 4 Determine the limits of the search on the coordinate axis : 
  • Boundary #1 = test point's projection + distance rd
  • Boundary #2 = test point's projection - distance rd

Figure 4. Illustration of step 4.


Step 5 Compute and save in memory the distances between the test point and the prototypes located between the two boundaries.
Step 6 Look for the shortest distance within this set. The prototype point with the shortest distance corresponds to the nearest neighbor.

Figure 5. Illustration of steps 5 and 6.

Until now, we have found only one nearest neighbor. If it is required to find the k-nearest neighbors, then we must add one more step to the algorithm:
 

Step 7 If k > 1, "erase" (or remove from the list) the nearest neighbor found in step 6 and repeat k times the steps 1 through 7.
Note: If k > 1, the two boundaries are reevaluated for each iteration. As a result, new prototype points may be added. However, because the distance values are recorded in step 5, only the distances corresponding to new prototype points need to be computed.

Expanded Method in D-dimensions

The previous example illustrates the method for a data set in two dimensions. We now present the general method for finding the k-nearest neighbors for training set in d-dimensions. The idea is to find the best projection axis that minimizes the number of distance computations. In other words, for a given test point, we want to find the axis with the smallest density of prototypes around the test point.

Figure 6. Example of how the selection of the axis with the smallest density can reduce the number of distance computations. (a) All the prototypes must be verified. (b) Only 4 prototypes need to be verified.

Here are the instructions to do this:

Preprocessing
 

Step 0.1 Project all the points on all coordinate axes and sort these projection points.
Step 0.2 Evaluate the expected number of distance computations ( n ) for a uniform distribution (worst case). The following table gives the expected number of distance computation ( n ) for 3 types of Minkowski metrics:
Metric p n
Manhattan 1
Euclidean 2
Maximum coordinate distance ¥

k = number of nearest neighbors, d = number of dimensions, N = number of prototypes
 
 

Finding Nearest Neighbor(s)
 

Step 1 Locate the test point's projection on each axis. Dichotomic search, complexity O(log N).
Step 2 Find the position of the (n/2)th prototype on each side of the test point (see Figure 7).
Step 3 Measure the distance between these two prototypes (S). See example (Figure 7) below for n = 10.
Step 4 Compute the local projected density (D) in the neighborhood of the test point.

D = n ¸ S

Step 5 Find the coordinate axis with the smallest local projected density (D). Use this coordinate axis for the rest of the search.

Figure 7. Illustration of the local projected density (step 0.1 to 3) for n = 10.

The following steps are the same as the first example:
 

Step 6 Find the two closest projected points (one on each side) from the projected test point.
Step 7 Measure the distance (in full dimensionality) between the two prototypes found in step 6 and the test point and identify which one is the shortest (rd).
Step 8 Determine the limits of the search: 
  • Boundary #1 = test point's projection + distance rd
  • Boundary #2 = test point's projection - distance rd
Step 9 Compute and save in memory the distances between the test point and the prototypes located between the two boundaries.
Step 10 Look for the shortest distance within this set. The prototype point with the shortest distance corresponds to the nearest neighbor.
Step 11 If k > 1, "erase" the nearest neighbor found in step 10 and repeat k times the steps 6 through 11.
Note: If k > 1, the two boundaries are reevaluated for each iteration. As a result, new prototype points may be added. However, because the distance values are recorded in step 9, only the distances corresponding to new prototype points need to be computed.

Performance

From simulation experiments with a training set of 1000 points, it was possible to quantify the improvement made with this method (n = number of distance computations):
 

k d n (projection) / n (brute force) n (projection) / n (brute force)

(%)

1 2 10/1000 1 %
1 8 100/1000 10 %
10 2 400/1000 40 %

Important things to know

A uniform distribution represents a worst case for the performance of the algorithm e.g. better results are obtained with a normal (Gaussian) distribution.

The number of test points ( Nt) required for the preprocessing procedure to be profitable (step 0.1 and 0.2 in the expanded method) can be evaluated with the following equation
 

Np is the number of prototypes.

  In the computation of the distances, the Minkowski metric for p = ¥ (maximum coordinate distance) should be considered first because it also minimizes the number of distance calculations:

References
 

[1] Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification, John Wiley Interscience, 1999.
[2] J. H. Friedman, F. Baskett and L. J. Shustek. An Algorithm for Finding Nearest Neighbors. IEEE Trans. Comput.., vol. C-24, pp. 1000-1006, Oct. 1975.

Here is a link to an other method:

Fast Nearest Neighbor Search in High-dimensional Space