The Projection Method for Finding Nearest Neighbors
Pattern Recognition (308-644B)
School of Computer Science, McGill University, Winter 1999
Term Project prepared by Jean-Sébastien Perrier (perrier@ai.polymtl.ca)
and Benoit Anctil (benoit@maude.biomed.mcgill.ca)
Table of content :
The essential purpose of a pattern recognition system consists of taking in raw data and yielding a category [1]. Even if this definition is very simple, the problem is usually more complex in real life. Once the raw data has been acquired, it is preprocessed and segmented to obtain the useful information. Then, the features or the properties of this pattern are evaluated by the feature extractor. Finally, these features are sent to a classifier, which determines the class membership.
If a training set of patterns with known class membership is available, we can use non-parametric decision rules to make decision on the membership of unknown patterns. One of these rules, called the k-nearest neighbors decision rule, classifies an unknown pattern to the class most heavily represented among its k-nearest neighbors (see figure below). This method is widely used because:
no prior knowledge of the distributions is
required;
it is extremely simple to implement;
it gives a low probability of error.
On the other hand, this technique requires brute force computation of all distances in the entire data set and the storage of all these values into the computer's memory. To solve that problem, Friedman et al. [2] proposed a simple and efficient method for finding nearest neighbors which reduces significantly the amount of distance computation.
Figure 1. Decision based on the 3-nearest neighbors.
In this example, a decision based on the 3-nearest neighbors would classify the unknown pattern (green point) into class B.
The basic idea behind this method consists of using the projection of every point on an axis to limit the search for finding nearest neighbors. Let's look at the following example where we have a training set (prototype points) with known class membership and a test point to be classified:
Figure 2. Set of
prototype points and a test point.
The following instructions describe the method suggested by Friedman to find the nearest neighbor of a test point in two dimensions:
Preprocessing
| Step 0 | Project all the points on one axis and sort these projection points. This step is done only once for the set of data. The sorting operation can be done in O(N log N) time using an efficient sorting algorithm (like "heapsort"). |
Finding Nearest Neighbor(s)
| Step 1 | Locate the test point's projection on this axis. Dichotomic search, complexity O(log N). |
| Step 2 | Find the two closest projected points (one on each side) from the projected test point. Identified by A and B in figure 3. |
| Step 3 | Measure the distance (in full dimensionality) between the two prototypes found in step 2 and the test point and identify which distance is the shortest (rd). |
Figure 3. Illustration of steps 0 to 3.
Now we must ask ourselves if it's
possible to have another point closer than the prototype A. The
answer is yes, but we can also say that all the prototypes
located at a distance greater than rd from the test point should not be considered.
Why? Because any point closer than the prototype A can
only be located within the circle defined by the radius rdand centered at the test point. This leads us to
the next step:
| Step 4 | Determine
the limits of the search on the coordinate axis :
|
Figure 4. Illustration of step 4.
| Step 5 | Compute and save in memory the distances between the test point and the prototypes located between the two boundaries. |
| Step 6 | Look for the shortest distance within this set. The prototype point with the shortest distance corresponds to the nearest neighbor. |
Figure 5. Illustration of steps 5 and 6.
Until now, we have found only one
nearest neighbor. If it is required to find the k-nearest
neighbors, then we must add one more step to the algorithm:
| Step 7 | If k > 1, "erase" (or remove from the list) the nearest neighbor found in step 6 and repeat k times the steps 1 through 7. |
| Note: | If k > 1, the two boundaries are reevaluated for each iteration. As a result, new prototype points may be added. However, because the distance values are recorded in step 5, only the distances corresponding to new prototype points need to be computed. |
Expanded
Method in D-dimensions
The previous example illustrates the method for a data set in two dimensions. We now present the general method for finding the k-nearest neighbors for training set in d-dimensions. The idea is to find the best projection axis that minimizes the number of distance computations. In other words, for a given test point, we want to find the axis with the smallest density of prototypes around the test point.
Figure 6. Example of how the selection of the axis with the smallest density can reduce the number of distance computations. (a) All the prototypes must be verified. (b) Only 4 prototypes need to be verified.
Here are the instructions to do this:
Preprocessing
| Step 0.1 | Project all the points on all coordinate axes and sort these projection points. |
| Step 0.2 | Evaluate the expected number of distance computations ( n ) for a uniform distribution (worst case). The following table gives the expected number of distance computation ( n ) for 3 types of Minkowski metrics: |
| Metric | p | n |
| Manhattan | 1 | |
| Euclidean | 2 | ![]() |
| Maximum coordinate distance | ¥ |
k = number of nearest neighbors, d = number of dimensions, N = number of prototypes
Finding Nearest Neighbor(s)
| Step 1 | Locate the test point's projection on each axis. Dichotomic search, complexity O(log N). |
| Step 2 | Find the position of the (n/2)th prototype on each side of the test point (see Figure 7). |
| Step 3 | Measure the distance between these two prototypes (S). See example (Figure 7) below for n = 10. |
| Step 4 | Compute
the local projected density (D) in the
neighborhood of the test point. D = n ¸ S |
| Step 5 | Find the coordinate axis with the smallest local projected density (D). Use this coordinate axis for the rest of the search. |
Figure 7. Illustration of the local projected density (step 0.1 to 3) for n = 10.
The following steps are the same as the
first example:
| Step 6 | Find the two closest projected points (one on each side) from the projected test point. |
| Step 7 | Measure the distance (in full dimensionality) between the two prototypes found in step 6 and the test point and identify which one is the shortest (rd). |
| Step 8 | Determine
the limits of the search:
|
| Step 9 | Compute and save in memory the distances between the test point and the prototypes located between the two boundaries. |
| Step 10 | Look for the shortest distance within this set. The prototype point with the shortest distance corresponds to the nearest neighbor. |
| Step 11 | If k > 1, "erase" the nearest neighbor found in step 10 and repeat k times the steps 6 through 11. |
| Note: | If k > 1, the two boundaries are reevaluated for each iteration. As a result, new prototype points may be added. However, because the distance values are recorded in step 9, only the distances corresponding to new prototype points need to be computed. |
From simulation experiments with a
training set of 1000 points, it was possible to quantify the
improvement made with this method (n = number of
distance computations):
| k | d | n (projection) / n (brute force) | n (projection) / n (brute force) (%) |
| 1 | 2 | 10/1000 | 1 % |
| 1 | 8 | 100/1000 | 10 % |
| 10 | 2 | 400/1000 | 40 % |
A uniform distribution represents a worst case
for the performance of the algorithm e.g. better results are
obtained with a normal (Gaussian) distribution.
The number of test points ( Nt) required for the preprocessing procedure to be
profitable (step 0.1 and 0.2 in the expanded method) can be
evaluated with the following equation
| Np is the number of prototypes. |
In the computation of the distances, the
Minkowski metric for p = ¥ (maximum
coordinate distance) should be considered first because it also
minimizes the number of distance calculations:
| [1] | Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification, John Wiley Interscience, 1999. |
| [2] | J. H. Friedman, F. Baskett and L. J. Shustek. An Algorithm for Finding Nearest Neighbors. IEEE Trans. Comput.., vol. C-24, pp. 1000-1006, Oct. 1975. |
Here is a link to an other method:
Fast Nearest Neighbor Search in High-dimensional Space