Parametric decision rules make the membership classification based on knowledge of the a priori probabilities of occurrence of objects belonging to each class and of the underlying class-conditional probability density functions, which characterise how probable a given measurement is for each membership class.
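For concreteness, the classical rule of this kind is the Bayes classifier; writing the a priori probabilities as $P(\omega_i)$ and the class-conditional densities as $p(x \mid \omega_i)$ (illustrative notation, not fixed by the text), it assigns a measurement $x$ to the class with the largest a posteriori probability:

$$\hat{\omega}(x) = \arg\max_{\omega_i} P(\omega_i)\, p(x \mid \omega_i)$$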
In many situations, however, these distributions are unknown or are difficult to describe and handle analytically.
Non-parametric decision rules, such as the nearest neighbour rule, are attractive because no prior knowledge of the distributions is required. These rules rely instead on the training set of objects with known class membership to make decisions on the membership of unknown objects.
The nearest neighbour rule, as its name suggests, classifies an unknown object into the class of its nearest neighbour in the measurement space, most commonly using the Euclidean metric (see Figure 1.0).
Figure 1.0: The nearest neighbour rule.
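As a minimal sketch, assuming the training measurements are held in a numpy array `X` with labels `y` (illustrative names), the rule can be implemented by brute force:

```python
import numpy as np

def nn_classify(X, y, query):
    """Nearest neighbour rule: return the label of the training point
    closest to `query` in the Euclidean metric.

    X -- (n, d) array of training measurements
    y -- length-n array of class labels
    """
    d2 = ((X - query) ** 2).sum(axis=1)  # squared distances suffice for argmin
    return y[np.argmin(d2)]
```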
Besides being extremely simple to implement in practice, the nearest neighbour rule, given enough members in the training set, yields a probability of error close to the Bayes (optimal) probability of error. It has been shown by Cover and Hart [2] that, as the size of the training set goes to infinity, the asymptotic nearest neighbour error is never more than twice the Bayes (optimal) error. More precisely:
$$P^* \le \lim_{n \to \infty} P_n \le 2P^*(1 - P^*) \le 2P^*$$

where $P^*$ is the Bayes error and $n$ is the size of the training set. Furthermore, the performance of the obvious modification of this rule, namely the k-nearest neighbour decision rule, can be even better. The latter classifies an unknown object into the class most heavily represented among its k nearest neighbours (see Figure 1.1).
Figure 1.1: The k-nearest neighbour rule.
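The k-nearest neighbour variant is a small modification of the sketch above, using majority voting (names are again illustrative; ties are broken by `Counter`'s ordering):

```python
from collections import Counter
import numpy as np

def knn_classify(X, y, query, k=3):
    """k-nearest neighbour rule: return the class most heavily
    represented among the k training points closest to `query`."""
    d2 = ((X - query) ** 2).sum(axis=1)   # squared Euclidean distances
    nearest = np.argsort(d2)[:k]          # indices of the k closest points
    return Counter(y[nearest].tolist()).most_common(1)[0][0]
```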
Despite its simplicity and good performance, the traditional criticism of this decision rule points at the large space requirement to store the entire training set and the seeming necessity to query the entire training set in order to make a single membership classification. As a result, there has been considerable interest in editing the training set in order to reduce its size.
Different proximity graphs (such as the Delaunay triangulation) provide an efficient geometric apparatus for solving this problem [1].
Figure 2.0: Voronoi diagram.
The Delaunay triangulation is the dual of the Voronoi diagram: if two Voronoi regions share a boundary, the nodes of these regions are connected with an edge (see Figure 2.1). Such nodes are called Voronoi neighbours (or Delaunay neighbours).
Figure 2.1: Delaunay triangulation.
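For low-dimensional data these neighbour pairs can be read off a computed triangulation; a minimal sketch using scipy's Delaunay follows (the function name `delaunay_neighbour_pairs` is illustrative):

```python
from scipy.spatial import Delaunay

def delaunay_neighbour_pairs(X):
    """Return the set of Voronoi (Delaunay) neighbour pairs: every pair
    of vertices sharing a Delaunay simplex is connected by an edge."""
    tri = Delaunay(X)
    pairs = set()
    for simplex in tri.simplices:
        for i in simplex:
            for j in simplex:
                if i < j:
                    pairs.add((int(i), int(j)))
    return pairs
```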
Considering the nearest neighbour rule, it can be seen that if we compute the Voronoi diagram using the training set points as nodes, then whenever an unknown point falls into a particular Voronoi region, we must classify it into the class of that region's node. By definition of the Voronoi region, every one of its points is closer to the region's node than to any other node, and thus this node must be the nearest neighbour of any new point inside the region.
The decision boundary of a decision rule separates those points of the space in which an unknown object will be classified into one particular class from the remaining points, where it will be classified into some other class. Obviously enough, the decision boundary of the nearest neighbour rule belongs to the Voronoi diagram: the boundaries separating Voronoi regions whose nodes are of different classes contribute the portions of the decision boundary (see Figure 2.2).
Figure 2.2: The decision boundary.
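In low dimensions this portion of the boundary can be read directly off the Voronoi diagram: scipy's Voronoi exposes `ridge_points`, the pairs of input points whose regions share a facet. A minimal sketch (`boundary_ridges` is an illustrative name):

```python
from scipy.spatial import Voronoi

def boundary_ridges(X, y):
    """Return indices of the Voronoi ridges (facets) separating regions
    whose nodes belong to different classes; these facets make up the
    nearest neighbour decision boundary."""
    vor = Voronoi(X)
    # ridge_points[k] holds the two input points separated by ridge k.
    return [k for k, (a, b) in enumerate(vor.ridge_points)
            if y[a] != y[b]]
```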
Although the decision boundary is rarely computed explicitly in practice for high-dimensional problems, it is clear that the decision boundary alone is sufficient to perform the classification of new objects. Hence the nodes of the Voronoi regions whose boundaries do not contribute to the decision boundary are redundant and can be safely deleted from the training set (see Figure 2.3).
Figure 2.3: The edited training set.
As can be seen from Figure 2.3, the neighbours of the deleted regions expand, in some sense, to occupy the resulting void.
The above consideration validates the following algorithm for editing the nearest neighbour decision rule:

1. Compute the Voronoi diagram (or, equivalently, the Delaunay triangulation) of the training set.
2. Mark every point that has at least one Voronoi (Delaunay) neighbour of a different class.
3. Discard all unmarked points; the marked points form the edited training set.
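A minimal sketch of this procedure, assuming scipy is available and relying again on `ridge_points` to enumerate Voronoi neighbours (`voronoi_edit` is an illustrative name):

```python
import numpy as np
from scipy.spatial import Voronoi

def voronoi_edit(X, y):
    """Voronoi editing: keep only training points that have at least
    one Voronoi neighbour of a different class."""
    keep = np.zeros(len(X), dtype=bool)
    for a, b in Voronoi(X).ridge_points:
        if y[a] != y[b]:
            keep[a] = keep[b] = True
    return X[keep], y[keep]

# Toy usage: two linearly separated classes in the plane.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)
Xe, ye = voronoi_edit(X, y)
print(f"kept {len(Xe)} of {len(X)} points")
```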
It is not hard to see that, if there are degeneracies, in particular more than two collinear points, the resulting edited set may not be minimal (see Figure 2.4).
Figure 2.4: A degenerate case.
As Figure 2.4 illustrates, editing will leave all the points; however, any two opposite points, one from each class, are enough to preserve the decision boundary.
Figure 2.5: Keeping more points than necessary.
As Figure 2.5 illustrates, two points may be well separated from the rest of the training set and yet be preserved by Voronoi editing, because they maintain a portion of the decision boundary. However, assuming that new points come from the same distribution as the training set, the portions of the decision boundary remote from the concentration of training set points are of lesser importance.
Furthermore, computing the Voronoi diagram in high dimensions is quite costly. Any such algorithm requires $\Omega(n^{\lceil d/2 \rceil})$ time in the worst case [3], where $n$ is the number of points and $d$ is the number of dimensions.
The cost of the method and the practical necessity of obtaining smaller edited sets lead us to consider approximate solutions which would be attractive for practical applications.
Figure 3.0: Gabriel neighbours.
Two points are called Gabriel neighbours if the sphere having the segment between them as its diameter (the diametral sphere) contains no other point of the set. As Figure 3.0 demonstrates, the points A and B are Gabriel neighbours, whereas B and C are not.
A graph where all pairs of Gabriel neighbours are connected with an edge is called the Gabriel graph.
The Gabriel editing algorithm can be formulated in a way similar to Voronoi editing:

1. Compute the Gabriel graph of the training set.
2. Mark every point that has at least one Gabriel neighbour of a different class.
3. Discard all unmarked points; the marked points form the edited training set.
Although the resulting Gabriel-edited set does not preserve the original decision boundary, the changes occur mainly outside the zones of interest (see Figure 3.1).
Figure 3.1: Gabriel editing.
The Gabriel graph can be computed by discarding edges from the Delaunay triangulation. However, the complexity of computing the latter may be as bad as $O(n^{\lceil d/2 \rceil})$. Thus, this approach is not a very attractive one when the number of dimensions is large.
Clearly, the Gabriel graph can be computed by brute force if, for every potential pair of neighbours $A$ and $B$, we simply verify that no other point $X$ is contained in the diametral sphere, i.e. that

$$d^2(A, B) \le d^2(A, X) + d^2(X, B)$$

holds for every other point $X$ in the set.
There are $O(n^2)$ potential pairs of neighbours, every pair must be verified against the remaining $O(n)$ points, and each distance computation requires $O(d)$ steps, so the resulting complexity of the brute force algorithm is $O(dn^3)$.
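A minimal sketch of this brute-force computation, together with the corresponding editing step (`gabriel_pairs` and `gabriel_edit` are illustrative names; squared distances are compared to avoid square roots):

```python
import numpy as np

def gabriel_pairs(X):
    """Brute-force Gabriel graph: A and B are Gabriel neighbours iff no
    other point C lies inside the sphere with diameter AB, i.e.
    d2(A,B) <= d2(A,C) + d2(C,B) for every other point C."""
    n = len(X)
    # Pairwise squared Euclidean distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    pairs = []
    for a in range(n):
        for b in range(a + 1, n):
            if all(d2[a, b] <= d2[a, c] + d2[c, b]
                   for c in range(n) if c != a and c != b):
                pairs.append((a, b))
    return pairs

def gabriel_edit(X, y):
    """Keep only points with a Gabriel neighbour of a different class."""
    keep = np.zeros(len(X), dtype=bool)
    for a, b in gabriel_pairs(X):
        if y[a] != y[b]:
            keep[a] = keep[b] = True
    return X[keep], y[keep]
```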
The brute force algorithm can be easily improved by the following heuristic. For a point $A$ we create a list of points which are its potential Gabriel neighbours. When checking whether a point $B$ from this list is indeed a Gabriel neighbour, we must test, for every point $C$, whether it is contained in the diametral sphere of $A$ and $B$; the outcomes of these tests allow points that cannot be Gabriel neighbours to be discarded from the list (see Figure 3.2).
Figure 3.2: Discarding points which can't be Gabriel neighbours.
This heuristic reduces the average complexity to $O(dn^2)$.
Two points $A$ and $B$ are called relative neighbours if, for all other points $X$ in the set, the following is true:

$$d(A, B) \le \max\{\, d(A, X),\ d(B, X) \,\}$$
Geometrically, the above implies that no point X is contained in the "lune" constructed on points A and B should the latter be relative neighbours (see Figure 4.0).
Figure 4.0: Relative neighbours.
As Figure 4.0 demonstrates, the points B and C are relative neighbours whereas A and B are not.
The relative neighbour graph is the graph in which all pairs of relative neighbours are connected by an edge.
Thus, the relative neighbour editing algorithm can be expressed similarly to Voronoi or Gabriel editing, by substituting the relative neighbour graph for the underlying graphs used in those algorithms.
Since the relative neighbour graph is a subgraph of the Gabriel graph, it reduces the size of the edited sets even further. However, the changes to the decision boundary can be quite severe, both in the important and in the less important zones (see Figure 4.1).
Figure 4.1: Relative neighbour editing.
The algorithm for the practical computation of the relative neighbour graph is analogous to that used to obtain the Gabriel graph.
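For illustration, only the emptiness test changes relative to the brute-force Gabriel sketch above; comparing squared distances remains safe because squaring is monotonic on non-negative reals (`relative_neighbour_pairs` is an illustrative name):

```python
import numpy as np

def relative_neighbour_pairs(X):
    """Brute-force relative neighbour graph: A and B are relative
    neighbours iff d(A,B) <= max(d(A,C), d(B,C)) for every other
    point C, i.e. the lune of A and B contains no other point."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    pairs = []
    for a in range(n):
        for b in range(a + 1, n):
            if all(d2[a, b] <= max(d2[a, c], d2[b, c])
                   for c in range(n) if c != a and c != b):
                pairs.append((a, b))
    return pairs
```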