The Algorithm by Tenmoto et al. [1]

This page describes the algorithm developed by Tenmoto et al. [1] to construct a piecewise linear classifier. If you are not sure what a classifier is, you should first read the classification overview. The remainder of this page is organized into the sections below.

Preliminaries

Points and Prototypes

Before the algorithm can proceed, a set of prototypes must be generated to represent the set of data points. Prototypes can be thought of as 'super points': points that each represent a group of actual data points. Figure 6 shows five prototypes from three different classes, along with the groups of data points they represent.

Figure 6 - Points and prototypes
A prototype represents a group of points of a single class in a region of point space. One prototype may represent many points, but each point belongs to exactly one prototype. The location of a prototype is determined by some measure of centrality of the set of points it represents, such as the center of mass. Prototypes can be generated through clustering algorithms, such as the K-means algorithm or minimal spanning trees, although the generation of prototypes is not covered in the algorithm presented.
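As a concrete illustration, prototype generation can be sketched as a per-class K-means clustering, where each prototype is the center of mass of the points it represents. This is only one possible approach (the paper leaves prototype generation open), and the function name and parameters below are illustrative.

```python
import numpy as np

def make_prototypes(points, labels, k_per_class=2, iters=20, seed=0):
    """Generate prototypes as per-class K-means centroids.

    Each prototype is the center of mass of the points it represents,
    so every point belongs to exactly one prototype of its own class.
    """
    rng = np.random.default_rng(seed)
    protos, proto_labels = [], []
    for c in np.unique(labels):
        X = points[labels == c]
        k = min(k_per_class, len(X))
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):  # Lloyd's iterations
            # assign each point to its nearest center
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            assign = d.argmin(axis=1)
            # move each center to the mean of its assigned points
            centers = np.array([X[assign == j].mean(axis=0)
                                if np.any(assign == j) else centers[j]
                                for j in range(k)])
        protos.append(centers)
        proto_labels.extend([c] * k)
    return np.vstack(protos), np.array(proto_labels)
```

Any clustering method that yields one representative point per group of same-class points would serve equally well here.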

The Gabriel Graph

After a set of prototypes has been generated, their Gabriel graph is constructed. In the Gabriel graph of a point set, an edge exists between a pair of points if and only if the hypersphere whose diameter is the segment joining them contains no other point of the set. The Gabriel graph of a set of prototypes is shown in figure 7.

Figure 7 - Gabriel graph
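The Gabriel condition can be checked directly from the definition: the hypersphere with diameter segment ij contains another point k exactly when k is closer to the segment's midpoint than half the segment's length. A brute-force O(n^3) sketch (naming is illustrative):

```python
import numpy as np
from itertools import combinations

def gabriel_edges(P):
    """Edges (i, j) of the Gabriel graph of point set P.

    (i, j) is an edge iff no other point k lies strictly inside the
    hypersphere whose diameter is the segment from P[i] to P[j].
    """
    edges = []
    n = len(P)
    for i, j in combinations(range(n), 2):
        mid = (P[i] + P[j]) / 2.0
        r2 = np.sum((P[i] - P[j]) ** 2) / 4.0  # squared radius
        blocked = any(np.sum((P[k] - mid) ** 2) < r2
                      for k in range(n) if k not in (i, j))
        if not blocked:
            edges.append((i, j))
    return edges
```

For large prototype sets, the Gabriel graph is usually extracted from the Delaunay triangulation instead, which avoids the cubic scan.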

Tomek Links

From the Gabriel graph, a subset of edges called Tomek links is generated. A Tomek link is an edge in the Gabriel graph that connects two prototypes of different classes. The set of Tomek links of a set of prototypes is shown in figure 8.

Figure 8 - Tomek links
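Given the Gabriel edges and a class label per prototype, extracting the Tomek links is a one-line filter. A small illustrative sketch:

```python
def tomek_links(edges, proto_labels):
    """Tomek links: Gabriel edges joining prototypes of different classes."""
    return [(i, j) for i, j in edges if proto_labels[i] != proto_labels[j]]

# example: only the middle edge crosses a class boundary
edges = [(0, 1), (1, 2), (2, 3)]
labels = ['red', 'red', 'blue', 'blue']
links = tomek_links(edges, labels)  # [(1, 2)]
```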

Construction of Hyperplanes

Once prototypes and Tomek links have been generated for a set of points, the algorithm to construct a minimal set of hyperplanes begins. To illustrate the algorithm, a simple example is presented using the data set, prototypes and Tomek links from figure 8.

The basic procedure is to find a set of hyperplanes that 'cut' Tomek links, thus separating prototypes and points belonging to different classes. Figure 9 shows a hyperplane (in purple) cutting a Tomek link. It is shown later that a set of hyperplanes that cut all Tomek links divides the point space into distinct regions, which can be used to classify points in a piecewise linear fashion. The key concept is to cut as many Tomek links as possible with each hyperplane, given the constraint that they must be able to classify points with an error less than a predefined maximum acceptable classification error. This yields a smaller set of hyperplanes, reducing the complexity of the classifier for a given classification error limit.

Figure 9 - Cutting a Tomek link. Local classification error rate of 0%
First, a hyperplane cutting a Tomek link is created and locally 'trained', i.e. positioned such that its local classification error (i.e. classification of points belonging to prototypes on either end of the Tomek link) is minimal. Figure 9 illustrates how this is done; a more in-depth description can be seen here. If the hyperplane is incapable of locally classifying points with an error rate less than the maximum acceptable error threshold, it becomes part of the set of hyperplanes, and a new hyperplane is created on a new Tomek link. Once a Tomek link has been 'cut' by a hyperplane, it is removed from the set of Tomek links to be considered in the future.
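A minimal sketch of this local step: the hyperplane can be initialized as the perpendicular bisector of the Tomek link, and its local error evaluated on the points of the two prototypes at the link's ends. The paper's training procedure adjusts the hyperplane further; only the initialization and the error measurement are shown here, with illustrative names.

```python
import numpy as np

def bisector_hyperplane(p_a, p_b):
    """Hyperplane (w, b) as the perpendicular bisector of prototypes a, b.

    w . x + b > 0 on prototype a's side, < 0 on prototype b's side.
    """
    w = p_a - p_b
    mid = (p_a + p_b) / 2.0
    b = -np.dot(w, mid)
    return w, b

def local_error(w, b, pts_a, pts_b):
    """Fraction of the two prototypes' own points on the wrong side."""
    wrong = np.sum(pts_a @ w + b <= 0) + np.sum(pts_b @ w + b >= 0)
    return wrong / (len(pts_a) + len(pts_b))
```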

If the hyperplane is capable of locally classifying points within the maximum error threshold, it is then modified to cut other nearby Tomek links. Tomek links are considered 'nearby' based on the distance of their midpoints to the midpoints of Tomek links already cut by the hyperplane under inspection. The distance to the nearest Tomek link is shown in figure 10 by the grey line connecting the midpoints of two Tomek links.

Figure 10 - Nearby Tomek links
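The 'nearby' test described above can be sketched as a search over midpoint distances. The function below is an illustrative helper, not part of the paper:

```python
import numpy as np

def nearest_uncut_link(cut_links, uncut_links, protos):
    """Index into uncut_links of the Tomek link whose midpoint is closest
    to the midpoint of any link already cut by the current hyperplane."""
    mids = lambda links: np.array([(protos[i] + protos[j]) / 2.0
                                   for i, j in links])
    cut_m, uncut_m = mids(cut_links), mids(uncut_links)
    # pairwise midpoint distances, then the closest uncut link overall
    d = np.linalg.norm(uncut_m[:, None, :] - cut_m[None, :, :], axis=2)
    return int(d.min(axis=1).argmin())
```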
As in the case of a hyperplane cutting a single Tomek link, a hyperplane cutting multiple Tomek links is locally trained to classify points on either side with a minimal error rate. As long as the classification error rate of the hyperplane is below the maximum acceptable error threshold, the hyperplane is continually adjusted to cut new Tomek links. In figure 11, two nearby Tomek links are cut by a single hyperplane, which locally classifies points with an error rate of 0%.

Figure 11 - Cutting 2 nearby Tomek links. Local classification error rate of 0%
When cutting multiple Tomek links with a hyperplane, a new constraint is added: all prototypes on one side of the hyperplane must belong to the same class. This is called the class limitation. This constraint ensures that each hyperplane defines the boundary between a particular class and all other classes. Figure 12 demonstrates this constraint, where cutting the third Tomek link is acceptable since all prototypes below the hyperplane belong to a single class.
Figure 12 - Cutting 3 nearby Tomek links. All prototypes on one side of hyperplane must belong to a single class
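The class limitation can be checked by collecting the classes of the prototypes strictly on each side of the hyperplane and requiring that at least one side be single-class. An illustrative sketch:

```python
import numpy as np

def class_limitation_ok(w, b, protos, proto_labels):
    """True if all prototypes strictly on one side of hyperplane (w, b)
    share a single class (the 'class limitation')."""
    side = protos @ w + b
    neg = {l for l, s in zip(proto_labels, side) if s < 0}
    pos = {l for l, s in zip(proto_labels, side) if s > 0}
    return len(neg) <= 1 or len(pos) <= 1
```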
When no more nearby Tomek links can be cut by the hyperplane under inspection, either because the classification error rate becomes unacceptable or because of the class limitation, it is added to the set of classification hyperplanes, and a new hyperplane is considered. Figure 13 shows a 2nd hyperplane added. In this instance the local classification error rate is greater than 0% due to overlap of points from different classes, but it is lower than the maximum acceptable error threshold.

Figure 13 - Adding a 2nd hyperplane. Local classification error rate > 0%, but acceptable
In figure 14, attempting to cut two links with the 2nd hyperplane results in a local classification error greater than the maximum acceptable error rate, due to overlap of points from different classes. Cutting this Tomek link is therefore not accepted.

Figure 14 - Cutting 2 Tomek links with new hyperplane. Local classification error rate > max acceptable error rate
Another attempt is made to cut a different nearby Tomek link in figure 15. The error rate is greater than 0%, but below the maximum acceptable error rate, and therefore accepted. At this point, there are no other Tomek links remaining to be cut by the hyperplane currently under inspection, and the hyperplane is added to the set of classification hyperplanes.

Figure 15 - Cutting 2 Tomek links with new hyperplane. Local classification error rate > 0%, but acceptable
A 3rd hyperplane is then created to cut the last remaining Tomek link in Figure 16. The classification error rate is greater than the maximum acceptable threshold due to overlap of points from different classes, but the hyperplane is added to the set of classification hyperplanes because there exists no better hyperplane to cut the Tomek link. In total, three hyperplanes are created for this example.

Figure 16 - Cutting Tomek link with 3rd hyperplane. Local classification error rate > max acceptable error rate, but accepted
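Putting the pieces together, the overall construction can be caricatured as a greedy loop over uncut Tomek links. This sketch deliberately omits the paper's local retraining and error test: each hyperplane stays the perpendicular bisector of its first link, and it claims any other uncut link that its plane happens to separate, subject to the class limitation. All names are illustrative.

```python
import numpy as np

def build_hyperplanes(protos, labels, links):
    """Greedy sketch of hyperplane construction over Tomek links."""
    def class_ok(w, b):
        # class limitation: one side must contain a single class only
        side = protos @ w + b
        neg = {labels[k] for k in range(len(protos)) if side[k] < 0}
        pos = {labels[k] for k in range(len(protos)) if side[k] > 0}
        return len(neg) <= 1 or len(pos) <= 1

    planes, uncut = [], list(links)
    while uncut:
        i, j = uncut.pop(0)            # start on the next uncut link
        w = protos[i] - protos[j]      # perpendicular-bisector normal
        b = -np.dot(w, (protos[i] + protos[j]) / 2.0)
        if class_ok(w, b):
            # claim any other uncut link already separated by this plane
            for a, c in list(uncut):
                if (protos[a] @ w + b) * (protos[c] @ w + b) < 0:
                    uncut.remove((a, c))
        planes.append((w, b))
    return planes
```

In the worked example above, the same loop with retraining and the error test produces the three hyperplanes of figure 16.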

From Hyperplanes to Regions

Once the set of classification hyperplanes has been created, they can be used to define regions in point space corresponding to unique classes, which are used to classify points. Tenmoto et al. do not discuss how this is done, but a simple demonstration is provided here to complete the algorithm description.

For each prototype, there exists a subset of hyperplanes that cut its associated Tomek links. Each hyperplane in this subset divides the point space into two half-spaces, one of which contains the prototype. The intersection of these half-spaces defines a region in point space that isolates the prototype from all other prototypes to which it is connected via Tomek links. In figure 17, the region formed by this intersection is shown for the red prototype in the top left corner of the example.

Figure 17 - Intersection of half-spaces separating a prototype from all other prototypes connected via Tomek links
The union of all regions isolating all prototypes of a particular class from their Tomek neighbors defines a region (possibly disjoint) in point space containing all prototypes of the particular class, and no prototypes of any other class. This union represents the entire subspace of the point space in which points are classified as belonging to that class. In figure 18, this region is shown for the red class from the example. All points within the red regions are classified as being from the red class.

Figure 18 - Regions in which points are classified as being from the red class.
The set of these regions for each class uniquely classifies all points in point space. Figure 19 shows the final set of classification regions for all three classes.

Figure 19 - Regions in point space uniquely classifying all points
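The region-based classification can be sketched as follows: a point belongs to a prototype's region when it lies on the prototype's side of every hyperplane cutting that prototype's Tomek links, with ties broken by distance to the prototype. Since the paper does not specify a decision rule, this is a reconstruction, and all names are illustrative.

```python
import numpy as np

def classify(x, protos, proto_labels, proto_planes):
    """Classify x by the half-space intersections described above.

    proto_planes[p] lists the (w, b) hyperplanes cutting prototype p's
    Tomek links. x lies in p's region iff it falls on p's side of every
    such hyperplane; ties are broken by distance to the prototype.
    """
    best, best_d = None, np.inf
    for p, planes in enumerate(proto_planes):
        # x must match the prototype's side of every cutting hyperplane
        if all(np.sign(np.dot(w, x) + b) == np.sign(np.dot(w, protos[p]) + b)
               for w, b in planes):
            d = np.linalg.norm(x - protos[p])
            if d < best_d:
                best, best_d = proto_labels[p], d
    return best
```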

Discussion

The algorithm presented for piecewise linear classification is useful in that class boundaries can be constructed for a set of test data with a desired classification error tolerance. The authors go so far as to present a method for determining the maximum acceptable error via a minimum description length calculation, although this was not discussed in this report.

There are several subtle problems that reduce the algorithm's usefulness, however. Achieving classification under the maximum acceptable error threshold is not guaranteed by the algorithm itself, since the first Tomek link cut by a hyperplane in an iteration of the algorithm is accepted regardless of its classification error rate on the associated data points. This may or may not be a significant problem, because if no hyperplane can cut a single Tomek link with a classification error rate below the maximum error threshold, the error threshold is not achievable anyway.

Secondly, there is no guidance as to how many prototypes to use for a particular set of test data. This is an important question: in the limiting cases, one could use either a single prototype for each class or a prototype for each data point, and the two extremes would have very different effects on the minimum achievable error and the generalizability of the class boundaries.



This web page prepared by
Matt Toews (mtoews@cim.mcgill.ca)
as a term project for the course
Computer Science 644 - Pattern Recognition
at the
McGill Center for Intelligent Machines