# Classification Overview

Classification deals with associating a class label to a point, according to the class to which the point is believed to belong. Classification is typically based on a distance measure between the point under consideration and other points belonging to known classes in point space. Figure 1 shows and example of the classification task in two dimensional point space, where the task is to label a point as belonging to either class A or class B.

Class A
Class B

## Linear Classification

Linear classification deals with classifying points according to their position relative to a hyperplane in the point space. All points on a given side of the hyperplane are considered as belonging to a class. Figure 2 shows an example of a linear classification in two dimensional point space, where the hyperplane is a line. All points to the left of the hyperplane belong to class A, and all points to the right belong to class B. Any new point is assigned a class label according which side of the line it falls. Linear classification is considered a simple and robust method of classification in the absence of a known model for class data.

Class A
Class B

## Piecewise Linear Classification

It is not hard to think of configurations of points for which simple linear classification will perform poorly, however. There may exist no hyperplane capable of uniquely separating point classes, because of geometrical reasons or simply because there are more than 2 classes. Figure 3 shows an example of this problem.

Class A
Class B

##### Figure 3 - Linear classification problem
Piecewise linear classification solves this problem by defining boundaries between classes using several hyperplanes. Using multiple hyperplanes, it is possible to define more than 2 regions in space, making it possible to classify a set of points more precisely. Figure 4 shows how piecewise linear classification works.

Class A
Class B

## Classification Error

Classification is not perfect; points may be misclassified due to factors such as data noise. Classification error can be determined by attempting to classify a set of test points, and counting the number of misclassifications. The classification error rate can be estimated by the number of misclassifications divided by the total number of points in the test set. In Figure 5, a point from class A is misclassified as belonging to class B. Since there are 11 points in the test set, and 1 has been misclassified, the classification error rate is 1/11 = 0.09, or 9%.

Class A
Class B

## Classification Error vs. Generalizability

Although useful, piecewise linear classification must be used judiciously. In the limit, enough hyperplanes can be defined to precisely classify each point in the set of training data. But by the principle of Ocam's Razor, an overly precise classification scheme tailored to a set of training data tends to generalize poorly when it comes to classifying new data. Thus, a compromise must be made between a classifier that accurately describes its test data (i.e. using more hyperplanes) and a classifier that will likely perform well on new data (i.e. using fewer hyperplanes).

This web page prepared by
Matt Toews (mtoews@cim.mcgill.ca)
as a term project for the course
Computer Science 644 - Pattern Recognition
at the
McGill Center for Intelligent Machines