Applet 

Further Applications
Feature selection

As we saw earlier, the purpose of genetic algorithms is optimization. We have already seen how they can be applied for finding decision boundaries in classification. In the context of pattern recognition, we are also confronted with the problem of finding the best subset of features for successful classification, a problem that is known as feature selection. Feature selection is especially useful when dealing with high dimensional input patterns with, say, hundreds or possibly thousands of features, making the number of distinct subsets very large (2^n).

But why do we want to limit ourselves to a subset of features for classification? Well, clearly, considering every single feature of an input pattern when the number of features is potentially quite large makes classification very expensive computationally. Moreover, certain features might be noisy and not have the same accuracy as others. We are then interested in assigning weights to features according to their relevance, where a weight of 0 implies that the given feature is ignored. Having weights lets us have a notion of relative importance and gives us better control. Why don't we simply ignore noisy measurements? It can be shown that actually combining them asymmetrically with more accurate ones gives us more information about the input pattern (see Kalman filter).

Now that we understand why feature selection is critical when dealing with high dimension feature spaces, let's see how GAs can be applied to this particular problem. The method we outline here is entirely generic and is therefore independent of the classifying technique or discriminant function used (eg. Knn).

Ga/Knn method:

We choose to combine the K-nearest neighbor decision rule for classifying input patterns with a genetic algorithm to select features. Suppose we have 100 measurements or features for each input pattern. We construct a chromosome of length 100 where each gene corresponds to a single bit. If bit i is 1, we take the corresponding feature into account in the Knn classification, otherwise we simply ignore it (see figure 1).

As mentioned earlier, we can extend this scheme in order to have relative weighting. This can be achieved easily by replacing each bit by an integer range, from 0 to 10 say, therefore improving the granularity of the method. This turns out to yield better results as expected. Higher weights reflect sensitivity whereas lower weights tending to 0 show less importance). Relative weights allow us to warp the feature space in order to provide optimal performance in classification.

GaKnn figure 1

figure 1:

The fitness function is based on the performance of the chromosome in terms of classification. Evaluation is straightforward, we just run the k-nn method on the input data given the subset or "weighted" subset of features defined by the chromosome and assign the success rate in classifying known data sets to the given organism or chromosome. As in most GAs, the evaluation of the fitness function is the most time-consuming and therefore the most critical in terms of performance. Having an incremental approach to evaluation instead of just using brute force (evaluating each chromosome separately from scratch) would certainly improve the running time.

The naive approach for evaluating performance is to use the following function:

basic fitness function

This effectively computes the fraction of correctly classified patterns. However, although this function yields reasonably good results, it turns out that it doesn't give the optimal weight setting for maximum class separation. It actually gives a setting dependent on the value of k (the number of nearest neighbors). The following formula achieves better results:

basic fitness function

where gamma and delta are arbitrary constants based on experiments and nmin is the number of neighbors that were not used in the subsequent classification.

The following figure summarizes the approach taken:

GaKnn figure 2

figure 2:

Of course, they are many ways one can use GAs, and the previous method is simply presented for illustrative purposes. One should really think of a genetic algorithm as a mean or tool for solving optimization problems and not as an end in itself. When used properly, GAs are able to approximate exhaustive search without having the exponential blowup.

For applications of GAs in pattern recognition and in other fields, see:
GARAGe : Michigan State University GAs Research and Applications Group.
Avida : Caltech group researching auto-adaptive genetic systems.
Genitor : GA Research group at Colorado State University.

Conclusion