As we saw earlier, the purpose of genetic algorithms is optimization.
We have already seen how they can be applied for finding decision boundaries in classification.
In the context of pattern recognition, we are also confronted with the problem of finding
the best subset of features for successful classification, a problem that is known as feature
selection. Feature selection is especially useful when dealing with high dimensional input
patterns with, say, hundreds or possibly thousands of features, making the number of
distinct subsets very large (2^n).
But why do we want to limit ourselves to a subset of features for classification?
Well, clearly, considering every single feature of an input pattern when the number
of features is potentially quite large makes classification very expensive computationally.
Moreover, certain features might be noisy and not have the same accuracy as others.
We are then interested in assigning weights to features according to their relevance, where
a weight of 0 implies that the given feature is ignored. Having weights lets us have a
notion of relative importance and gives us better control. Why don't we simply ignore
noisy measurements? It can be shown that actually combining them asymmetrically with more
accurate ones gives us more information about the input pattern (see Kalman filter).
Now that we understand why feature selection is critical when dealing with high dimension
feature spaces, let's see how GAs can be applied to this particular problem. The method
we outline here is entirely generic and is therefore independent of the classifying technique
or discriminant function used (eg. Knn).
Ga/Knn method:
We choose to combine the K-nearest neighbor decision rule for classifying input
patterns with a genetic algorithm to select features. Suppose we have 100 measurements
or features for each input pattern. We construct a chromosome of length 100 where
each gene corresponds to a single bit. If bit i is 1, we take the corresponding
feature into account in the Knn classification, otherwise we simply ignore it (see figure 1).
As mentioned earlier, we can extend this scheme in order to have relative weighting.
This can be achieved easily by replacing each bit by an integer range, from 0 to 10 say,
therefore improving the granularity of the method. This turns out to yield better results
as expected. Higher weights reflect sensitivity whereas lower weights tending to 0
show less importance). Relative weights allow us to warp the feature space in order
to provide optimal performance in classification.
The fitness function is based on the performance of the chromosome in terms of
classification. Evaluation is straightforward, we just run the k-nn method on the input
data given the subset or "weighted" subset of features defined by the chromosome and
assign the success rate in classifying known data sets to the given organism or chromosome.
As in most GAs, the evaluation of the fitness function is the most time-consuming and
therefore the most critical in terms of performance. Having an incremental approach
to evaluation instead of just using brute force (evaluating each chromosome separately from
scratch) would certainly improve the running time.
The naive approach for evaluating performance is to use the following function:
The following figure summarizes the approach taken:
Of course, they are many ways one can use GAs, and the previous method is simply presented
for illustrative purposes. One should really think of a genetic algorithm as a mean or tool for
solving optimization problems and not as an end in itself. When used properly, GAs are able
to approximate exhaustive search without having the exponential blowup.
For applications of GAs in pattern recognition and in other fields, see:

figure 1:
This effectively computes the fraction of correctly classified patterns. However,
although this function yields reasonably good results, it turns out that it doesn't
give the optimal weight setting for maximum class separation. It actually gives
a setting dependent on the value of k (the number of nearest neighbors). The
following formula achieves better results:
where gamma and delta are arbitrary constants based on experiments and nmin
is the number of neighbors that were not used in the subsequent classification.

figure 2:
GARAGe
: Michigan State University GAs Research and Applications Group.
Avida
: Caltech group researching auto-adaptive genetic systems.
Genitor
: GA Research group at Colorado State University.