Title 

Introduction
Genetic algorithms

The basic purpose of genetic algorithms (GAs) is optimization. Since optimization problems arise frequently, this makes GAs quite useful for a great variety of tasks. As in all optimization problem, we are faced with the problem of maximizing/minimizing an objective function f(x) over a given space X of arbitrary dimension. A brute force which would consist in examining every possible x in X in order to determine the element for which f is optimal is clearly infeasible. GAs give a heuristic way of searching the input space for optimal x that approximates brute force without enumerating all the elements and therefore bypasses performance issues specific to exhaustive search.

Underlying idea:

We will first select a certain number of inputs, say, x1,x2 ... xn belonging to the input space X. In the GA terminology, each input is called an organism or chromosome. The set of chromosomes is designated as a colony or population. Computation is done over epochs. In each epoch the colony will grow and evolve according to specific rules reminiscent of biological evolution.

To each chromosome xi, we assign a fitness value which is nothing but f(xi). Stronger individuals, that is those chromosomes with fitness values closer to the colony optimal will have greater chance to survive across epochs and to reproduce than weaker individuals which will tend to perish. In other words, the algorithm will tend to keep inputs that are close to the optimal in the set of inputs being considered (the colony) and discard those that under-perform the rest.

The crucial step in the algorithm is reproduction or breeding that occurs once per epoch. The content of the two chromosomes participating in reproduction are literally merged together to form a new chromosome that we call (guess what?) a child. This heuristic allows us to possibly combine the best of both individuals to yield a better one (evolution).

Moreover during each epoch, a given fraction of the organisms is allowed to mutate (yes, we also have parthenogenesis). This provides a degree of randomness which allows us to span the whole input space by generating individuals with partly random genes.

As mentioned earlier, each epoch ends with the deaths of inapt organisms. We eliminate inputs exhibiting bad performance compared to the overall group. This is based on the assumption that they're less inclined to give birth to strong individuals since they have poor quality genes and that therefore we can safely disregard them (selection).


The algorithm:

Now that we've outlined the basic principles, let's examine in further detail how this whole process is accomplished and how the algorithm works in practise. Let's take the example of optimizing a function f over a space X contained in Nd.

Every input x in X is an integer vector x=(x1,x2,...,xn). For the sake of simplicity, assume 0<=xi<=k for i=1...n. In order to implement our genetic algorithm for optimizing f, we first need to encode each input into a chromosome. We can do it by having log(k) bits per component and directly encoding the value xi (figure 1). Each bit will be termed gene. Of course, we may choose any other encoding based on our requirements and the problem at hand.

chromosome encoding...
figure 1:

At epoch 0, we generate (possibly randomly) an initial set of inputs in X. Then at each epoch i, we perform fitness evaluation, reproduction, mutation and selection. The algorithm stops when a specified criterion providing an estimate of convergence is reached.

Reproduction: At each epoch, we choose a set of chromosomes belonging to the population that will mate. We choose to call such individuals females. Each female chooses a random set of potential partners and mates with the fittest of the group (this is another way of achieving selection). Once two organisms have been chosen for crossover, we merge their genetic information in order to create a new organism. The split position is determined randomly.
reproduction diagram...
Mutation: A new organism is created by randomly modifying some of its genes. This can be done right after reproduction on the newly created child or as a separate process.
mutation diagram...
Death: Worst performers among the colony are given a high probability of dying at the end of each epoch. We may also consider eliminating old chromosomes (NB: The highest performer is immune from death from old age).
death diagram...

Remarks:

In order to have fine grained control over the computation, we have to adjust parameters such as colony size, rate of reproduction, rate of death... Obviously these must be set empirically in order to fine tune the performance of the GA. We can do even better by incorporating such parameters in each chromosome so that optimal values may be found by the GA itself. For example, instead of using a global value, we might have a rate of reproduction or crossover specific to each chromosome, determining its probability of mating.

A major problem with most optimization technique is hill climbing: The algorithm gets trapped into a local maximum and stops when the convergence criterion is reached. Mutation which introduces randomness in the method allows us somehow, to avoid or at least reduce this undesirable effect.

For further information on genetic algorithms and related issues, see:
NovaGenetica : collection of links and information on GA.

Classifiers