Introduction

Genetic algorithms are an effective method for the extraction of geometric features in image processing. However a well-known alternative approach to geometric primitive extraction is the Hough Transform. Nevertheless, as we will see, the Hough transform is very time and space consuming, especially when the dimension of the search space is high. In fact, this implies that the Hough transform cannot efficiently recognize complex shapes with many degrees of freedom. Therefore many interesting studies have been made to deal with that problem.

Hough Transform Survey

The Hough Transform is a method that has been first developed to detect simple analytical defined curves, like lines or circles. Each of these curves can be defined by parameters. For instance, we just need a point and a positive radius to define a circle. The principle of the Hough Transform is to perform an approximate but exhaustive search of the parameters space. Its dimension is equal to the number of degrees of freedom of the primitive. For instance, the dimension of the parameter space for a circle is 3, while it is 5 for an arbitrarily oriented rectangle. The parameter space is discretized into cells called accumulators, where each accumulator cell represents a possible instance of the geometric shape. Let's describe a basic Hough Transform Algorithm :

Preprocess the image pixels to enhance the resulting feature points (for instance, use edge detection)

For each feature point

For each accumulator cell

If the feature point belongs to the geometric primitive that corresponds to the accumulator cell, increment the value of the accumulator.

Choose those accumulators having the greatest values, and hence accounting for most of the feature points.

The first preprocessing step of this method, is common to all primitive extraction methods (for more information look here). The complexity of the rest of the method is dependent on the number d of parameters which define the geometric primitive as well as the N values into which each parameter is discretized. Typically the time and storage space that is required is:

We can therefore see that the basic Hough Transform is a template matching method (For a deeper survey, read [4].). Indeed, the Hough Transform can be seen as a systematic test of how well each possible instance of the template fits the image. But another formulation on the same problem is: find the parameters that define an instance of the template which best matches the feature points of the image. This matching may be easily expressed as a fitness function. The use of genetic algorithms is consequently meaningful because they are good approximate methods to solve these problems in high dimensional space. Typically each iteration of a genetic algorithm, having a population of size S, takes O(S) space and between O(S) to O(S^2) time, depending on the implementation. By providing an approximate solution, genetic algorithms make it easier to control the required computational cost without sacrificing the accuracy of the parameters, by adjusting the population size and the halting criterion. The Hough Transform, which relies on an exhaustive search strategy of the parameter space, is in practice much more expensive to compute, and the main way to improve its performance is to decrease the accuracy of the discretization. The Hough Transform guarantees an optimal solution for a given discretization of the parameter space, while genetic algorithms often converge to good locally optimal solutions. The former approach is deterministic, while the latter is probabilistic.

The Hough Transform is a method that has attracted a great deal of attention, and many improvement have been made, like the generalized Hough Transform, the probabilistic Hough Transform and the hierarchical Hough Transform. Therefore the Hough Transform remains a very powerful technique. For instance, the Probabilistic Hough Transform decreases the number of feature points that need to be examined. Tricks have been developed to reduce the dimension of the parameter space that needs to be searched at a given time. But the point is that it is still very time and space consuming. Genetic algorithms have proved themselves as an effective alternative which addresses these problems. They generally converge quicker to reasonable solutions than other evolutionary approaches such as simulated annealing

Genetic Algorithms Survey

We will now survey some of the published results [1], [3] and [5] regarding geometric primitive extraction with genetic algorithms. Indeed, genetic algorithms are very diversified and we will examine which approaches work better than others. We will particularly focus on the recognition of rectangles, as this is what our own program does.

Chromosome Representation

The strength of a genetic algorithm depends on a robust encoding of the primitive as a chromosome. One may choose between two representations :

Two opposing corners and a rotation angle (5 parameters)

A center point, width, height and a rotation angle (5 parameters)

Both representations ensure that each instance of the geometric primitive has a unique description. The second representation seems better, because it is more stable. Moreover, a rectangle detection algorithm should not detect rectangles that resemble lines. Similarly, it should be robust to random noise, by not detecting rectangles which have only a tiny area. The second representation may address these concerns directly. The width and height components of the search space can be reduced and the operators modified accordingly.

Another approach is to detect only lines and then reassemble them into rectangles. However this appears to be noise-sensitive and, perhaps, not so simple.

Genetic Algorithm Operators

First of all, we must define the encoding of the chromosomes : binary string or a real valued vector. In the surveyed papers, binary representation was mainly used. However this representation has its disadvantages. The effects of mutation on a binary string are difficult to predict; if the binary string represents an integer, such as a coordinate of a rectangle, a change in the most significant bit is much more drastic than a change in the least significant bit. Bits should flipped with some care not to introduce unnecessary

In case of the cross-over operator, the vector-based representation has also its advantages. The effects of blending two numbers are far more predictable than the effects of interchanging parts of their binary representations. Either approach may limit itself to simply exchanging the genes, which represent the different parameters, between chromosomes but this essentially fails introduce any new information into the gene pool.

To characterize these operators, we must also give their probability of occurrence. For the cross-over operator, the articles suggest a constant probability around 70% and this appears to be the standard way to proceed.

For the mutation operator, the easiest way is to use a similar approach and set the mutation rate to around 5%. However studies have shown that starting at a high mutation probability and then slowly decreasing this probability with each generation should ensure convergence, at least in theory. While the theoretical rate of decrease is too slow in practice, the idea remains interesting. For the solution to converge, it helps if the rate of mutation is progressively reduced. Making the mutation probability an exponentially function of time seems a good way to proceed, as chosen in paper [3].

Fitness/Cost Function

The cost function is highly dependent on the input. In most algorithms, geometric primitives are extracted once the edges have been detected. It should be noted that robust edge extraction can be quite a difficult task in itself. With edge detection as preprocessing, the cost function is a measure of the coincidence between the edges of the geometric primitive and the edges detected in the image. To smooth the cost function one needs to thicken and blur the outlines of the detected edges. It is important to distinguish pixels that lie near an edge from those that are far away. One way is therefore to perform a distance map as proposed in [3].

In paper [5], they have chosen to work directly with the image of the shape. To use this approach, they need a black and white image as the input for the algorithm. Here, typically thresholding needs to be applied to a color or greyscale image. The cost function is simply the number of black pixels that a given rectangle contains.

Detecting Many Primitives At the Same Time

In pattern recognition applications, one might not only wish to identify the single primitive which best matches the image, but to recognize all the primitives that are present. The method described in paper [5] tries to approximate shape with a set of rectangles, where each set is a chromosome. However this method needs to know in advance the number of rectangles to search for. When this set is large, it becomes computationally expensive to adequately explore the entire solution space. It could be interesting to allow the genetic algorithm to vary the number of rectangles in the chromosomes.

In the standard method (paper [1] and [3]), each chromosome represents a single rectangle. Through the course of evolution the population converges towards an optimum value. Consequently, the final population is relevant to a single instance of the primitive. A naive approach is to find the best primitive, erase it from the image, and iterate the algorithm on the new image. But this seems to be a very unstable method which wastes a lot of computational effort.

Paper [3] presents an interesting way to deal with this issue, called sharing. Sharing is based on the following idea : the fitness function has some local maxima, which create sub-populations of chromosomes that try to approximate them. It is particularly interesting in our case because the local maxima of the fitness function correspond to different legitimate instances of the geometric primitives. Hence we should seek to keep variety in our population. This can be achieved by preventing all the chromosomes from coalescing around a single optimal value. So the fitness function is modified to penalize the less fit chromosomes which belong to a large sub-population. A sub-population is expressed as a neighborhood of the parameter space. Of course, the approach increases the computational load and convergence time. But it allows a single run of the algorithm to extract many geometric primitives.

To conclude our survey of the different techniques, we feel that the best ingredients for detecting rectangles in an image using a genetic algorithm are:

Preprocessing Step: convert the scene into a black and white image using locally adaptive thresholding as well as noise reduction.

Chromosome Representation: a real valued vector representing the center point, width, height and rotation angle of each rectange.

Fitness/Cost Function: the portion of the areas of the rectangle that contain black pixels in the image.

Genetic Algorithm Operators: mutation with an occurrence probability decreasing exponentially from about 25% to less then 1%. Constant cross-over probability of about 70%.

Detecting Multiple Primitives: sharing seems a very interesting technique.

Stopping Criteria: When the fittest chromosomes of the population stop changing from one generation to another.

Applications of Geometric Primitives Extraction

Identifying the geometric primitives that comprise an image is analogous to recognizing words from a collection of letters. It transforms lexical data into semantic information. It provides the simple building blocks for the comprehension of a scene. After all, an algorithm is no more able to reason about an image in terms of individual pixels than a human being can see an objects as a mass of colored patches.

Here are some applications of geometric primitives extraction:

Robot Navigation: As a mobile robot equipped with visual sensors navigates through its environment, it must be able to make sense out of its surroundings in order to accomplish its task. In a man-made environment, most of the objects it may encounter have roughly oval or quadrilateral outlines. The robot will typically use geometric primitive detection to create a virtual map of its surroundings.

Stereo Vision: Just as the human brain uses two eyes to perceive depth, so the computer can be given several 2D views of a 3D scene and then asked to reconstruct the geometry of the 3D scene. Geometric primitive extraction serves as a preprocessing step, since it is much easier to model the complex relations of perspective and occlusion of simple shapes than of pixels.

Motion Perception: The computer is provided with a movie, a series of snapshots taken at regular time intervals, and it is then asked to determine what is moving and where it is going. Here too geometric primitive extraction may be an important preprocessing step needed to simplify the complexity of the task.

Automated Visual Inspection: In industry, manufactured products often need to be visually inspected for defects before being shipped to the customer. This inspection requires that the item be compared with a standard template. Geometric primitives provide a very effective language for expressing these templates.

Document Analysis: Advanced optical character recognition software may need to understand scanned newspapers or bureaucratic forms. It must be able to segment the page according to how it is arranged. This requires rectangle detection.

Head Tracking: Real-time head tracking allows a desktop computer, equipped with a small camera, to tell if its user is paying attention to the screen. Many head tracking systems are based on the idea that the human head is practically an ellipse.