A project by Marc Lanctot and Hani Ezzadeen presented to
Godfried Toussaint for COMP 644: Pattern Recognition, Winter 2004.
Introduction
This project is based on the Fuzzy Hamming Distance concept which was
taken from
this paper [1]. Fuzzy Hamming Distance metrics for Character RecognitionThe Fuzzy Hamming Distance (FHD) also measures dissimilarity between bit strings of equal length. However, it is not simply the number of 1s in the result of an XOR operation. The FHD incorporates bit locality: 2 bit strings with a bit displaced by 1 position should not be as distant as 2 incorrect bits. For example, let b1 = 100000011 and b2 = 100000101. Then HD(b1, b2) = 2, even though the only difference is that there is a 1-bit that is off by one position. Now assume that each bit in the bit string represents a quarter-note of time in a muscial pattern, where 1s are tap beats and 0s are silent beats. In this case, b1 and b2 are almost the same. A more intuitive measure for a distant between b1 and b2 would then imply that dist(b1, b2) < 2.The authors define 3 operations on bit strings that will be needed to define FHD.
Below is an introductory applet that will demonstrate how the Fuzzy Hamming distance between 2 strings is obtained via the optimal operations. Please input two bitstrings of equal length: The FHD is a 1-dimensional metric. For character recognition, we must extend these dissimilarity measures to 2-dimensional. Below, we propose 2 such means of doing so. We assume that you can only measure the distance between 2 bitmaps if they have the same number of rows and columns. FlatCrossFHD: A composite metricFlatCrossFHD (FCFHD) simply flattens the 2D bitmap by considering only its rows and columns. Therefore, given 2 bitmaps, the FCFHD compares all corresponding rows and columns in both bitmaps, each of which are bit strings, using the regular FHD. The FCFHD is the sum of all individual FHDs computed.2DFHD: An extended metricThe 2DFHD metric considers each bitmap as a finite plane of points which can take the values in {0,1}. The operations and costs are as before except that the shift operation is now allowed to move bits up and down instead of only right and left. The cost of a shift operation is precisely the Manhatten distance between positions: the minimum number of "hops" that must be taken from position (i1,j1) to position (i2,j2), where diagonal hops disallowed. As before, the 2DFHD is the optimal cost to change one bitmap to the other using only these operations. It is straight-forward to extend the dynamic programming algorithm presented in the paper to include shifting bits up and down.Procedural/Algorithmic StagesTo evaluate the value of these metrice, in practice, we have implemented a Java applet that recognizes characters based on these metrics. The algorithms used in the applet and the basic flow of how our application is modelled is described below.(1) Building the Ideal Characters BitmapsEach character in the alphabet will be represented as a 100x100 bitmap stored in a PBM file format. The characters images were generated by The Gimp [4], a common graphics imaging tool, using a 72 pt. Verdana Sans Serif font. Before any comparisons are made with the input characters, each ideal character map is first loaded from the file into an internal data structure. These bitmaps are then skeletonized using Hilditch's algorithm [5].(2) Preprocessing Stage of the Input CharactersThe purpose of this stage is to "normalize" the user input, i.e scale it to a predefined size of 100x100 bitmap. This step is crucial to be able to calculate the Fuzzy Hamming distance of the input character relative to bitmaps of the “ideal” characters.The scaling stage will be addressed shortly, but before we start this stage we need to deal with an intrinsic property to the applet design: the low number of samples of the user input. Increasing the number of Samples of the 2D bitmapsWhen the user draws a character, the applet records a finite set of points while the mouse is being dragged then connects these points by straight lines giving the illusion of sampling infinitely many points. The number of recorded points depends on the speed at which the characters are drawn. That is, if the user is drawing at a fast pace, the number of points recorded will be less than the number of points in the case of a slow pace. To handle this problem, all the points necessary to form an 8-connected straight line between any two consecutively recorded points are calculated using linear interpolation. For demonstration purposes, the points recorded by the applet will turn to red once the recognize button in the applet is pressed (see figure (1) ).![]() ![]() Figure (1): The left 'C' character was drawn slowly; whereas the right one was drawn fast Scaling 2D BitmapsWe need to scale all user input, regardless of its size, to 100x100 bitmap (see figure (2.b) ). Trying to avoid this stage by simply limiting the size of the applet window to this size will not solve the problem of normalizing the input characters because the user can still draw a miniature character. In any case, scaling is a necessary step.Scaling has been implemented using two interpolation methods inspired by [2]: nearest neighbor and bi-linear interpolation. Even though nearest neighbor is simpler, its output is not as smooth as the bi-linear interpolation. Since the pixels of the characters will be 8-connected when scaled, they will remain connected afterwards regardless of the scaling factor. The tradeoff, however, is a variation in the thickness, in terms of the number of pixels, of the resulting characters. This thickness will vary depending on the interpolation and scaling stages. To overcome this variation problem, each character will be skeletonized using Hilditch’s algorithm [5]. See figure (2.c)
(3) Recognition StageIn the recognition stage, one of two methods can be implemented. The first method applies the FCFHD metric and the other one applies the 2DFHD metric. In both cases, the procedure for recognizing the character is the same. The distance between the input character's bitmap to each ideal character's bitmap is calculated using the chosen metric.Finally, the method returns a list of distances, one for each character. The character which gives the smallest Fuzzy Hamming distance relative to the input character is the one that is chosen as the output of the recognition process. ImplementationPlease draw an UPPERCASE character into the drawing board below.OptimizationsThe ImageTools class caches loaded images for quick access. Also, a speed improvement was quickly appreciated when we realized that we could cache the skeletons of the ideal characters: this saves time because we no longer have to pass each image through Hilditch's algotirthm to obtain the skeletons before comparing the letter bitmaps.Results/RealizationsThere are several conclusions to be made about these experiements.One observation is that while these metrics may promise accuracy, they are unfortunately quite computationally expensive. This is more so for 2DFHD than FCFHD. This is impractical and makes character recognition too slow to perform in real-time. As a result, it is likely that the FHD is more suitable for 1-dimensional problems. The results given by the authors in the paper seem to support this. References[1] Fuzzy Hamming Distance: A new Dissimilarity Measure.Abraham Bookstein, Shmuel Tomi Klein, Timo Raita. [2] Rune Andre Melgaard, Mads Hansen, Chao-Ching, Mai Burner imageprocessing package. [3] Brian Rogers: hw3s rc.txt [4] The Gimp, www.gimp.org [5] Hilditch's Algorithm. COMP 644 class notes |