• HIERARCHICAL CLUSTERING ALGORITHMS
  • There are many times when clusters have sub-classes within them, and which in turn have sub-classes of their own. Examples of such occurences can be found in biological taxonomy when we have a species in the 'animal' kingdom, with a 'chordata' phylum, a sub-phylum of 'vertebrata, and so on. Such classifications are hierarchical and proper partitions can be found by the following classification methods.

    More formally, a sequence is said to be a hierarchical clustering if there exists 2 samples, c1 and c2, which belong in the same cluster at some level k and remain clustered together at all higher levels > k.

    One example of a hiearachical clustering is a correspondence tree, or a dendrogram (shown in Figure 1), which shows how samples are grouped together. The first level shows all samples xi as singleton clusters. As we increase levels, more and more samples are clustered together in a hierarchical manner.

    Figure 1 Figure 1

    Another example is one based on sets where each cluster level may contain sets that are subclusters as shown in the Venn diagram in Figure 2.

    Figure 2 Figure 2

    The hierarchical clustering procedures can be divided into two different approaches: agglomerative and divisive. The agglomerative approach to cluster analysis, used by the nearest and farthest neighbour algorithms, is a bottom-up clumping approach where we begin with n singleton clusters and successively merge clusters to produce the other ones. The minimal spanning tree, on the other hand, uses the divisive approach which is a top-down approach where we start with one cluster and successively split clusters to produce others.

    In general, all agglomerative algorithms usually yield the same results if the clusters are compact and well separated. However, if the clusters are close to one another or if their shapes are not hyperspherical, different results can be expected. The space and time complexities are O(n2) and O(cn2d2) respectively, where c is the number of clusters and d is the distance between them.

    1. Nearest-Neighbour
    2. The following is the nearest-neighbour clustering algorithm:

      1. begin
      2.     initialize c; c' = n; Di = {xi}; i = 1,...,n
      3.     do
      4.         c' = c' - 1
      5.         Find nearest clusters Di and Dj
      6.         Merge Di and Dj
      7.     until c = c'
      8.     return c clusters
      9. end

      To find the nearest clusters in step 5, the following clustering criterion function is used:

      dmin(Di, Dj) = min || x - x' ||, where x e Di and x' e Dj

      The merging of the two clusters in step 6 simply corresponds to adding an edge betwwen the nearest pair of nodes in Di and Dj.

      Also, if instead of terminating after a predetermined number of clusters have been obtained, it is possible to set the termination criteria to stop when the distance between nearest clusters exceeds a predetermined threshold, thus becoming the single-linkage algorithm.

      Some interesting properties of this algorithm is that it will always produce a tree since edges linking clusters always go between distinct clusters, thus no cycles are produced as can be seen in Figure 3. Also, if the algorithm is allowed to run until all clusters are connected then a spanning tree is generated, or more precisely, a minimal spanning tree since the edges we are inserting between clusters are always the shortest ones in distance.

      Figure 3 Figure 3

      As a result, the nearest-neighbour algorithm can effectively detect elongated clusters but is not as effective when clusters are compact and of equal size. Therefore we look to the farthest-neighbour algorithm to detect such clusters.

    3. Farthest-Neighbour
    4. The following is the farthest-neighbour clustering algorithm:

      1. begin
      2.     initialize c; c' = n; Di = {xi}; i = 1,...,n
      3.     do
      4.         c' = c' - 1
      5.         Find nearest clusters Di and Dj
      6.         Merge Di and Dj
      7.     until c = c'
      8.     return c clusters
      9. end

      To find the nearest clusters in step 5, the following clustering criterion function is used:

      dmax(Di, Dj) = max || x - x' ||, where x e Di and x' e Dj

      Also, if instead of terminating after a predetermined number of clusters have been obtained, it is possible to set the termination criteria to stop when the distance between nearest clusters exceeds a predetermined threshold, thus becoming the complete-linkage algorithm.

      The farthest-neighbour algorithm prevents the grouping of elongated clusters; instead, clusters are composed of complete subgraphs (Figure 4). As a result, the distance of two clusters is determined not by the distance between the two closest nodes (as in the nearest-neighbour algorithm), but by the distance between the two most distant nodes. Thus, when merging the two clusters, edges are added between every pair of nodes in the affected clusters.

      Figure 4 Figure 4

      This algorithm thus detects clusters which are compact and of roughly equal size, but has difficulties when the clusters are elongated and can generate groupings which are meaningless.

      Possible compromises between the two agglomerative approaches suggested above are to use the following criterion to find the nearest clusters:

      davg(Di, Dj) = ( 1 / (ninj) ) S S || x - x' ||, where x e Di and x' e Dj

      dmean(Di, Dj) = || mi - mj ||

    5. Minimal Spanning Tree
    6. The minimal spanning tree algorithm is a divisive approach because we are given the minimal spanning tree of all the nodes thus providing us with only one cluster to work with initially.

      There exists numerous ways to divide the clusters successively. One way is to simply remove the longest edge and create 2 clusters and then recursively removing the longest edge until the desired number of clusters is obtained.

      Another method in selecting which edge to remove is to compare the length of the edges to the lengths of the edges incident upon its nodes. As a result, we can consider an edge to be inconsistent if its length l is significantly larger than l', where l' is the average length of all other edges incident on its nodes (Figure 5). This method has the advantage of determining clusters which have different distances separating one another.

      Figure 5 Figure 5

      Another use of the minimal spanning tree is to find dense clusters embedded in a sparse set of points (Figure 6). All that has to be done is to remove all edges longer than some predetermined length in order to extract clusters which are closer than the specified length to each other. If the length is chosen accordingly, the dense clusters are extracted from a sparse set of points easily.

      Figure 6 Figure 6