Cette page est aussi disponible en français.

Project for Computer Science 308-644B


Chrislain Razafimahefa
Mike Soss

Unsupervised learning occurs when classifications must be made about data without a priori knowledge, such as a training set, or even knowledge of how many classes truly exist.

One of the methods used to perform such classifications is clustering with minimal spanning trees (MST). In this method, we group all of the points together by a MST. Then, we can group all vertices that are close together as members of the same class, or in other words, consider vertices which are joined by long edges as members of different classes.

How long is a long edge?
This is unfortunately one of the inherent problems of clustering with minimal spanning trees; like many clustering algorithms it requires a parameter to be set. We use the following method to determine if an edge is too long.
T is the parameter, the tolerance for long edges.


Interactive Clustering Demonstration

Here you can play with data points to see how the clustering is affected.
You can adjust following parameters: A possible improvement to the applet that may be added in the future is a choice of distance metrics. In other words, how do we measure the length of edges? Two useful metrics for this purpose are the Euclidean metric and the Mahalanobis metric.

In addition, you can choose to display the long edges of the MST (shown in cyan) or hide them by clicking on the "Toggle showing MST" button.

And of course, click on the vertices and move them around!

(Left mouse button adds a vertex or moves an existing one. Right mouse button deletes a vertex.)



Traversing a tree is quick and easy; it can be done in linear time. The most time-consuming step of the classification is the construction of the minimal spanning tree.

A MST of n points in the plane can be constructed in Theta(n log n) time with the aid of a Delaunay triangulation. (To see why this is an optimal time bound, reduce sorting to MST.)

However, our applet never has to compute a MST given only the n points from scratch. Because our applet is dynamic, a user can add or delete points to the existing tree. Thus we use an online algorithm for calculating a MST.

Follow this link to read the algorithm.
The applet source code is also available.
Back to Chrislain's main page.
Back to Mike's main page.