Cette page est aussi disponible en
français.
Project for Computer Science 308-644B
Chrislain Razafimahefa
Mike Soss
Unsupervised learning occurs when classifications must be made about data without a priori knowledge, such as a training set, or even knowledge of how many classes truly exist.
One of the methods used to perform such classifications is clustering with minimal spanning trees (MST). In this method, we group all of the points together by a MST. Then, we can group all vertices that are
close together as members
of the same class, or in other words, consider vertices which are joined by long edges as members of different classes.
How long is a long edge?
This is unfortunately one of the inherent problems of clustering with minimal spanning trees; like many clustering algorithms it requires a parameter to be set. We use the following method to determine if an edge is too long.
An edge E = {i,j} is too long if
- E is longer than T * the mean length of the incoming edges at vertex i, or
- E is longer than T * the mean length of the incoming edges at vertex j.
T is the parameter, the tolerance for long edges.
Interactive Clustering Demonstration
Here you can play with data points to see how the clustering is affected.
You can adjust following parameters:
- The number of data points
- The tolerance for long edges
A possible improvement to the applet that may be added in the future is a choice of distance metrics. In other words, how do we measure the length of edges? Two useful metrics for this purpose are the Euclidean metric and the Mahalanobis metric.
In addition, you can choose to display the long edges of the MST (shown in
cyan) or hide them by clicking on the "Toggle showing MST" button.
And of course, click on the vertices and move them around!
(Left mouse button adds a vertex or moves an existing one. Right mouse button deletes a vertex.)
Traversing a tree is quick and easy; it can be done in linear time. The most time-consuming step of the classification is the construction of the minimal spanning tree.
A MST of n points in the plane can be constructed in
(n log n) time with the aid of a Delaunay triangulation. (To see why this is an optimal time bound, reduce sorting to MST.)
However, our applet never has to compute a MST given only the n points from scratch. Because our applet is dynamic, a user can add or delete points to the existing tree. Thus we use an online algorithm for calculating
a MST.
Follow this link to read the
algorithm.
The applet source code is also available.
Back to Chrislain's main page.
Back to Mike's main page.