This page presents an applet illustrating the role of Occam's Razor (actually the Rule of Simplicity, as defined on the interpretations page) in choosing a decision boundary based on limited sample data.
The IdeaThe applet challenges you to draw a two-class decision boundary, and see how your rule does against the Bayes-optimal and nearest-neighbor rules. A set of training points are displayed, picked from some distribution (either set by the user or picked by the applet and concealed). Based on these points, you are invited to draw the decision boundary which you think will best classify future data points. Then the Bayes-optimal and nearest-neighbor decision rules are calculated and displayed, and a number of further points are plotted, showing the relative success of the three boundaries. You will have trouble beating the Bayes-optimal boundary, but can you beat nearest-neighbor?
The Applet
Using the Applet
The challenge here, of making the best guess at a decision boundary based on limited data, is a fundamental problem of pattern recognition, in some sense the fundamental problem. In principle the Bayes-optimal boundary is the best possible and cannot be beaten, so we would like our guess to approach it. Unfortunately the Bayes-optimal boundary requires explicit knowledge of the distributions followed by the data points, which is rarely available a priori. The nearest-neighbor boundary is simple to calculate, does not rely on knowledge of the distributions, and is usually not far from the Bayes-optimal boundary. Often you will find that your best guess at a decision boundary is similar in shape to the nearest-neighbor boundary; the nearest-neighbor idea has a strong intuitive pull.
However, one feature of the nearest-neighbor rule is counterintuitive. Because it is constructed from a series of interconnecting lines (from the Voronoi diagram), the nearest-neighbor boundary often features sharp corners where its segments intersect. As you can see by experimenting with different distributions, the Bayes-optimal boundary for classes following simple distributions never has these sharp corners, and you are likely to find that the boundaries you draw are smooth as well. Again, there is something intuitive about smooth decision boundaries, perhaps because so much of our experience of the world is with classes following simple distributions. This is an example of the sort of simplifying intuition that is captured by the Rule of Simplicity interpretation of Occam's Razor.
If you compare the boundaries you draw with those produced by the nearest-neighbor rule, you will often find that your guess has combined these two intuitions: user boundaries typically resemble smoothed or "simplified" versions of the nearest-neighbor boundary. Is this a valid, if unquantitative, improvement? Run through several trials and see how often you can beat the nearest-neighbor rule. Try varying your "smoothing factor": do you do better if you smooth only slightly (producing a slightly rounded version of nearest-neighbor) or heavily (approaching a straight line)?
NotesThe applet is slow due to the brute-force approach used to calculate the boundaries, especially the nearest-neighbor boundary. Speed could probably be greatly improved by calculating the Voronoi diagram of the sample points and using this to calculate the nearest neighbors of each new data point. This is left as an exercise for the reader.