PATTERN RECOGNITION
Computer Science 308-644 B
TEST #2 - March 26, 2001 (three problems)

1) Independence and Bayesian Decision Theory in the Discrete Case

(a) Define what it means for features to be independent.

(b) Define what it means for features to be uncorrelated.

(c) Give a characterization of independence in terms of expected values.

(d) Consider a multiclass pattern recognition system with M classes C1, C2, ..., CM. Let the components of the feature vector X = (x1, x2, ..., xd) be ternary valued (i.e., each feature xi takes on the values 1, 0, or -1) with the following probabilities:

    pij = Prob(xi = 1 | Cj),  qij = Prob(xi = 0 | Cj),  rij = Prob(xi = -1 | Cj),

and with the components xi assumed to be statistically class-conditionally independent for all X = (x1, x2, ..., xd) in Cj. Prove or disprove that the Bayes (optimal) decision rule in this setting is a quadratic discriminant function.

2) Separability, Learning, and the Vapnik-Chervonenkis Dimension

Consider a set of 2n points (patterns) in a two-dimensional feature space (x1, x2), denoted by the plane R^2. Let n of them belong to class 1 and be located in the open first and third quadrants of R^2, and let the other n points belong to class 2 and be located in the open second and fourth quadrants of R^2. In other words, no points lie on the axes.

(a) Prove or disprove that no matter what the distribution of the points, they are always quadratically separable in R^2.

(b) Prove or disprove that you can create an additional third feature x3 such that, no matter what the distribution of the points in R^2 in (a), they are always linearly separable in R^3.

(c) Describe in as much detail as you can an error-correction procedure for learning a separating plane in (b) above in a finite number of steps.

(d) Consider the special case of quadratic discriminant functions for the 2-class problem in R^3 where the decision boundary is a sphere. What is the Vapnik-Chervonenkis dimension of this family of classifiers?

3) Nearest Neighbor Decision Rules

Consider the 1-NN (one-nearest-neighbor) decision rule in a 2-class problem in a 2-dimensional feature space. Assume the training set X = {X1, X2, ..., Xn} consists of n patterns.

(a) Describe in as much detail as you can a method for preprocessing and storing X so that the nearest neighbor of a new point to be classified can be found in O(log n) time in the worst case.

(b) Consider now condensing X by discarding some of its points using the following rule: discard (in parallel) every point Xi, i = 1, 2, ..., n, in X that has the property that all of its Gabriel neighbors belong to the same class as Xi. Let Xg denote the condensed set that remains. Prove or disprove that Xg is training-set consistent. Recall that Xg is training-set consistent if the one-nearest-neighbor (1-NN) decision rule that uses Xg as its reference set classifies all of X correctly.
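
For 1(d), one standard line of attack (sketched here, not asserted as the intended solution) is to encode each ternary outcome as a polynomial indicator in xi, so that the class-conditional log-likelihood becomes an explicit polynomial in the features. The symbols below match the problem statement; the indicator identities should be verified against the three possible values of xi.

Since $x_i \in \{-1, 0, 1\}$, each outcome has a polynomial indicator:
\[
\mathbf{1}[x_i = 1] = \frac{x_i(x_i+1)}{2}, \qquad
\mathbf{1}[x_i = 0] = 1 - x_i^2, \qquad
\mathbf{1}[x_i = -1] = \frac{x_i(x_i-1)}{2}.
\]
With class-conditional independence, the discriminant $g_j(X) = \ln P(X \mid C_j) + \ln P(C_j)$ becomes
\[
g_j(X) = \ln P(C_j) + \sum_{i=1}^{d}\left[
\frac{x_i(x_i+1)}{2}\ln p_{ij} + \left(1 - x_i^2\right)\ln q_{ij} + \frac{x_i(x_i-1)}{2}\ln r_{ij}
\right],
\]
which collects into a polynomial of degree two in $(x_1, \dots, x_d)$:
\[
g_j(X) = \sum_{i=1}^{d} x_i^2 \,\ln\frac{\sqrt{p_{ij}\, r_{ij}}}{q_{ij}}
\;+\; \sum_{i=1}^{d} \frac{x_i}{2}\,\ln\frac{p_{ij}}{r_{ij}}
\;+\; \sum_{i=1}^{d} \ln q_{ij} \;+\; \ln P(C_j).
\]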
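
For 2(a)-(b), the natural candidate is the product x1*x2, which is positive exactly on the open first and third quadrants and negative on the open second and fourth; lifting each point to (x1, x2, x3) with x3 = x1*x2 then makes the plane x3 = 0 a linear separator in R^3, while x1*x2 = 0 is itself a quadratic (degenerate conic) boundary in R^2. The following Python snippet is a minimal numerical sanity check of that claim, not a proof; the sampling helper sample_quadrant is invented for illustration.

import numpy as np

rng = np.random.default_rng(0)

def sample_quadrant(q, n):
    """Sample n points from the open quadrant q, bounded away from the axes."""
    signs = {1: (1, 1), 2: (-1, 1), 3: (-1, -1), 4: (1, -1)}[q]
    return rng.uniform(0.01, 1.0, size=(n, 2)) * np.array(signs)

n = 100
class1 = np.vstack([sample_quadrant(1, n // 2), sample_quadrant(3, n - n // 2)])
class2 = np.vstack([sample_quadrant(2, n // 2), sample_quadrant(4, n - n // 2)])

# Lift (x1, x2) to (x1, x2, x3) with x3 = x1 * x2.
lift = lambda P: np.column_stack([P, P[:, 0] * P[:, 1]])

# The plane x3 = 0 separates the classes: x3 > 0 for class 1, x3 < 0 for class 2.
assert np.all(lift(class1)[:, 2] > 0) and np.all(lift(class2)[:, 2] < 0)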
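
For 2(c), one classical error-correction procedure is the fixed-increment perceptron rule, which by Novikoff's theorem terminates after finitely many updates whenever the data are linearly separable, as the lifted points above always are. A sketch, with the initialization and epoch cap chosen arbitrarily:

import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Fixed-increment perceptron error-correction procedure.

    X: (n, d) array of patterns; y: labels in {+1, -1}.
    Returns an augmented weight vector w such that sign(w . [x, 1])
    predicts the label, assuming the data are linearly separable.
    """
    Xa = np.hstack([X, np.ones((len(X), 1))])  # absorb the bias term
    w = np.zeros(Xa.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(Xa, y):
            if yi * (w @ xi) <= 0:   # misclassified (or on the boundary)
                w += yi * xi         # error-correction update
                mistakes += 1
        if mistakes == 0:            # a full pass with no errors: done
            return w
    raise RuntimeError("no separating plane found within max_epochs")

Applied to the lifted points from the previous sketch, for example:
w = perceptron(np.vstack([lift(class1), lift(class2)]), np.r_[np.ones(n), -np.ones(n)])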
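
For 3(a), the textbook worst-case O(log n) answer in the plane is to precompute the Voronoi diagram of X (O(n log n) preprocessing) and answer queries with an O(log n) planar point-location structure such as Kirkpatrick's hierarchy, since the nearest neighbor of a query point is the site of the Voronoi cell containing it. The sketch below instead uses scipy's k-d tree as a practical stand-in; note its query time is O(log n) in expectation for well-distributed points, not in the worst case. The helper classify_1nn and the random data are invented for illustration.

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 2))       # training patterns in R^2
labels = rng.integers(0, 2, size=1000)   # two classes, 0 and 1

tree = cKDTree(X)                        # O(n log n) preprocessing

def classify_1nn(q):
    """Return the 1-NN label of query point q via the precomputed tree."""
    _, idx = tree.query(q)
    return labels[idx]

print(classify_1nn(np.array([0.3, -0.7])))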
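
For 3(b), it may help to have the condensing rule in executable form. Two points a and b are Gabriel neighbors iff no third point lies strictly inside the open disk having segment ab as diameter, equivalently iff (a - z) . (b - z) >= 0 for every other point z. Below is a brute-force O(n^3) sketch; the "in parallel" semantics is respected by computing every discard decision against the full set X before removing anything. The helpers gabriel_graph and gabriel_condense are invented names; in practice the Gabriel graph can be extracted from the Delaunay triangulation in O(n log n).

import numpy as np

def gabriel_graph(X):
    """Boolean adjacency matrix of the Gabriel graph of the points in X."""
    n = len(X)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = X[i], X[j]
            # z lies strictly inside the diametral disk of segment ab
            # iff the angle a-z-b exceeds 90 degrees, i.e. (a-z).(b-z) < 0.
            inside = any(np.dot(a - X[k], b - X[k]) < 0
                         for k in range(n) if k not in (i, j))
            adj[i, j] = adj[j, i] = not inside
    return adj

def gabriel_condense(X, labels):
    """Discard, in parallel, every point all of whose Gabriel neighbors
    share its class; return the indices of the condensed set Xg."""
    adj = gabriel_graph(X)
    keep = []
    for i in range(len(X)):
        nbrs = np.flatnonzero(adj[i])
        if len(nbrs) and np.all(labels[nbrs] == labels[i]):
            continue   # all Gabriel neighbors agree with Xi: discard it
        keep.append(i)
    return np.array(keep)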