PATTERN RECOGNITION
Computer Science 308-644 B
TEST #2 - March 26, 2001 (three problems)

1) Independence and Bayesian Decision Theory in the Discrete Case

(a) Define what it means for features to be independent.

(b) Define what it means for features to be uncorrelated.

(c) Give a characterization of independence in terms of expected values.

(d) Consider a multiclass pattern recognition system with M classes C1, C2, ..., CM. Let the components of the feature vector X = (x1, x2, ..., xd) be ternary valued (i.e., each feature xi takes on the values 1, 0, or -1) with the following probabilities:

    pij = Prob(xi = 1 | Cj),  qij = Prob(xi = 0 | Cj),  rij = Prob(xi = -1 | Cj),

and with the components xi assumed to be statistically class-conditionally independent for all X = (x1, x2, ..., xd) in Cj. Prove or disprove that the Bayes (optimal) decision rule in this setting is a quadratic discriminant function.

2) Separability, Learning, and the Vapnik-Chervonenkis Dimension

Consider a set of 2n points (patterns) in a two-dimensional feature space (x1, x2), denoted by the plane R^2. Let n of them belong to class 1 and be located in the open first and third quadrants of R^2, and let the other n points belong to class 2 and be located in the open second and fourth quadrants of R^2. In other words, no points lie on the axes.

(a) Prove or disprove that no matter what the distribution of the points, they are always quadratically separable in R^2.

(b) Prove or disprove that you can create an additional third feature x3 such that, no matter what the distribution of the points in R^2 in (a), they are always linearly separable in R^3.

(c) Describe in as much detail as you can an error-correction procedure for learning a separating plane in (b) above in a finite number of steps.

(d) Consider the special case of quadratic discriminant functions for the 2-class problem in R^3 where the decision boundary is a sphere. What is the Vapnik-Chervonenkis dimension of this family of classifiers?

3) Nearest Neighbor Decision Rules

Consider the 1-NN (one-nearest-neighbor) decision rule in a 2-class problem in a 2-dimensional feature space. Assume the training set X = {X1, X2, ..., Xn} consists of n patterns.

(a) Describe in as much detail as you can a method for preprocessing and storing X so that the nearest neighbor of a new point to be classified can be found in O(log n) time in the worst case.

(b) Consider now condensing X by discarding some of its points using the following rule: discard (in parallel) every point Xi, i = 1, 2, ..., n, in X that has the property that all of its Gabriel neighbors belong to the same class as Xi. Let Xg denote the condensed set that remains. Prove or disprove that Xg is training-set consistent. Recall that Xg is training-set consistent if the one-nearest-neighbor (1-NN) decision rule that uses Xg as its reference set classifies all of X correctly.
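
For 1(d), one standard line of attack (sketched here, not asserted as the intended solution) is to encode each ternary outcome as a polynomial indicator in xi, so that the class-conditional log-likelihood becomes an explicit polynomial in the features. The symbols below match the problem statement; the indicator identities should be verified against the three possible values of xi.

Since $x_i \in \{-1, 0, 1\}$, each outcome has a polynomial indicator:
\[
\mathbf{1}[x_i = 1] = \frac{x_i(x_i+1)}{2}, \qquad
\mathbf{1}[x_i = 0] = 1 - x_i^2, \qquad
\mathbf{1}[x_i = -1] = \frac{x_i(x_i-1)}{2}.
\]
With class-conditional independence, the discriminant $g_j(X) = \ln P(X \mid C_j) + \ln P(C_j)$ becomes
\[
g_j(X) = \ln P(C_j) + \sum_{i=1}^{d}\left[
\frac{x_i(x_i+1)}{2}\ln p_{ij} + \left(1 - x_i^2\right)\ln q_{ij} + \frac{x_i(x_i-1)}{2}\ln r_{ij}
\right],
\]
which collects into a polynomial of degree two in $(x_1, \dots, x_d)$:
\[
g_j(X) = \sum_{i=1}^{d} x_i^2 \,\ln\frac{\sqrt{p_{ij}\, r_{ij}}}{q_{ij}}
\;+\; \sum_{i=1}^{d} \frac{x_i}{2}\,\ln\frac{p_{ij}}{r_{ij}}
\;+\; \sum_{i=1}^{d} \ln q_{ij} \;+\; \ln P(C_j).
\]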
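
For 2(a)-(b), the natural candidate is the product x1*x2, which is positive exactly on the open first and third quadrants and negative on the open second and fourth; lifting each point to (x1, x2, x3) with x3 = x1*x2 then makes the plane x3 = 0 a linear separator in R^3, while x1*x2 = 0 is itself a quadratic (degenerate conic) boundary in R^2. The following Python snippet is a minimal numerical sanity check of that claim, not a proof; the sampling helper sample_quadrant is invented for illustration.

import numpy as np

rng = np.random.default_rng(0)

def sample_quadrant(q, n):
    """Sample n points from the open quadrant q, bounded away from the axes."""
    signs = {1: (1, 1), 2: (-1, 1), 3: (-1, -1), 4: (1, -1)}[q]
    return rng.uniform(0.01, 1.0, size=(n, 2)) * np.array(signs)

n = 100
class1 = np.vstack([sample_quadrant(1, n // 2), sample_quadrant(3, n - n // 2)])
class2 = np.vstack([sample_quadrant(2, n // 2), sample_quadrant(4, n - n // 2)])

# Lift (x1, x2) to (x1, x2, x3) with x3 = x1 * x2.
lift = lambda P: np.column_stack([P, P[:, 0] * P[:, 1]])

# The plane x3 = 0 separates the classes: x3 > 0 for class 1, x3 < 0 for class 2.
assert np.all(lift(class1)[:, 2] > 0) and np.all(lift(class2)[:, 2] < 0)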
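
For 2(c), one classical error-correction procedure is the fixed-increment perceptron rule, which by Novikoff's theorem terminates after finitely many updates whenever the data are linearly separable, as the lifted points above always are. A sketch, with the initialization and epoch cap chosen arbitrarily:

import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Fixed-increment perceptron error-correction procedure.

    X: (n, d) array of patterns; y: labels in {+1, -1}.
    Returns an augmented weight vector w such that sign(w . [x, 1])
    predicts the label, assuming the data are linearly separable.
    """
    Xa = np.hstack([X, np.ones((len(X), 1))])  # absorb the bias term
    w = np.zeros(Xa.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(Xa, y):
            if yi * (w @ xi) <= 0:   # misclassified (or on the boundary)
                w += yi * xi         # error-correction update
                mistakes += 1
        if mistakes == 0:            # a full pass with no errors: done
            return w
    raise RuntimeError("no separating plane found within max_epochs")

Applied to the lifted points from the previous sketch, for example:
w = perceptron(np.vstack([lift(class1), lift(class2)]), np.r_[np.ones(n), -np.ones(n)])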
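
For 3(a), the textbook worst-case O(log n) answer in the plane is to precompute the Voronoi diagram of X (O(n log n) preprocessing) and answer queries with an O(log n) planar point-location structure such as Kirkpatrick's hierarchy, since the nearest neighbor of a query point is the site of the Voronoi cell containing it. The sketch below instead uses scipy's k-d tree as a practical stand-in; note its query time is O(log n) in expectation for well-distributed points, not in the worst case. The helper classify_1nn and the random data are invented for illustration.

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 2))       # training patterns in R^2
labels = rng.integers(0, 2, size=1000)   # two classes, 0 and 1

tree = cKDTree(X)                        # O(n log n) preprocessing

def classify_1nn(q):
    """Return the 1-NN label of query point q via the precomputed tree."""
    _, idx = tree.query(q)
    return labels[idx]

print(classify_1nn(np.array([0.3, -0.7])))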
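
For 3(b), it may help to have the condensing rule in executable form. Two points a and b are Gabriel neighbors iff no third point lies strictly inside the open disk having segment ab as diameter, equivalently iff (a - z) . (b - z) >= 0 for every other point z. Below is a brute-force O(n^3) sketch; the "in parallel" semantics is respected by computing every discard decision against the full set X before removing anything. The helpers gabriel_graph and gabriel_condense are invented names; in practice the Gabriel graph can be extracted from the Delaunay triangulation in O(n log n).

import numpy as np

def gabriel_graph(X):
    """Boolean adjacency matrix of the Gabriel graph of the points in X."""
    n = len(X)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = X[i], X[j]
            # z lies strictly inside the diametral disk of segment ab
            # iff the angle a-z-b exceeds 90 degrees, i.e. (a-z).(b-z) < 0.
            inside = any(np.dot(a - X[k], b - X[k]) < 0
                         for k in range(n) if k not in (i, j))
            adj[i, j] = adj[j, i] = not inside
    return adj

def gabriel_condense(X, labels):
    """Discard, in parallel, every point all of whose Gabriel neighbors
    share its class; return the indices of the condensed set Xg."""
    adj = gabriel_graph(X)
    keep = []
    for i in range(len(X)):
        nbrs = np.flatnonzero(adj[i])
        if len(nbrs) and np.all(labels[nbrs] == labels[i]):
            continue   # all Gabriel neighbors agree with Xi: discard it
        keep.append(i)
    return np.array(keep)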