Linearly Inseparable Classes


...THE "GOODNESS" OF A NON-SEPARATING HYPERPLANE...

Proposition: (LP) is useful in the case which H and M are linearly inseparable. {x Rn: aTx = b*} is a hyperplane that minimizes a certain measure of the "misclassification" of the points of H and M. (Bosch and Smith, 1998)


Argument: Note that (LP) is a linear programming formulation of the following unconstrained nonlinear programming problem:
Given H and M, find a Rn, b R that


(NLP)

minimizes

 

(1/h) max{ -aTHi + b + 1, 0}
+ (1/m) max{ aTMi - b + 1, 0}
Note: We will refer to the above formulation as "(NLP)"

The objective function of (NLP) assigns a nonnegative real number to each vector-scalar pair (a,b). This assignment can be viewed as the "cost" of the hyperplane {x Rn: aTx = b}. Thus, the objective function can be thought of as a "hyperplane evaluator", and (NLP) finds the least costly hyperplane. The number assigned to (a, b) is the cost of the hyperplane {x Rn: aTx = b*} and solving NLP yields a least costly hyperplane.

Note: max {-aTHi + b + 1, 0} is positive if the ith element of H belongs to the set {x Rn: aTx < b + 1} and is zero otherwise. In other words, max{-aTHi + b + 1, 0} is positive if the ith point of H lies within, or on the "wrong side" of the radius-one band that surrounds {x Rn: aTx = b } and is zero otherwise.
Example: Examine the following figure:


The point #5 of H lies within the radius-one band. Points #6 and #7 of H lie on the wrong side of the band. All the other points of H lie on the "right" side of the band:

max{-aTMi + b + 1, 0} = 0 for i = 1,...,4
1 for i = 5
3 for i = 6
4 for i = 7

The farther a point of H is from the right side of the band, the larger the value of max{aTMi - b + 1, 0}. Since (NLP) is a minimization problem, max{aTMi + b + 1, 0} can be interpreted as the cost of the hyperplane {x Rn: aTx = b} misclassifying of the ith point of H, and similarly max{aTMi - b + 1, 0} the cost of the hyperplane misclassifying the jth point of M.

Why {x Rn: aTx = b} is a "good" separating hyperplane:
The objective function of (NLP) is the sum of two average costs: average cost for H and the average cost for M. Each point in H is assigned the weight 1/h. Similarly, each point in M is assigned the weight 1/m. With these wieghts (as oppose to uniform weights over H and M) the LP will avoid the case in which the ONLY optimal solution has a* = 0.
If a* = 0 then the hyperplane {a*Tx = b*} is of no use.

...Click here to return to the main page

...Click here to read about minimizing the perceptron criterion function using linear programming. This might provide you with more insight into the linearly inseperable case.