Linearly Inseparable Classes
Proposition: (LP)
is useful in the case which H and M are linearly inseparable. {x
Rn: aTx = b*} is
a hyperplane that minimizes a certain measure of the "misclassification"
of the points of H and M. (Bosch and Smith, 1998)
Argument: Note that (LP)
is a linear programming formulation of the following unconstrained nonlinear
programming problem:
Given H and M, find a
Rn, b
R that
|
(NLP) |
minimizes
|
(1/h) max{ -aTHi + b + 1, 0}
+ (1/m) max{ aTMi - b + 1, 0}
|
Rn: aTx = b}. Thus,
the objective function can be thought of as a "hyperplane evaluator", and (NLP) finds the least
costly hyperplane. The number assigned to (a, b) is the cost of the hyperplane
{x
Rn: aTx = b*}
and solving NLP yields a least costly hyperplane.
Note: max {-aTHi + b + 1, 0} is positive
if the ith element of H belongs to the set {x
Rn: aTx < b + 1}
and is zero otherwise. In other words, max{-aTHi + b + 1, 0}
is positive if the ith point of H lies within, or on the "wrong side" of the radius-one band that
surrounds {x
Rn: aTx = b } and is zero otherwise.
Example: Examine the following figure:

| max{-aTMi + b + 1, 0} = |
0 for i = 1,...,4 1 for i = 5 3 for i = 6 4 for i = 7 |
The farther a point of H is from the right side of the band, the larger the value
of max{aTMi - b + 1, 0}. Since (NLP)
is a minimization problem, max{aTMi + b + 1, 0}
can be interpreted as the cost of the hyperplane {x
Rn: aTx = b}
misclassifying of the ith point of H, and similarly max{aTMi - b + 1, 0}
the cost of the hyperplane misclassifying the jth point of M.
Why {x
Rn: aTx = b}
is a "good" separating hyperplane:
The objective function of (NLP) is the sum of two average costs: average cost for H and the average cost
for M. Each point in H is assigned the weight 1/h. Similarly, each point in M is
assigned the weight 1/m. With these wieghts (as oppose to uniform weights over H and M)
the LP will avoid the case in which the ONLY optimal solution has a* = 0.
If a* = 0 then the hyperplane
{a*Tx = b*} is of no use.
...Click here to return to the main page
...Click here to read about minimizing the
perceptron criterion function using linear programming. This might provide
you with more insight into the linearly inseperable case.