Linearly Separable Classes
 
 

...AN INTUITION BEHIND THE FORMULATION...

Problem 1: Determine whether disjoint pattern sets H and M are linearly separable
Problem 2: If H and M are linearly separable, find a separating hyperplane for H and M.
 

Claim: The following linear program solves problem 1 and 2. (O. L. Mangasarian, 1995)
Given H and M, find y Rh, z Rm, a Rn, b R that


(LP)

minimizes

subject to
 
 
 

 

(1/h)[y1 + y2 +...+ yh] + (1/m)[z1 + z2 +...+ zm
yi - aTHi + b + 1 for i = 1,...,h 
zj aTMj - b + 1 for j = 1,...,m 
yi 0 for i = 1,...,h 
zj 0 for j = 1,...,m 
...But where does this formulation come from? Hopefully the following will provide some insight:


What is the intuition behind using Linear Programming to separate samples?

Suppose we are given a set of n samples: x1,x2, ... ,xn and we are asked to compute a weight vector a that satisfies
aTxi bi > 0 for all i=1...n
We can formulate this problem as a linear programming problem:
Consider introducing what is called an artificial variable t such that:
aTxi + t bi
and t 0 for all i=1...n
If t is large enough, then we will be able to satisfy all these constraints. As an example, the constraints can be satisfied by choosing a = 0 and t = maxi{bi}. Although this is a solution, it doesn't help solve our original problem which is to determine a weight vector a that satisfies aTxi bi > 0 for all i.

What we desire is a solution with t = 0 (the smallest value t can have and still satisfy t 0). This leads us to consider the following problem: Minimize t over all values of t and a that satisfy the conditions:
aTxi + t bi
and t 0 for all i
If the answer yields t = 0, the samples x1,x2, ... ,xn are linearly separable, and we have a solution. If the answer yeilds t > 0 , there is no separating vector and we have proof that the samples are nonseparable.


How do we apply this knowledge to our problem?

We can now solve the problem we are interested in: Given two disjoint point sets H and M, the following linear separability criterion for the 2 pattern sets is proposed:
(LP) Minimize
Subject to



b - c
aTH b
aTM -c
and b,c 0

If and only if the above linear program has a negative minimum, then the plane aTx = (b+c)/2 separates the sets H and M. Thus aTH c and aTM b

By now you should have a pretty good idea how O. L. Mangasarian came up with the constraints LP formulation presented at the top of this page. But why does it still look so different?

The reason follows from the following:
Problem 3: If H and M are not linearly separable, find "the best" hyperplane that separates most of the patterns in H and M.

Mangasarian's LP formulation attempts to solve all three problems:
(1) Determine whether two pattern sets are linearly separable
(2) If two pattern sets are linearly separable compute a separating hyperplane
(3) Else (if the two pattern sets are not linearly separable) compute a "good" hyperplane that splits the two sets as "best" as possible.
Note: problem #3 is vague. What determines a "good" hyperplane in the case of linear inseparability? There isn't one answer to this question - this is why it is stated that Mangasarian attempts to solve it

...Click here to return to the main page (proof of #1 and #2)
...Click here to see how Mangasarian's formulation solves #3
Click here to see the main reference of this page (Duda and Hart)