Pattern Classification Via Linear Programming

Project for CS644 - Pattern Recognition  (Prof. Godfried T. Toussaint)
Presented by  Bohdan Kaluzny
Other students projects can be found here.



Introduction

The purpose of this html document is to show how Linear Programming is used to classify patterns. Specifically, we show how to formulate the problem of finding a linear discriminant function as a problem in Linear Programming. We consider two cases: separable sets and inseparable sets.

In addition, the reader will find a number of interesting links to web pages and applets about Linear Programming and Operations Research. Among these is a nice interactive applet that shows the geometric-algebraic relationship of Linear Programming in 2 Dimensions.

Latest additions:
Links and Applets,
An intuition behind the LP formulation,
A large 2D example solved using CPLEX,
An in-depth look at the linearly inseparable case
Minimizing the Perceptron Criterion Function via Linear Programming
An Application: Authorship classification
An Application: Breast Cancer Diagnosis

 
 

The Problem

Informally: The fundamental problem we wish to address is that of distinguishing between elements of two disjoint pattern sets. A linear programming solution is attractive since many general purpose linear programming packages exist, thus requiring little or no implementation.

Formally (Mathematically): Given two disjoint point sets H and M in the n-dimensional real space Rn, we wish to construct a discriminant function f, from Rn into the real line R, such that f(H) > 0 and f(M) 0.
 

Definitions
 

Linear Programming Formulation

Let H = {H1, H2, ... , Hh} and M = {M1, ... , Mm} be two sets of disjoint points in Rn.
Let a Rn, b R.
The vector a and scalar b specify a hyperplane: {x Rn: aTx=b}

If all of H's points are on one side of {x Rn: aTx=b} , and all of M's points are on the other, we say that {x Rn: aTx=b} is a separating hyperplane for H and M.

Specifically, H and M are linearly separable if and only if H is a subset of {x Rn: aTx > b} and M is a subset of {x Rn: aTx b}. Otherwise H and M are linearly inseparable.
 
 

Problem 1: Determine whether H and M are linearly separable
Problem 2: If H and M are linearly separable, find a separating hyperplane for H and M.
 

Claim 1: The following linear program solves problem 1 and 2. (O. L. Mangasarian, 1995)
Given H and M, find y Rh, z Rm, a Rn, b R that


(LP)

minimizes

subject to
 
 
 

 

(1/h)[y1 + y2 +...+ yh] + (1/m)[z1 + z2 +...+ zm
yi - aTHi + b + 1 for i = 1,...,h 
zj aTMj - b + 1 for j = 1,...,m 
yi 0 for i = 1,...,h 
zj 0 for j = 1,...,m 
Note: We will refer to the above formulation as "(LP)"

...Click here to get a better intuition about Mangasarian's formulation

Theorem 1:
H and M are linearly separable if and only if the optimal value of (LP) is zero

Theorem 2:
If H and M are linearly separable and (y*,z*,a*,b*) is an optimal solution of (LP)
then {x Rn: aTx = b*} is a separating hyperplane for H and M

Proposition 1: (LP) is useful in the case which H and M are linearly inseparable.
{x Rn: aTx = b*} is a hyperplane that minimizes a certain measure of the "misclassification" of the points of H and M

Example 1 (linearly separable):
H = {(0,0),(1,0)} M = {(0,2),(1,2)} thus h = m = 2 (a simple 2D problem)

(LP ex.: 1)
minimize

subject to
 
 
 
 
 

 

(1/2)(y1 + y2) + (1/2)(z1 + z2)

y1 b + 1 
y2 b + 1 - a1
z1 2a2 - b + 1 
z2 a1 + 2a2 - b + 1 
y1, y2, z1, z2
[y1 y2] T - [a1 a2][0 1] + b + 1 
                                 [0 0] 

[z1 z2]T - [a1 a2][0 1] - b + 1 
                                [2 2]

An optimal solution: y1 = y2 = z1 = z2 = 0, aT = [1 -2], b = -1
Valid optimal value = 0 (linearly separable) 
[1 -2]x = -1 is the separating hyperplane (ie.: y = (1/2)x +(1/2) )


Animated GIF: wait a few seconds to see the hyperplane...

Example 2 (linearly inseparable):
H = {(0,0),(1,2)} 
M = {(0,2),(1,0)}

(LP ex.: 2)
minimize

subject to
 
 
 
 
 

 

(1/2)(y1 + y2) + (1/2)(z1 + z2)

y1 b + 1 
y2 b + 1 - a1 - 2a2
z1 2a2 - b + 1 
z2 a1 - b + 1 
y1, y2, z1, z2
An optimal solution: y1 = 4, y2 = 0, b =3, aT = [2 1], z1 = z2 = 0
Optimal value = 2 (linearly inseparable) 
[2 1]x = 3 is the proposed hyperplane


...Click here for a larger 2D example (solved using CPLEX)...

Proof of Correctness

The following proof is from Bosch and Smith, 1998

Lemma 1:
The sets H and M are linearly separable if and only if there exists a Rn and b R such that 
       aTHi b + 1 for i = 1,...,h 
and -aTMj -b + 1 for j = 1,...,m
Proof of Lemma 1:
  • First suppose that

       aTHi b + 1 for i = 1,...,h 
and -aTMj -b + 1 for j = 1,...,m 
    Then


H and M are linearly separable since all of H's points are on one side of the hyperplane {x Rn: aTx = b} and all of M's points are on the other.
 
  • Suppose H and M are linearly separable.

  • Then there exists a Rn and b R such that
    cTx > b for all x
    and cTx < b for all x of M.
    Note that min{cTx: x H} > max{cTx: x M}
    If we set 
     
    p = min{cTx: x H} - max{cTx: x M} 
    a = (2/p)c b = (1/p)(min{cTx: x of H} + max{cTx: x M}) 
    then min{aTx: x H} 
     
     
     
    = (2/p)min{cTx: x H}
    = (1/p)(min{cTx: x H} + max{cTx: x M} + p)
    = b + 1
    and max{aTx: x M} 
     
     
     
    = (2/p)max{cTx: x M}
    = (1/p)(min{cTx: x H} + max{cTx: x M} - p)
    = b - 1
    Thus,
     
    aTx b + 1 for each x of H and aTHi b + 1 for i = 1,...,h
    aTx b - 1 for each x of M and aTMj b - 1 for j = 1,...,m

Lemma 1 states that H and M are linearly separable if and only if there is a "special" separating hyperplane {x Rn: aTx=b} for H and M. The reason this hyperplane is special is that it is at least one unit from the closest point in H and at least one unit from the closest point in M: the distance from a point x to the hyperplane specified by a vector a and scalar b is define to be |aTx - b|.

Proof of Theorem 1 and 2:
Let (y*,z*,a*,b*) be an optimal solution to (LP).
The optimal value of the objective function of (LP) equals zero if and only if the following conditions hold:
(1) y* = 0       
(2) z* = 0 
(3) a*THi b* + 1 for i= 1,...,h 
(4) a*TMj b* - 1 for j = 1,...,m

Lemma 1 states that that conditions (3) and (4) hold if and only if H and M are linearly separable, in which case {x Rn: a*Tx=b*} is a separating hyperplane for H and M.
Thus, we have proven:       (Theorems 1 and 2)
  • H and M are linearly separable if and only if the optimal value of (LP) is zero.
  • If H and M are linearly separable and (y*,z*,a*,b*) is an optimal solution of (LP), then {x Rn: aTx = b*} is a separating hyperplane for H and M

 

Proposition 1: (LP) is useful in the case which H and M are linearly inseparable. {x Rn: aTx = b*} is a hyperplane that minimizes a certain measure of the "misclassification" of the points of H and M.


Argument (Bosch and Smith, 1998): Note that (LP) is a linear programming formulation of the following unconstrained nonlinear programming problem:
Given H and M, find a Rn, b R that


(NLP)

minimizes

 

(1/h) max{ -aTHi + b + 1, 0}
+ (1/m) max{ aTMi - b + 1, 0}
Note: We will refer to the above formulation as "(NLP)"

The objective function of (NLP) assigns a nonnegative real number to each vector-scalar pair (a,b). This assignment can be viewed as the "cost" of the hyperplane {x Rn: aTx = b}. Thus, the objective function can be thought of as a "hyperplane evaluator", and (NLP) finds the least costly hyperplane. What is the cost of a hyperplane for linearly inseparable sets?
...Click here to see why the generated hyperplane in the linearly inseperable case is useful

...Click here to see how linear programming minimizes the perceptron criterion function (related to the inseperable case)

Applets and Links

Java Applets:
An Interactive Applet for solving 2D Linear Programming (Graphical!) - by Bohdan Kaluzny
Note: you will need Netscape 4.x or better to use the applet (it is worth it)
Another Excellent Interactive Applet for solving 2D Linear Programming (Graphical!)
Note: you will need Netscape 4.x or better to use the applet (it is worth it)
Simplex Applet for solving LP's
Java Applet for solving Linear Programs
Simplex Pivot Tool Applet
Interactive Applet for solving 2D LP's - Graphical (University of Berkeley)
Interactive Applet for solving LP's (Any dimension)

LP Tutorials and Useful Information:
A short introduction to linear optimization
Linear Programming Glossary
Linear Programming FAQ
Myths and Counterexamples in Mathematical Programming
Non-Linear Programming FAQ
Neat article explaining the field of Operations Research
Operations Research and Related Sites
Michael Trick's Operations Research Page
Operations Research Topics
Linear Programming Solvers (No Applets, ex.: CPLEX...)

Applications
 

1. Authorship of Disputed Federalist Papers:
(Bosch and Smith, 1998) - In 1787 and early 1788, a person or persons using the pseudonym Publius wrote a series of 85 editorials for the "Independant Journal", the "New York Packet," and the "Daily Adverstise" to persuade the citizens of the state of New York to ratify the U.S. Constitution. These papers are commonly known as "The Federalist" papers. Since 1788 the consensus has been that Alexander Hamilton was the sole author of 51 of the 85 papers, that John Jay was the sole author of 5, that James Madison was the sole author of 14, and the Hamilton and Madison collaborated on another three. The authorship of the remaining 12 papers has been in dispute: these papers are known as the "disputed" papers. Until 1964 it was generally agreed that the disputed papers were written by either Hamilton or Madison, but there was no consensus about which were written by Hamilton and which by Madison. In 1964 Mosteller and Wallace used statistical inference and came to the conclusion that Madison was the author of all 12 disputed papers. Bosch and Smith verify this result using the (LP) formulation: The 73 papers whose author was known were scanned in, and relative frequencies for 70 function words were computed. The papers authored by Madison comprised the set M, and the papers authored by Hamilton, the set H. The hyperplane for (LP) was computed.... the 12 disputed all fell on Madison's side of the hyperplane... 
[ Click here to learn about this application in more detail ]

2. Medical Diagnosis:
Mangasarian, Street, and Wolberg, 1995 proposed using the (LP) formulation for breast cancer diagnosis and prognosis. Their method is currently being successfully used at the University of Wisconsin Hospital. The method involves extracting a fluid sample via a fine needle from the patient's breast lump/mass. The fluid is placed on a glass slide and stained to highlight the nuclei of constituent cells. The image is transferred to a program (XCYT) written by the authors mentioned above. The image is analyzed, and the following features are extracted for each nucleus: area, radius, perimeter, symmetry, number and size of concavities, compactness, smoothness, etc... The features comprise a 30-Dimensional vector. The next step was to compute this 30-Dimensional vector for patients known to have benign lumps and also for patients known to have malignant lumps (Thus, we obtain sets H and M). The separating hyperplane (or next best thing) is computed using (LP), yielding a discriminant function for classification of unknown samples. The success rate of this procedure is claimed to be 97.5%.
[ Click here to learn about this application in more detail ]



References
 
 
[1] R. O. Duda, P. E. Hart,David G. Stork, Pattern Classification, John Wiley Interscience, 2000
[2] O. L. Mangasarian, W. N. Street, and W. W. Wolberg, "Breast Cancer Diagnosis and Prognosis Via Linear Programming," Operations Research. 43 (1995), 570-577
[3] O. L. Mangasarian, R. Setiono, and W. W. Wolberg, "Pattern Recognition Via Linear Programming: Theory and Applications to Medical Diagnosis," (1990)
[4] O. L. Mangasarian, "Linear and Non-linear Separation of Patterns by Linear Programming," Operations Research, 13, 444-452 (May-June 1965)
[5] R. C. Grinold, "Comment on 'Pattern Classification Design by Linear Programming,'"IEEE Trans. Comp. (Correspondence), C-18, 378-379 (April 1969)
[6] R. C. Grinold, "A Note on Pattern Separation," Operations Research, 18, (1970), pp. 187-190
[7] F. W. Smith, "Pattern Classifier Design by Linear Programming," IEEE Trans. on Comp.,C-17, 367-372 (April 1968)
[8] F. W. Smith, "Design of Multi-category Pattern Classifiers with Two-Category Classifier Design Procedures," IEEE Trans. Comp., C-18, 548-551 (June 1969)
[9] R. C. Grinold, "Mathematical Programming methods of pattern classification," Management Science, 19 (1972), pp. 272-289.
[10] N. Karkarkar, "A new polynomial time algorithm for linear programming," Combinatorica, 4, (1984), pp. 373-395.
[11] L. G. Khachian, "A polynomial algorithm in linear programming," Dokl. Akad. Nauk SSR 244, 5,(1979), pp.1093-1096: Soviet Math. Doklady, 20, (1979), pp.191-194
[12] V. Chvatal, "Linear Programming," W. H. Freeman and Company, 1983
[13] J. O'Rourke, "Computational Geometry in C 2nd Ed.," Cambridge University Press, 1998.
[14]
 
R. A. Bosch, J. A. Smith, "Separating Hyperplanes and the Authorship of the Disputed Federalist Papers," American Mathematical Monthly, Volume 105, Number 7, pp. 601-608
[15]
 
R. F. Mosteller, D. L. Wallace, "Inference and Disputed Authorship: The Federalist," Addison-Wesley, Reading, Mass., 1964