Project for CS644 - Pattern Recognition (Prof. Godfried T. Toussaint)
Presented by Bohdan Kaluzny
Other students projects can be found here.
Introduction
The purpose of this html document is to show how Linear Programming is used to classify patterns. Specifically, we show how to formulate the problem of finding a linear discriminant function as a problem in Linear Programming. We consider two cases: separable sets and inseparable sets.
In addition, the reader will find a number of interesting links to web pages and applets
about Linear Programming and Operations Research. Among these is a nice interactive applet
that shows the geometric-algebraic relationship of Linear Programming in 2 Dimensions.
Latest additions:
Links and Applets,
An intuition behind the LP formulation,
A large 2D example solved using CPLEX,
An in-depth look at the linearly
inseparable case
Minimizing the Perceptron Criterion Function via Linear Programming
An Application: Authorship classification
An Application: Breast Cancer Diagnosis
The Problem
Informally: The fundamental problem we wish to address is that of distinguishing between elements of two disjoint pattern sets. A linear programming solution is attractive since many general purpose linear programming packages exist, thus requiring little or no implementation.
Formally (Mathematically): Given
two disjoint point sets H and M in the n-dimensional real space Rn,
we wish to construct a discriminant function f,
from Rn into the real line R, such that f(H)
> 0 and f(M)
0.
Definitions
b and f(x1,x2,...,xn)
b are called linear inequalities.Linear Programs have the following
standard form:
| maximize
subject to:
|
cjxj
aijxj bi for i=1,2...,m
xj 0 for j = 1,2,...,n |
objective function
constraints
|
where are aij and cj are real numbers. xj are variables. m is the number of constraints, n the number of variables.
![]() |
Linear Programming Formulation
Let H = {H1, H2,
... , Hh} and M = {M1, ... , Mm}
be two sets of disjoint points in Rn.
Let a
Rn,
b
R.
The vector a and scalar b
specify a hyperplane: {x
Rn:
aTx=b}
If all of H's points are on one side
of {x
Rn: aTx=b}
, and all of M's points are on the other, we say that {x
Rn: aTx=b} is a separating
hyperplane for H and M.
Specifically, H and M are linearly
separable if and only if H is a subset of {x
Rn: aTx > b} and M is a subset of {x
Rn: aTx
b}. Otherwise
H and M are linearly inseparable.
Problem 1: Determine whether
H and M are linearly separable
Problem 2: If H and M are
linearly separable, find a separating hyperplane for H and M.
Claim 1: The following linear program solves problem 1 and 2. (O. L. Mangasarian, 1995)
Given H and M, find y Rh, z
Rm, a Rn, b
R that |
|
(LP) |
minimizes subject to
|
(1/h)[y1
+ y2 +...+ yh] + (1/m)[z1 + z2
+...+ zm]
yi - aTHi + b + 1 for i = 1,...,h
zj aTMj - b + 1 for j = 1,...,m
yi 0 for i = 1,...,h
zj 0 for j = 1,...,m |
Theorem 1:
| H and M are linearly separable if and only if the optimal value of (LP) is zero |
Theorem 2:
| If H and M are linearly separable and (y*,z*,a*,b*)
is an optimal solution of (LP),
then {x Rn: aTx
= b*} is a separating hyperplane for H and M |
Proposition 1: (LP) is useful in the case which H and M are linearly inseparable.
{x
Rn: aTx = b*} is a hyperplane
that minimizes a certain measure of the "misclassification" of the points
of H and M |
Example 1 (linearly separable):
| H = {(0,0),(1,0)} M = {(0,2),(1,2)} thus h = m = 2 (a simple 2D problem) |
|
(LP ex.: 1) |
minimize
subject to
|
(1/2)(y1 + y2) + (1/2)(z1
+ z2)
|
[y1 y2]
T
- [a1 a2][0 1] + b + 1
[0 0] [z1 z2]T |
| An optimal solution: y1 = y2
= z1 = z2 = 0, aT = [1 -2], b = -1
Valid optimal value = 0 (linearly separable) [1 -2]x = -1 is the separating hyperplane (ie.: y = (1/2)x +(1/2) ) |
![]() |
Example 2 (linearly inseparable):
| H = {(0,0),(1,2)}
M = {(0,2),(1,0)} |
|
|
(LP ex.: 2) |
minimize
subject to
|
(1/2)(y1 + y2) + (1/2)(z1
+ z2)
|
| An optimal solution: y1 = 4, y2 = 0,
b =3, aT = [2 1], z1 = z2 = 0
Optimal value = 2 (linearly inseparable) [2 1]x = 3 is the proposed hyperplane |
Proof of Correctness
The following proof is from Bosch and Smith, 1998
Lemma 1:
The sets H and M are linearly separable if and only if there exists
a
Rn and b R such that
|
|
|||||||||||
Then there exists a Rn and b
R such that
H} > max{cTx:
x M}
|
Lemma 1 states that H and M are linearly
separable if and only if there is a "special" separating hyperplane {x
Rn: aTx=b} for H and M. The
reason this hyperplane is special is that it is at least one unit from
the closest point in H and at least one unit from the closest point in
M: the distance from a point x to the hyperplane specified by a vector
a and scalar b
is define to be |aTx - b|.
![]() |
Proof of Theorem 1 and 2:
| Let (y*,z*,a*,b*)
be an optimal solution to (LP).
The optimal value of the objective function of (LP) equals zero if and only if the following conditions hold:
Lemma 1 states that that conditions (3) and (4) hold if and only if H and M are linearly separable, in which case {x
Rn: a*Tx=b*} is a separating
hyperplane for H and M.
Thus, we have proven: (Theorems 1 and 2)
|
Proposition 1: (LP)
is useful in the case which H and M are linearly inseparable. {x
Rn: aTx = b*} is
a hyperplane that minimizes a certain measure of the "misclassification"
of the points of H and M.
Argument (Bosch and Smith, 1998): Note that (LP)
is a linear programming formulation of the following unconstrained nonlinear
programming problem:
Given H and M, find a
Rn, b
R that
|
(NLP) |
minimizes
|
(1/h) max{ -aTHi + b + 1, 0}
+ (1/m) max{ aTMi - b + 1, 0}
|
Rn: aTx = b}. Thus,
the objective function can be thought of as a "hyperplane evaluator", and (NLP) finds the least
costly hyperplane.
What is the cost of a hyperplane for linearly inseparable sets?
Applets and Links
Java Applets:
An Interactive Applet for solving 2D Linear Programming (Graphical!) - by Bohdan
Kaluzny
Note: you will need Netscape 4.x or better to use the applet (it is worth it)
Another Excellent Interactive Applet for solving 2D Linear Programming (Graphical!)
Note: you will need Netscape 4.x or better to use the applet (it is worth it)
Simplex Applet for solving LP's
Java Applet for solving Linear Programs
Simplex Pivot Tool Applet
Interactive Applet for solving 2D LP's - Graphical (University of Berkeley)
Interactive Applet for solving LP's (Any dimension)
LP Tutorials and Useful Information:
A short introduction to linear optimization
Linear Programming Glossary
Linear Programming FAQ
Myths and Counterexamples in Mathematical Programming
Non-Linear Programming FAQ
Neat article explaining the field of Operations Research
Operations Research and Related Sites
Michael Trick's Operations Research Page
Operations Research Topics
Linear Programming Solvers (No Applets, ex.: CPLEX...)
Applications
1. Authorship of Disputed Federalist Papers:
| (Bosch and Smith, 1998) - In 1787 and early 1788, a person
or persons using the pseudonym Publius wrote a series of 85 editorials
for the "Independant Journal", the "New York Packet," and the "Daily Adverstise"
to persuade the citizens of the state of New York to ratify the U.S. Constitution.
These papers are commonly known as "The Federalist" papers. Since 1788
the consensus has been that Alexander Hamilton was the sole author of 51
of the 85 papers, that John Jay was the sole author of 5, that James Madison
was the sole author of 14, and the Hamilton and Madison collaborated on
another three. The authorship of the remaining 12 papers has been in dispute:
these papers are known as the "disputed" papers. Until 1964 it was generally
agreed that the disputed papers were written by either Hamilton or Madison,
but there was no consensus about which were written by Hamilton and which
by Madison. In 1964 Mosteller and Wallace used statistical inference and
came to the conclusion that Madison was the author of all 12 disputed papers.
Bosch and Smith verify this result using the (LP) formulation:
The 73 papers whose author was known were scanned in, and relative frequencies
for 70 function words were computed. The papers authored by Madison comprised
the set M, and the papers authored by Hamilton, the set H. The hyperplane
for (LP) was computed.... the 12 disputed all fell on Madison's side of
the hyperplane...
|
2. Medical Diagnosis:
|
Mangasarian, Street, and Wolberg, 1995 proposed using the (LP) formulation
for breast cancer diagnosis and prognosis. Their method is currently being
successfully used at the University of Wisconsin Hospital. The method involves
extracting a fluid sample via a fine needle from the patient's breast lump/mass.
The fluid is placed on a glass slide and stained to highlight the nuclei
of constituent cells. The image is transferred to a program (XCYT) written
by the authors mentioned above. The image is analyzed, and the following features
are extracted for each nucleus: area, radius, perimeter, symmetry, number
and size of concavities, compactness, smoothness, etc... The features comprise
a 30-Dimensional vector. The next step was to compute this 30-Dimensional
vector for patients known to have benign lumps and also for patients known
to have malignant lumps (Thus, we obtain sets H and M). The separating
hyperplane (or next best thing) is computed using (LP), yielding a discriminant
function for classification of unknown samples. The success rate of this
procedure is claimed to be 97.5%.
|
References