Minimizing the Perceptron
Criterion Function
What is a perceptron? What does it have to do with pattern classification?
Click here to for a great tutorial which will answer all your questions...
How does Linear Programming fit in? (Duda and Hart)
In most pattern classification applications we cannot assume that the samples are linearly separable. When the patterns are not separable by a hyperplane, we would still like to obtain a wieght vector that classifies as many samples correctly as possible. It turns out that the number of errors (samples incorrectly classified) is not a linear function of the components of the weight vector. Thus trying to minimize the number of errors is not a linear programming problem.
The problem of minimizing the perceptron criterion function can be formulated as
a linear program. Why is this useful? Well, the minimization of the perceptron criterion function
yields a separating vector on the separable case and a "reasonably good" solution in the inseparable
case.
The basic perceptron criterion function is given by:
Jp(a)
yi Y
(-aTyi)
|
J'p(a)
yi Y'
(bi - aTyi)
|
Y' if aTyi
bi. J'p(a) is a piecewise-linear
function of a. How do we use linear programming to minimize J'p(a) if it
is not a linear function? We use the following trick: introduce n (# of samples) artificial
variables and their constraints. Consider the following Linear Program:
|
Minimize Subject to |
i=1...n si
si bi - aTyi
si 0
|
i=1...n si}
(the minimum value of our objective function) since the best we can do is take
si = max(0,bi - aTyi). Thus, if we minimize
i=1...n si over s and a we will
obtain the minimum value of J'p(a).
...Click here to return to the main page