
Independent Binary Features
As mentioned previously, a pattern recognition algorithm will often output a feature vector describing the observed item. For instance, the MIT reading machine for the blind and the cheque recognition procedure each output a feature vector of dimension d. If each feature in the vector is binary and assumed (correctly or incorrectly) to be independent, a simplification of Bayes Rule can be employed.
The 2-Category Case
Here, we consider a 2-category problem in which the components of the feature vector are binary-valued and conditionally independent, which yields a simplified decision rule.
We also assign the following probabilities (p_{i} and q_{i}) to each component x_{i} of X:

p_{i} = P(x_{i} = 1 | w_{1})

and

q_{i} = P(x_{i} = 1 | w_{2})

If p_{i} > q_{i}, we expect x_{i} to be 1 more frequently when the state of nature is w_{1} than when it is w_{2}.
If we assume conditional independence, we can write P(X | w_{i}) as the product of the probabilities for the components of X. The class-conditional probabilities are then:

P(X | w_{1}) = ∏_{i=1}^{d} p_{i}^{x_{i}} (1 - p_{i})^{(1 - x_{i})}

and

P(X | w_{2}) = ∏_{i=1}^{d} q_{i}^{x_{i}} (1 - q_{i})^{(1 - x_{i})}
Let’s explain the first equation. For any x_{i}, if it equals 1, then the exponent (1 - x_{i}) is 0, so the second factor equals 1 and only p_{i} is considered; this makes sense, since p_{i} is the probability that x_{i} = 1. If x_{i} = 0, then only the second factor is considered, and (1 - p_{i}) is 1 - (probability that x_{i} = 1), which is the probability that x_{i} = 0. So, for every x_{i}, the appropriate probability is multiplied into the final product.
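This product form is easy to sketch in code. The p values below are hypothetical, chosen only for illustration:

```python
def class_conditional(x, p):
    """P(X | class) for binary features assumed independent:
    the product over i of p[i]^x[i] * (1 - p[i])^(1 - x[i])."""
    prob = 1.0
    for xi, pi in zip(x, p):
        # If x_i = 1 the factor is p_i; if x_i = 0 it is (1 - p_i).
        prob *= pi if xi == 1 else (1.0 - pi)
    return prob

# Hypothetical per-feature probabilities p_i = P(x_i = 1 | w_1)
p = [0.8, 0.6, 0.3]
x = [1, 0, 1]
print(class_conditional(x, p))  # p_1 * (1 - p_2) * p_3 = 0.8 * 0.4 * 0.3
```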
Since this is a two-class problem, the discriminant function is g(x) = g_{1}(x) - g_{2}(x), where:

g_{1}(x) = ln P(X | w_{1}) + ln P(w_{1})

and

g_{2}(x) = ln P(X | w_{2}) + ln P(w_{2})
The likelihood ratio is therefore given by:

P(X | w_{1}) / P(X | w_{2}) = ∏_{i=1}^{d} (p_{i} / q_{i})^{x_{i}} ((1 - p_{i}) / (1 - q_{i}))^{(1 - x_{i})}

which yields the discriminant function as follows:

g(x) = Σ_{i=1}^{d} [ x_{i} ln(p_{i} / q_{i}) + (1 - x_{i}) ln((1 - p_{i}) / (1 - q_{i})) ] + ln(P(w_{1}) / P(w_{2}))
Noticing that this function is linear in x_{i}, we can rewrite it as a linear function of x_{i}:

g(x) = Σ_{i=1}^{d} w_{i} x_{i} + w_{0}

where

w_{i} = ln [ p_{i}(1 - q_{i}) / (q_{i}(1 - p_{i})) ], for i = 1, ..., d

and

w_{0} = Σ_{i=1}^{d} ln((1 - p_{i}) / (1 - q_{i})) + ln(P(w_{1}) / P(w_{2}))
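As a minimal sketch, assuming illustrative values for p_{i}, q_{i} and the priors (none of these numbers come from the text), the linear form with weights w_{i} = ln[p_{i}(1 - q_{i}) / (q_{i}(1 - p_{i}))] and w_{0} = Σ ln[(1 - p_{i}) / (1 - q_{i})] + ln[P(w_{1}) / P(w_{2})] can be computed as:

```python
import math

def linear_discriminant(x, p, q, prior1, prior2):
    """g(x) = sum_i w_i * x_i + w_0 for independent binary features.
    Decide class w_1 when g(x) > 0, class w_2 when g(x) < 0."""
    # Per-feature weights: w_i = ln[p_i (1 - q_i) / (q_i (1 - p_i))]
    w = [math.log(pi * (1 - qi) / (qi * (1 - pi))) for pi, qi in zip(p, q)]
    # Bias: w_0 = sum_i ln[(1 - p_i) / (1 - q_i)] + ln(P(w_1) / P(w_2))
    w0 = sum(math.log((1 - pi) / (1 - qi)) for pi, qi in zip(p, q))
    w0 += math.log(prior1 / prior2)
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

# Illustrative parameters, not from the text
p, q = [0.8, 0.7], [0.3, 0.4]
g = linear_discriminant([1, 1], p, q, 0.5, 0.5)
print("class 1" if g > 0 else "class 2")
```

With equal priors, g(x) reduces to the log-likelihood ratio, so a positive value favours w_{1}.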
The discriminant function g(x) will therefore indicate whether the current feature vector belongs to class 1 or class 2. It is important to note that w_{0} and w_{i} are weights calculated for the linear discriminant. A decision boundary lies wherever g(x) = 0. This decision boundary can be a line or a hyperplane, depending upon the dimension of the feature space.


The decision boundary g(x) = 0 is a line on a Cartesian plane for a two-dimensional (d = 2) feature space.
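For instance, with arbitrary (hypothetical) weights, the d = 2 boundary can be rearranged into slope-intercept form:

```python
# Hypothetical weights for a d = 2 linear discriminant:
# g(x) = w1*x1 + w2*x2 + w0
w1, w2, w0 = 1.5, -2.0, 0.5

# Setting g(x) = 0 and solving for x2 gives the boundary line:
# x2 = -(w1*x1 + w0) / w2
slope = -w1 / w2
intercept = -w0 / w2
print(f"boundary: x2 = {slope} * x1 + {intercept}")
```

Points with g(x) > 0 lie on the class-1 side of this line; points with g(x) < 0 lie on the class-2 side.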

In a three-dimensional feature space, the decision boundary g(x) = 0 is a plane.

The Multicategory Case
Once there are more than two potential classes to classify the data into, the problem becomes more difficult. The procedure above does not yield the correct answer, since its likelihood is a ratio between only two possible states. Therefore, the discriminant function shown previously must be utilized so that every g_{i}(x) is considered. However, the two-class approach can still be used with the following trick:
Instead of determining directly which of the classes {1, 2, ..., n} the feature vector belongs to, it is possible to use the above method to determine the probability that it belongs to class i or not. This is accomplished by setting g_{1}(x) = g_{i}(x) and g_{2}(x) = g_{(not i)}(x). The probability for g_{(not i)}(x) can be obtained by summing the probabilities for classes {1, ..., i-1, i+1, ..., n}. If X belongs to class i, then g_{i}(x) > g_{(not i)}(x); otherwise X belongs to some other class.
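A sketch of this one-vs-rest trick, assuming each g_{i}(x) is the posterior probability of class i (the numbers below are illustrative, not from the text):

```python
def one_vs_rest(posteriors):
    """Decide class i when g_i(x) > g_(not i)(x), where g_(not i)(x)
    is the sum of the posteriors of every other class."""
    for i, gi in enumerate(posteriors):
        g_not_i = sum(posteriors) - gi
        if gi > g_not_i:
            return i  # X belongs to class i
    return None  # no single class outweighs all the others combined

# Illustrative posteriors for classes {0, 1, 2}
print(one_vs_rest([0.6, 0.3, 0.1]))    # → 0 (0.6 > 0.4)
print(one_vs_rest([0.4, 0.35, 0.25]))  # → None (no class exceeds 0.5)
```

Note that with normalized posteriors this test amounts to asking whether some class has posterior greater than 1/2; when no class does, the full multicategory rule (pick the largest g_{i}(x)) must be used instead.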

