Authorship Determination via Linear Programming
Bosch and Smith verify this result using a linear programming (LP) formulation.
The first step was to obtain machine-readable texts of The Federalist papers (available from Project Gutenberg). In addition to The Federalist papers, 5 other papers written by Hamilton and 36 other papers written by Madison were included.
Let H = the 51 + 5 = 56 papers attributed to Hamilton
Let M = the 14 + 36 = 50 papers attributed to Madison
Let D = the 12 disputed papers
Each paper in H, M, and D was assigned a point in space using the concordance generator "Conc". This generator computed the relative frequencies of 70 function words that Mosteller and Wallace (1969) identified as good candidates for authorship-attribution studies. These function words are commonly used prepositions, adverbs, pronouns, articles, and the like. By computing how many times each of the 70 function words appears per 1000 words of text, we are able to associate each paper with a point in 70-dimensional space.
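To make the per-1000-words measure concrete, here is a minimal Python sketch. It is not the "Conc" generator: the tokenizer (lowercase runs of ASCII letters) and the five-word list are assumptions chosen for illustration, whereas the study used the full list of 70 words from Mosteller and Wallace.

```python
import re
from collections import Counter

# Hypothetical 5-word subset for illustration; the study used 70 function
# words chosen by Mosteller and Wallace (1969).
FUNCTION_WORDS = ("are", "our", "upon", "the", "by")


def rate_per_1000(text, words=FUNCTION_WORDS):
    """Return, for each word in `words`, its occurrences per 1000 words of `text`.

    Tokenization is an assumption: lowercase runs of ASCII letters.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return [0.0] * len(words)
    counts = Counter(tokens)
    return [1000 * counts[w] / len(tokens) for w in words]


point = rate_per_1000("Upon the whole, our rights are secure upon the land.")
print(point)  # [100.0, 100.0, 200.0, 200.0, 0.0]
```

The sample sentence has 10 tokens, so each occurrence contributes 100 per 1000 words. With the full 70-word list, each paper becomes a 70-dimensional point.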
In total there are 118 points in 70-dimensional space:
H_i for i = 1, ..., 56,
M_i for i = 1, ..., 50,
D_i for i = 1, ..., 12.
The two sets H and M can now be used to construct the linear program described on the main page.
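The details of the program are on the main page. For orientation, here is one common way to write a separating-hyperplane LP for H and M. Treat it as an assumption, since the main page's version may differ in details such as how the slacks are weighted. Here w (a vector of 70 weights) and b (an offset) define the hyperplane, and the nonnegative slacks y_i and z_j measure how far each paper falls on the wrong side:

```latex
\begin{aligned}
\min_{w,\,b,\,y,\,z}\quad & \sum_{i=1}^{56} y_i + \sum_{j=1}^{50} z_j \\
\text{subject to}\quad & w \cdot H_i + y_i \ge b + 1, \qquad i = 1, \dots, 56, \\
& w \cdot M_j - z_j \le b - 1, \qquad j = 1, \dots, 50, \\
& y_i \ge 0, \quad z_j \ge 0 .
\end{aligned}
```

If every slack can be driven to 0, the hyperplane w · x = b puts all Hamilton papers on one side (w · x >= b + 1) and all Madison papers on the other (w · x <= b - 1). Conversely, any hyperplane that strictly separates the two sets can be rescaled to satisfy these constraints with zero slack. So under this formulation the optimal value is 0 exactly when H and M are linearly separable and positive otherwise, which matches the inseparability test in Step 1 of the algorithm.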
This direct approach has a problem: the number of points available is too small for the dimension
we are dealing with. For example, imagine that we had only 1 Hamilton paper and 1 Madison
paper, and data for only 2 function words (2-dimensional points). Solving a
linear program on this problem would produce a line (in 2D, a hyperplane is a line) that
separates the two points. But which line? Many different lines are valid LP solutions, and they can
yield very different conclusions about the disputed papers.
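The ambiguity is easy to reproduce with a few lines of Python. All coordinates, both lines, and the disputed point below are made up for illustration, and reading w · x > b as the Hamilton side is an assumed convention:

```python
def side(w, b, x):
    """Which side of the line w.x = b the point x lies on.

    Assumed convention: w.x > b is the Hamilton side ("H"),
    w.x < b is the Madison side ("M").
    """
    s = sum(wi * xi for wi, xi in zip(w, x))
    if s > b:
        return "H"
    if s < b:
        return "M"
    return "on the line"


# Made-up 2D data: one Hamilton paper, one Madison paper, one disputed paper.
hamilton, madison, disputed = (2, 0), (0, 2), (3, 3)

# Two different lines, written as (w, b), that both separate the known papers.
lines = {"x = 1": ((1, 0), 1), "y = 1": ((0, -1), -1)}

for name, (w, b) in lines.items():
    assert side(w, b, hamilton) == "H" and side(w, b, madison) == "M"
    print(name, "->", side(w, b, disputed))
# x = 1 -> H
# y = 1 -> M
```

Both lines are valid separators of the two known papers, yet they put the disputed paper on opposite sides.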

What can we do to solve this problem?
A ratio of 106 points to 70 dimensions is too small. We can either increase the
number of points (which would require more papers) or ignore some of the function-word
data (lower the dimension). Bosch and Smith chose to lower the dimension and used
a technique called cross-validation to evaluate every possible set of one,
two, or three of the 70 function words. This process is called feature selection.
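To see how much work this involves, the number of candidate sets is C(70, 1) + C(70, 2) + C(70, 3) = 70 + 2415 + 54740 = 57225. The Python sketch below (Python 3.8+ for `math.comb`) checks this count and enumerates the sets with `itertools.combinations`; the helper name `candidate_sets` is hypothetical.

```python
from itertools import combinations
from math import comb


def candidate_sets(words, max_size=3):
    """Yield every set of 1..max_size distinct words, as tuples."""
    for k in range(1, max_size + 1):
        yield from combinations(words, k)


sizes = {k: comb(70, k) for k in (1, 2, 3)}
print(sizes, sum(sizes.values()))  # {1: 70, 2: 2415, 3: 54740} 57225
```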
The Algorithm:
For each possible set (of size 1, 2, or 3):
Step 1: Solve the linear program using the 106 papers written by Hamilton and Madison,
using only the function words that belong to the set. If the optimal value is positive (the papers are inseparable),
discard the set and skip the remaining steps for it.
Step 2: Choose an integer g ≥ 2. Use a random number generator
to partition the set of 106 papers into g groups of roughly equal size.
Step 3: Solve the linear program based on all 106 papers except those in group one, using only the function words that belong to the set. Form a separating hyperplane from the optimal solution and test it on the papers in group one. Record the number of papers that are classified correctly.
Step 4: Repeat Step 3 (g - 1) more times, once for each of group two, group three, ..., group g, recording the number of correct classifications each time. Finally, compute the total number of correct classifications.
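Steps 2 to 4 amount to g-fold cross-validation: each group is held out once, a hyperplane is fit on the remaining papers, and the held-out papers are scored. The Python sketch below is a minimal version under stated assumptions. The LP of Step 3 is replaced by a simpler stand-in (the perpendicular bisector of the Hamilton and Madison centroids, i.e. a nearest-centroid rule) so the block runs with the standard library alone, and the function names and the `papers` layout are hypothetical. Using an LP solver instead would only change `fit`.

```python
import random


def partition(ids, g, seed=None):
    """Step 2: randomly split ids into g groups whose sizes differ by at most one."""
    if g < 2:
        raise ValueError("g must be at least 2")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[k::g] for k in range(g)]


def centroid_hyperplane(H, M):
    """Stand-in for the LP: the perpendicular bisector of the two class centroids.

    Returns (w, b) with w.x > b on the Hamilton side.
    """
    if not H or not M:
        raise ValueError("the training set must contain papers by both authors")
    ch = [sum(col) / len(H) for col in zip(*H)]
    cm = [sum(col) / len(M) for col in zip(*M)]
    w = [a - c for a, c in zip(ch, cm)]
    b = sum(wi * (a + c) / 2 for wi, a, c in zip(w, ch, cm))
    return w, b


def cross_validate(papers, groups, fit=centroid_hyperplane):
    """Steps 3 and 4: hold out each group once and count correct classifications.

    papers: dict paper_id -> (point, author), with author "H" or "M".
    groups: list of lists of paper ids, e.g. from partition().
    """
    correct = 0
    for held_out in groups:
        held = set(held_out)
        train = [papers[pid] for pid in papers if pid not in held]
        H = [p for p, a in train if a == "H"]
        M = [p for p, a in train if a == "M"]
        w, b = fit(H, M)
        for pid in held_out:
            point, author = papers[pid]
            score = sum(wi * xi for wi, xi in zip(w, point))
            predicted = "H" if score > b else "M"  # ties go to "M" (arbitrary)
            correct += int(predicted == author)
    return correct


# Made-up 2D data: four papers, two per author.
papers = {0: ((2, 0), "H"), 1: ((3, 1), "H"), 2: ((0, 2), "M"), 3: ((1, 3), "M")}
print(cross_validate(papers, [[0, 2], [1, 3]]))  # 4
print(sorted(len(grp) for grp in partition(range(106), 4, seed=1)))  # [26, 26, 27, 27]
```

Slicing the shuffled list with step g keeps the group sizes within one of each other, which is one way to get "roughly equal size" groups.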
Bosch and Smith ran Steps 2, 3, and 4 a total of 50 times on every set that made it past Step 1, using
values of g from 2 to 5. The set consisting of the words "are", "our", and "upon" received the highest score.
Using only these 3 function words, we can construct the linear program with 106 points in 3 dimensions
(a much better ratio!). The resulting separating hyperplane is then used to classify the points in D.
Conclusion: all 12 disputed papers were classified as having been written by James Madison.
Main reference for this page: Bosch and Smith.