Authorship Determination via Linear Programming
 
 

AUTHORSHIP OF THE DISPUTED FEDERALIST PAPERS

(Bosch and Smith, 1998) - In 1787 and early 1788, a person or persons using the pseudonym Publius wrote a series of 85 editorials for the "Independent Journal," the "New York Packet," and the "Daily Advertiser" to persuade the citizens of the state of New York to ratify the U.S. Constitution. These papers are commonly known as "The Federalist" papers. Since 1788 the consensus has been that Alexander Hamilton was the sole author of 51 of the 85 papers, that John Jay was the sole author of 5, that James Madison was the sole author of 14, and that Hamilton and Madison collaborated on another three. The authorship of the remaining 12 papers has been in dispute: these papers are known as the "disputed" papers. Until 1964 it was generally agreed that the disputed papers were written by either Hamilton or Madison, but there was no consensus about which were written by Hamilton and which by Madison. In 1964 Mosteller and Wallace used statistical inference and came to the conclusion that Madison was the author of all 12 disputed papers.

Bosch and Smith verify this result using a linear programming (LP) formulation:

The first step was to obtain machine-readable texts of The Federalist papers (available from Project Gutenberg). In addition to The Federalist papers, 5 other papers written by Hamilton and 36 other papers written by Madison were included.

Let H = the 51 + 5 papers attributed to Hamilton
Let M = the 14 + 36 papers attributed to Madison
Let D = the 12 disputed papers

Each paper in H, M, and D was assigned a point in space using the concordance generator "Conc". This generator computed the relative frequencies of 70 function words that Mosteller and Wallace (1964) identified as good candidates for author-attribution studies. These function words are commonly used prepositions, adverbs, pronouns, articles... By computing how many times, per 1000 words of text, each of the 70 function words appears, we are able to associate each paper with a point in 70-dimensional space.
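As a rough illustration of how a paper becomes a point, the sketch below counts occurrences per 1000 words for a short word list. It is not the "Conc" program: the function name `rate_per_thousand`, the simple tokenizer, the three-word list, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter

# Illustrative subset only; the study used 70 function words.
FUNCTION_WORDS = ["are", "our", "upon"]

def rate_per_thousand(text, words=FUNCTION_WORDS):
    """Return, for each function word, its occurrences per 1000 words of text."""
    # Assumed tokenizer: lowercase runs of letters count as words.
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return [0.0] * len(words)
    counts = Counter(tokens)
    return [1000.0 * counts[w] / len(tokens) for w in words]

# Made-up sentence, used only to exercise the function.
sample = "Our rights are secure, and our laws are founded upon reason."
point = rate_per_thousand(sample)
print(point)  # one coordinate per function word
```

With all 70 function words in the list, the returned vector would be the 70-dimensional point for that paper.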

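118 points in 70 dimensions:

H_i for i = 1, ..., 56, where the l-th component of H_i is the relative frequency of the l-th function word in the i-th Hamilton paper,

M_j for j = 1, ..., 50, defined the same way for the j-th Madison paper,

D_k for k = 1, ..., 12, defined the same way for the k-th disputed paper.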

The two sets H and M can now be used to construct the linear program as described on the main page.

But this would be hazardous!!

Why: Essentially, the number of points available is too small for the dimension we are dealing with. For example, imagine that we only had 1 Hamilton paper and 1 Madison paper and that we had data for only 2 function words (2-dimensional points). Solving a linear program on this problem would produce a line (a hyperplane is a line in 2D) that would separate the two 2D points. But which line? Different lines, all valid LP solutions, can yield very different conclusions about the disputed papers.
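The two-point example can be made concrete with a small sketch. All coordinates and both lines are hypothetical values chosen for illustration; `classify` is an assumed helper that labels a point by the side of the line w.x = gamma it falls on.

```python
def classify(w, gamma, x):
    """Label x "Hamilton" if w.x > gamma, otherwise "Madison"."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return "Hamilton" if score > gamma else "Madison"

# Hypothetical 2D points (rates of two function words).
hamilton_paper = (3.0, 1.0)
madison_paper = (1.0, 3.0)
disputed_paper = (3.0, 3.0)

# Two different lines, each separating the two known papers.
line_a = ((1.0, 0.0), 2.0)    # the vertical line x1 = 2
line_b = ((0.0, -1.0), -2.0)  # the horizontal line x2 = 2

results = {}
for name, (w, gamma) in [("line A", line_a), ("line B", line_b)]:
    # Both lines classify the two known papers correctly.
    assert classify(w, gamma, hamilton_paper) == "Hamilton"
    assert classify(w, gamma, madison_paper) == "Madison"
    results[name] = classify(w, gamma, disputed_paper)

print(results)  # the two lines disagree on the disputed paper
```

Both lines are perfect separators of the known papers, yet they assign the disputed point to different authors, which is the hazard described above.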

What can we do to solve this problem?
106 points in 70 dimensions is too small a ratio. We can either increase the number of points (which would require more papers) or ignore some function word data (lower the dimension). Bosch and Smith decided to lower the dimension and used a technique called cross validation to evaluate every possible set consisting of one, two, or three of the 70 function words. This is called Feature Selection.
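To get a sense of the size of this search, the number of candidate sets can be counted directly. The sketch below uses only the figures stated above (70 words, set sizes 1 to 3); the variable names are illustrative.

```python
from itertools import combinations, islice
from math import comb

n_words = 70

# Number of candidate sets of each size.
counts = {k: comb(n_words, k) for k in (1, 2, 3)}
total = sum(counts.values())
print(counts)
print(total)

# The sets themselves can be enumerated lazily by word index.
first_triples = list(islice(combinations(range(n_words), 3), 3))
print(first_triples)
```

The total is 70 + 2415 + 54740 = 57225 candidate sets, each of which is screened by the algorithm that follows.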

The Algorithm:

For each possible set (size 1, 2, or 3):
Step 1: Solve the linear program using the 106 papers written by Hamilton and Madison, using only the function words belonging to the set. If the optimal value is positive (inseparability), discard the set and stop.

Step 2: Choose an integer g ≥ 2. Use a random number generator to partition the set of 106 papers into g groups of roughly equal size.

Step 3: Solve the linear program based on all 106 papers except for those papers in group one and only for those function words that belong to the set. Form a separating hyperplane from the optimal solution and test it on the papers in group one. Record the number of papers that are classified correctly.

Step 4: Repeat Step 3 (g - 1) times (for group two, group three, ..., up to group g). Record the number of correct classifications each time. Finally, compute the total number of correct classifications.
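Steps 2 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not Bosch and Smith's code: `partition`, `cross_validate`, and `midpoint_trainer` are hypothetical names, the toy data is invented, and `midpoint_trainer` is a simple stand-in for solving the linear program (it thresholds one coordinate halfway between the class means). A real implementation would pass in a trainer that solves the LP and returns the resulting hyperplane classifier.

```python
import random

def partition(n_items, g, rng):
    """Step 2: randomly split indices 0..n_items-1 into g groups of roughly equal size."""
    indices = list(range(n_items))
    rng.shuffle(indices)
    return [indices[i::g] for i in range(g)]

def cross_validate(points, labels, g, train, rng):
    """Steps 3 and 4: hold out each group in turn, train on the rest,
    and return the total number of held-out papers classified correctly."""
    correct = 0
    for held_out in partition(len(points), g, rng):
        held = set(held_out)
        train_x = [p for i, p in enumerate(points) if i not in held]
        train_y = [y for i, y in enumerate(labels) if i not in held]
        predict = train(train_x, train_y)
        correct += sum(predict(points[i]) == labels[i] for i in held_out)
    return correct

def midpoint_trainer(xs, ys):
    """Stand-in for the LP (an assumption, not the source's method):
    cut the first coordinate halfway between the two class means."""
    h = [x[0] for x, y in zip(xs, ys) if y == "H"]
    m = [x[0] for x, y in zip(xs, ys) if y == "M"]
    mean_h, mean_m = sum(h) / len(h), sum(m) / len(m)
    cut = (mean_h + mean_m) / 2
    hi_label, lo_label = ("H", "M") if mean_h > mean_m else ("M", "H")
    return lambda x: hi_label if x[0] > cut else lo_label

# Invented, well-separated toy data: one word rate per paper.
points = [(3.0,), (3.2,), (2.8,), (3.5,), (2.9,), (3.1,),
          (0.2,), (0.5,), (0.1,), (0.4,), (0.3,), (0.0,)]
labels = ["H"] * 6 + ["M"] * 6

rng = random.Random(0)
score = cross_validate(points, labels, g=3, train=midpoint_trainer, rng=rng)
print(score, "of", len(points), "held-out papers classified correctly")

# Group sizes when 106 papers are split into 5 groups.
sizes = sorted(len(grp) for grp in partition(106, 5, random.Random(1)))
print(sizes)
```

Passing the trainer as a parameter keeps the partitioning and scoring logic independent of how the separating hyperplane is obtained.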

Bosch and Smith ran Steps 2, 3, and 4 a total of 50 times on every set that made it past Step 1. They used values of g from 2 to 5. The set consisting of the words "are", "our" and "upon" received the highest score.
Using only these 3 function words, we are able to construct our linear program with 106 points in 3 dimensions (a much better ratio!). The separating hyperplane is used to classify the points in D.

Conclusion: all 12 disputed papers were classified as having been written by James Madison.

Main reference for this page: Bosch and Smith (1998).