Two Authors


Text Categories

Consider the following scenarios regarding two pieces of text on the same basic subject, A and B, where A is written first and B is created later. (Note: Referring to B as being created rather than written allows for copying or editing from A, independent creative writing, or both.) For the purpose of this exercise, let us assume that the two authors (call them aA and aB) have natural profiles pA and pB that differ enough for one to be distinguished from the other. There are three cases to consider, depending on the actions of aB:

We can therefore see that when two people author text on the same subject, the similarity between the profiles of the texts will vary, depending on how much one person copies from or edits the text of the other. Now suppose that A and B each contain several passages (i.e., strings of words forming the whole or part of a description, an event, a story, etc.), and that, as before, the whole of A is written before B. A includes complete passages not included in B, and vice versa. In addition, the two texts include common passages, where B is copied or edited from A. The text in A and B can then be split into the following five categories:

A New Notation

At this point it is helpful to introduce a new notation for each of the above five categories. Each category is identified by a two-digit number, with the first digit representing text in A, and the second digit representing text in B, where:

Therefore:

Categories c20, c21, and c22 together contain all the text in A, and for convenience we can identify the combination of these three categories as c2X, where the ‘X’ indicates any value (0, 1, or 2) for the text in B. Similarly, c22, c12, and c02 contain all the text in B, and this combination can be identified as cX2. These relationships are shown diagrammatically in this Two Person Venn Diagram.
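The ‘X’ wildcard can be illustrated with a short Python sketch. This is only an illustration of the notation, assuming the five category labels are held as plain strings; the names `CATEGORIES` and `matching_categories` are hypothetical and do not come from the original analysis.

```python
# Hypothetical sketch: the five category labels used in the text.
CATEGORIES = ("c20", "c21", "c22", "c12", "c02")


def matching_categories(pattern):
    """Return the labels matched by a pattern such as 'c2X' or 'cX2'.

    An 'X' in the pattern stands for any value (0, 1, or 2) in that
    position; any other character must match the label exactly.
    """
    return [
        label
        for label in CATEGORIES
        if all(p == "X" or p == c for p, c in zip(pattern[1:], label[1:]))
    ]


print(matching_categories("c2X"))  # the categories holding all the text in A
print(matching_categories("cX2"))  # the categories holding all the text in B
```

Note that c22 appears in both results, since text in that category is present in both A and B.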

Common Passages: Categories c21, c22, and c12

Categories c21, c22, and c12 are all created by the action of aB deciding to copy, edit, add to, or simply not use words or sentences from passages in A. For example, suppose that A contains a passage including the sentence “The brown fox jumps over the lazy dog,” and that aB includes a variant of it in B:

Then, all the words in A and B can be assigned to one of the categories c21, c22, and c12:

It is important to realize that although all the words in c21 and c22 were originally written by aA, aA has no influence over how the words are distributed across these two categories. The distribution of these words, and the content of c12, are determined solely by the actions of aB. The categories only exist because aB decided to include in B (for whatever reason) edited versions of passages that were already in A. It is worth noting that:

It is perhaps more likely that aB would choose not to use some sentences from A, re-use some with no change, edit others, and add some of his own. In this case c21 will contain a mixture of complete sentences and individual words from A not used by aB, c22 will contain a mixture of complete sentences and individual words from A re-used by aB, and c12 will contain complete sentences and individual words added by aB.
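The assignment of words in a common passage to c21, c22, and c12 can be sketched in Python. This is a rough illustration rather than the method used in this analysis: it assumes that a word-level alignment from the standard library's `difflib.SequenceMatcher` is an acceptable stand-in for deciding which words aB re-used, that words are separated by whitespace, and that matching is case-sensitive. The function name `categorize` and the variant sentence used for B are hypothetical.

```python
from difflib import SequenceMatcher


def categorize(passage_a, passage_b):
    """Split the words of a common passage into c21, c22, and c12.

    c21: words in A that aB did not use
    c22: words in A that aB re-used unchanged
    c12: words in B that aB added
    """
    words_a = passage_a.split()
    words_b = passage_b.split()
    categories = {"c21": [], "c22": [], "c12": []}

    # Align the two word sequences; autojunk is disabled so that
    # frequent words are not discarded in longer passages.
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            categories["c22"].extend(words_a[i1:i2])
        else:
            # 'delete', 'insert', and 'replace' all reduce to:
            # A-side words were not used, B-side words were added.
            categories["c21"].extend(words_a[i1:i2])
            categories["c12"].extend(words_b[j1:j2])
    return categories


# Hypothetical variant of the sentence, invented for this sketch.
result = categorize(
    "The brown fox jumps over the lazy dog",
    "The quick brown fox leaps over the dog",
)
for name, words in result.items():
    print(name, words)
```

With this invented variant, "jumps" and "lazy" fall into c21, "quick" and "leaps" into c12, and the remaining six words into c22. A different alignment rule could assign repeated words differently, so the output should be read as one plausible split, not a unique one.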
