Stylometric - Synoptic Results


Note: The data used in the analysis below can be found in this Excel spreadsheet, also linked at the bottom of this page. In particular, the correlations are located between rows 37 and 150, and columns HI to LR.

Authorial Profiles

One of the basic assumptions underlying this method of analysis is that different authors (aA, aB, and aC) have natural frequency profiles pA, pB, and pC that are sufficiently different that one can be distinguished from the other. Therefore, it is reasonable to test this assumption before continuing with the rest of the analysis. Because we are testing what copying took place from one synoptic gospel to another, we cannot compare the profiles of the whole of each gospel against each other. Instead, we need to isolate the text unique to each author and compare the profiles of each.

One way of doing this is to compare each author’s Sondergut material (HHBC categories 200, 020, and 002 respectively) against the other two. Alternatively, we can combine all the categories containing words written by just one of the three authors, e.g. c200 + c210 + c211 + c201 (=c2AA), and compare them. In each case there are three possible pairs of authors to compare:

Sondergut only           All single-author categories
p200 – p020 = 0.24       p2AA – pA2A = -0.20
p020 – p002 = -0.23      pA2A – pAA2 = -0.34
p200 – p002 = 0.13       p2AA – pAA2 = -0.02

These results confirm that there is no significant correlation between the profiles of the words written by any pair of authors (the correlation coefficients are much closer to 0 than to either -1 or 1), indicating that the profiles of the three authors are sufficiently different that one can be distinguished from another. The differences between the two groups of tests, with the correlations in the second group being more negative than those in the first, are to be expected, because the second group includes individual words in one of the synoptics that are not used in parallels in either of the other two.
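As a minimal sketch of the kind of comparison behind a figure such as p200 – p020 = 0.24, the following assumes that a profile is a list of per-word frequency values over a fixed word list, and that two profiles are compared using the Pearson correlation coefficient. Both points are assumptions made for illustration (the spreadsheet holds the actual calculation), and the numbers are invented.

```python
import math

def pearson(profile_a, profile_b):
    """Pearson correlation between two equal-length frequency profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same word list")
    n = len(profile_a)
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical relative frequencies for the same five words in two categories
# (illustrative numbers only, not taken from the HHBC data).
p_x = [0.10, 0.04, 0.07, 0.02, 0.05]
p_y = [0.03, 0.06, 0.05, 0.04, 0.08]

r = pearson(p_x, p_y)
print(round(r, 2))
```

A value near 0 would correspond to the "no significant correlation" reading above, while values approaching 1 or -1 would indicate similar or systematically opposed profiles.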

However, the values do suggest that the differences between the profiles of Mark (020/A2A) and Luke (002/AA2) are of a different character (they are more negative) from those between the other pairs of synoptics. This suggests that the Greek used in Mark differs from that used in Luke in a way that does not apply to the other synoptic pairings. It may be that, as has been suggested, this is due to aMark not having Greek as his first language. Whatever the reason, Luke contains many systematic language differences from Mark: for example, there are around 150 places where Mark has the narrative present while Luke has the past tense. Systematic differences such as this would tend to create a negative correlation between Mark and Luke such as we see here.

Matthew - Mark

Testing for homogeneity within the categories representing either Matthew, Mark, or Luke may provide useful information regarding the order in which the synoptics were written. For example, if Mark was written first then we should expect many of the HHBC categories that contain text from Mark (all those with a ‘2’ in the middle position of their identifier, e.g. 020, 121) to have similar profiles (providing that the whole of Mark was actually written by the same author, who did not just copy from other texts).

Then, if aMatthew copied or edited some text from Mark, Matthew would contain text from two different authors, and would therefore be likely to be less homogeneous than Mark. Finally, if aLuke copied or edited text from both Mark and Matthew, then Luke would be likely to be less homogeneous still. Therefore, by comparing the homogeneity of Matthew, Mark, and Luke in turn we may be able to determine which of the authors copied or edited from which others.

The HHBC data representing each of Matthew, Mark, and Luke is spread across 9 categories, depending on the interaction with the other two synoptics. For example, we can compare those parts of Matthew that have no parallels in Mark (c200 + c201 + c202 = c20X) with those parts that do (c210 + c211 + c212 + c220 + c221 + c222 = c2NX), but we can also compare those parts of Matthew that have no parallels in Luke (c2X0) with those that do (c2XN). As a result, we can group the categories in various ways to determine how the relationships between the synoptics affect which parts of any one synoptic show signs of homogeneity and which do not.
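The grouped identifiers can be expanded mechanically. The sketch below infers the wildcard meanings from the sums quoted on this page (A = 0 or 1, N = 1 or 2, X = 0, 1, or 2); the function name and the table of wildcards are illustrative, not part of the HHBC scheme itself.

```python
from itertools import product

# Wildcard meanings inferred from the groupings quoted in the text
# (an assumption for illustration, not an official HHBC definition).
WILDCARDS = {"A": "01", "N": "12", "X": "012"}

def expand(code):
    """Return the sorted basic three-digit categories covered by a grouped code."""
    choices = [WILDCARDS.get(ch, ch) for ch in code]
    return sorted("".join(digits) for digits in product(*choices))

print(expand("20X"))  # ['200', '201', '202']
print(expand("2NX"))  # ['210', '211', '212', '220', '221', '222']
print(expand("2AA"))  # ['200', '201', '210', '211']
```

The three printed lists match the sums given for c20X, c2NX, and c2AA.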

Homogeneity of Matthew vs. Matthew-Mark Parallels

The tests in the group below all compare the profiles of categories representing passages in Matthew that do not have parallels in Mark (c20X, with ‘sub-divisions’ c20N, etc.) with those that do have parallels in Mark (c2NX, c2NN, etc.). We can perform comparisons using six different combinations of categories in Matthew, depending on the existence and content of parallel passages in Luke.

A similar group of tests can then be used to compare the profiles of the same categories representing passages in Matthew that do not have parallels in Mark, with those representing just those words in Matthew not used in the parallels in Mark (c21X, c21N, etc.).

We can also compare the profiles of those categories that denote passages in Matthew that either have no parallels in Mark, or where the parallels have different words (c2AX, c2AN, etc.), with those where the parallels contain the same words (c22X, c22N, etc.).

The above results show that there is no strong correlation in any of these three groups of tests. However, p201 and p211 (respectively, the profiles of the Double and the Triple Tradition words only in Matthew) do appear to be sufficiently similar (0.46) to be worth investigating further. An examination of a scatter plot of the two profiles shows that several words in these categories have similar below-average frequencies in each. As both categories contain words only in Matthew, in passages that have parallels in Luke, this suggests that the frequency with which these words appear in both categories has been affected by copying/editing between Matthew and Luke.

Overall, there is no evidence here of homogeneity between those passages in Matthew with parallels in Mark and those without, i.e. there is no evidence that the passages in Matthew that have parallels in Mark came from the same source as the passages in Matthew that have no parallels in Mark. More specifically, where Matthew and Mark have identical words in parallel passages, there is no evidence that the words originated in Matthew.

Homogeneity of Mark vs. Matthew-Mark Parallels

The following groups of tests can be considered to be the ‘reverse’ of the previous groups, this time testing for homogeneity in Mark instead of in Matthew. The tests in the first two groups below compare the profiles of categories representing passages in Mark that do not have parallels in Matthew (c02X, c02N, etc.) with those representing:

Passages in Mark that do have parallels in Matthew (cN2X, cN2N, etc.),
Words in passages in Mark that do not exist in the parallel passages in Matthew (c12X, c12N, etc.).

These results show that the passages in Mark that have no parallels in Matthew have very similar profiles to those passages in Mark that do have parallels in Matthew.

These results are very similar to those in the previous group, showing that the passages in Mark that have no parallels in Matthew have very similar profiles to just those words in the Matthew-Mark parallels that are in Mark but not Matthew. These two groups of results are a strong indication that the words common to Matthew and Mark originated in Mark.

The only area where categories in Mark show little sign of similarity is where Mark and Luke share identical words (c022 + c122 + c222 = cX22). Further investigation shows that this is primarily because the profile of the Mark-Luke agreements against Matthew (p122) is not similar to any other profile, while p022 and p222 are both similar to p220. This could indicate that the words in c122 came from a source outside the synoptics. However, p122 does have negative correlations with a number of other categories, the most significant being with p2AX (all of Matthew except for words shared with Mark). This is due in no small measure to AUTON appearing more frequently in c122 than in any other category, while Matthew uses AUTON much less.

We can also compare the profiles of categories that denote passages in Mark with either no parallels in Matthew, or where the parallels have different words (cA2X, cA2N, etc.), with those that contain the same words (c22X, c22N, etc.)

Unlike the two previous groups of tests, here there is much less indication of homogeneity. However, this is in large part due to the fact that cA2X, cA2N, and cA22 all include c122, and c122 (Triple tradition words in Mark and Luke but not Matthew) is not similar to any other category, as reported above:

cA2X = c022 + c122 + c021 + c121 + c020 + c120
cA2N = c022 + c122 + c021 + c121
cA22 = c022 + c122

If results containing c122 are excluded, we can see that the remaining results do show a consistent homogeneity, although not quite as strong as the earlier ones.
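The exclusion described above amounts to asking which grouped categories cover c122. A small sketch, again assuming the wildcard meanings A = 0 or 1, N = 1 or 2, X = 0, 1, or 2 (inferred from the sums on this page), with a hypothetical helper name:

```python
# Wildcard meanings inferred from the category sums in the text (assumed).
WILDCARDS = {"A": "01", "N": "12", "X": "012"}

def covers(code, basic):
    """True if the basic category (e.g. '122') falls inside the grouped code."""
    return all(b in WILDCARDS.get(c, c) for c, b in zip(code, basic))

# The six Luke-position variations (X, N, A, 0, 1, 2) of the cA2_ grouping.
mark_side = ["A2X", "A2N", "A22", "A2A", "A20", "A21"]

with_c122 = [code for code in mark_side if covers(code, "122")]
without_c122 = [code for code in mark_side if not covers(code, "122")]

print(with_c122)     # ['A2X', 'A2N', 'A22']
print(without_c122)  # ['A2A', 'A20', 'A21']
```

Only the second list would be retained when results containing c122 are set aside.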

Overall, the above results show a great deal of homogeneity in Mark, but a lack of it in Matthew. This is a strong indicator that the passages shared between Matthew and Mark originated in Mark. The lack of similarity between p22X and p12X does not affect this indication, but instead just provides information about the choices made by aMatthew when copying/editing from aMark. The lack of similarity between p122 and the profiles of any other categories (in any of the synoptics) has a similar cause, but in this case it suggests that the sharing of words between Mark and Matthew was later affected by the sharing of words between Mark and Luke.

Source of Identical Words in Matthew-Mark Parallels

As previously noted, one of the key indicators of directionality is the possible correlation between the profiles of categories containing words common to two of the synoptics and words unique to one or the other. There are four basic tests that can be used to look for ‘authorship’ of words common to any pair of the synoptics (e.g. c2XX in the case of Matthew-Mark). However, two of these tests have already been used when testing for homogeneity, leaving just two to test here. Both look for similarity between an author’s Sondergut material, and the words he has in common with another author:

There are then six variations on each of the above, depending on the existence and content of any parallels in Luke (X, N, A, 0, 1, 2), giving the following tests:

As previously noted, if whatever copying/editing took place included selectively choosing or replacing many individual words (rather than complete sentences), then the profile of the words common to any two of the synoptics may not have a strong correlation with the profile of the words in passages unique to one or the other, and that may be the case here.

Although the differences between the results of these two groups of tests are not great, what differences there are suggest that it is more likely that Mark was first, i.e. that the words common to both Matthew and Mark came from Mark. The differences between these two groups of results are greatest for c221, i.e. Triple Tradition words common to Matthew and Mark but not Luke (difference = 0.37 – 0.06 = 0.31).

Editing Choices in Matthew-Mark Parallels

The following tests can only provide limited (if any) information on directionality. However, using the directionality information from the previous tests, they may provide additional information on how the source material of the Matthew-Mark parallels was copied/edited.

The previous results have indicated that the source material came from Mark, so we can use that as an assumption in the following tests. As with the homogeneity tests, there are six variations on each of the above two tests, as follows:

There are no significant positive correlations here, and thus nothing to refute the assumption that the Matthew-Mark parallels originated in Mark.

The results of these groups of tests are not conclusive as to the mix of individual words vs. complete sentences that were copied or replaced. However, such indications as do exist support the conclusions of the tests for homogeneity, which are that the passages shared between Matthew and Mark came from Mark, and also that aMatthew mainly selected or rejected complete sentences from Mark, but did change or add many individual words as well.

The greatest indication of Matthew changing individual words comes from the 'Matthew-Mark double tradition' (c120 + c220 + c210 = cNN0), where we see:

p220 - p210 = -0.48
p220 - p120 = 0.32

This material (that is not in Luke) includes what is known as ‘The Great Omission’ from (approximately) Mark 6:47a - 8:27b, as well as the death of John the Baptist and some other items. In this material the relative frequency of use of various words varies greatly between Mark and Matthew, in particular the use of IHSOUS:

c120 (Mark) 55% below average
c220 (Mark) 52% below average
c210 (Matthew) 122% above average

Here we see that the material that aMatthew chooses not to use (c120) contains the word IHSOUS a relatively small number of times, whereas in the material he adds (c210) he uses IHSOUS frequently.
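Figures such as "55% below average" can be read as a word's rate in one category relative to a reference rate. The sketch below assumes that "average" means the word's rate across all the text combined; that reading, the function name, and the counts are all assumptions for illustration, not values from the spreadsheet.

```python
def percent_vs_average(word_count, category_total, overall_count, overall_total):
    """Percentage by which a word's rate in one category differs from its overall rate."""
    category_rate = word_count / category_total
    overall_rate = overall_count / overall_total
    return (category_rate / overall_rate - 1.0) * 100.0

# Hypothetical counts: a word occurring 9 times in a 1,000-word category,
# against 200 occurrences in 10,000 words overall.
deviation = percent_vs_average(9, 1000, 200, 10000)
print(f"{deviation:+.0f}%")  # -55%
```

A positive result would correspond to an "above average" entry, such as the one reported for c210.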

Matthew - Luke

Homogeneity of Matthew vs. Matthew-Luke Parallels

The following tests compare the profiles of categories representing passages in Matthew that do not have parallels in Luke (c2X0, c2N0, etc.) with those representing:

Passages in Matthew that do have parallels in Luke (c2XN, c2NN, etc.).
Words in passages in Matthew that do not exist in the parallel passages in Luke (c2X1, c2N1, etc.).

Here we see quite different results from those in the Matthew–Mark comparisons, with some significant correlations in Matthew between the profiles of passages with parallels in Luke, and those without. The strongest correlation occurs when comparing Sondergut Matthew (c200) with the Double Tradition passages in Matthew (c20N), i.e. where there are no parallels in Mark. This in itself is a strong indication that the Double Tradition did not originate in Luke.

The correlation is nearly as strong when comparing Sondergut Matthew with just those words from Double Tradition passages that are in Matthew but not Luke (c201), suggesting that c201 consists mainly of complete sentences, rather than just a selection of individual words, i.e. that the Double Tradition was created largely by selecting and copying complete sentences, and changing relatively few individual words.

However, when looking at the categories corresponding to the Triple Tradition passages in Matthew (c211 + c221 + c222 + c212 = c2NN), the correlations indicate that the copying/editing between Matthew and Luke involved changing a much greater number of individual words. This is particularly so for the words also shared with Mark (c221 + c222 = c22N):

p220 – p22N = 0.51
p220 – p221 = 0.05
p220 – p222 = 0.67

Here we can see that the selection and rejection of particular words by aLuke has ‘split’ c22N into c221 and c222, so that p220 is not similar to p221, while it is similar to p222. The following group of tests compares categories on either side of this split:

As with the previous two groups of tests, the strongest correlation again relates to the Double Tradition. We have:

p200 – p20N = 0.75
p200 – p201 = 0.69
p20A – p202 = 0.73

The first and last of these three comparisons ‘overlap,’ as both include c201:

p200 – p20N = p200 – p(c201 + c202)
p20A – p202 = p(c200 + c201) – p202

This trio of correlations, all of which are positive irrespective of whether c201 is paired with c200 or c202 (i.e. with words only in Matthew or in both Matthew and Luke), indicates that c201 has a very similar profile to both c200 and c202, and therefore that most of the words in all three categories most likely came from the same source. In addition, as c200 (Sondergut Matthew) contains complete sentences, c201 and c202 must also each largely consist of complete sentences.

This evidence supports the view that the Double Tradition material originated in Matthew, and that words common to Matthew and Luke were mainly re-used in Luke in the form of complete sentences, with only a small percentage of the words from Matthew being removed or replaced by aLuke in the process.
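The combined profiles used above, such as p(c201 + c202), can be thought of as the profile of the pooled words of the component categories. A minimal sketch, assuming that a profile is built from relative word frequencies and that combining categories means summing their word counts; the counts are invented, and only the transliterated words are taken from this page.

```python
from collections import Counter

def combined_profile(*category_counts):
    """Relative word frequencies for the pooled words of several categories."""
    total = Counter()
    for counts in category_counts:
        total.update(counts)  # add the counts word by word
    n_words = sum(total.values())
    return {word: count / n_words for word, count in total.items()}

# Hypothetical word counts for two categories (illustrative only).
c201 = Counter({"AUTON": 4, "EIS": 6, "EIPON": 2})
c202 = Counter({"AUTON": 6, "EIS": 4, "IHSOUS": 3})

p20N = combined_profile(c201, c202)
print(p20N["AUTON"])  # 10 of the 25 pooled words, i.e. 0.4
```

Because c201 contributes to both p20N and p20A, the two comparisons share part of their input, which is the overlap noted above.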

With regard to passages in Matthew that are part of the Triple Tradition (words in all three gospels: c211 + c221 + c222 + c212), we have:

p220 – p22N = 0.51
p22A – p222 = 0.63

These comparisons also overlap, with both including c221, so that:

p220 – p22N = p220 – p(c221 + c222)
p22A – p222 = p(c220 + c221) – p222

This is similar to the case of c200, c201, and c202 (above), and therefore there is good reason to believe that c220, c221 and c222 all came from the same source, which (from the previous Mark–Matthew comparisons) is most likely to have been Mark. However, in this case more of the copying/editing was performed by selecting (or not) individual words rather than complete sentences.

Homogeneity of Luke vs. Matthew-Luke Parallels

The following tests all compare the profiles of categories representing passages in Luke that do not have parallels in Matthew (c0X2, c0N2, etc.) with those representing passages in Luke that do have parallels in Matthew (cNX2, cNN2, etc.).

Although these two groups of tests appear to indicate that all the passages in Luke that do not have parallels in Matthew have similar profiles to all those that do (p0X2 – pNX2 = 0.52), the correlation is actually only significant where Luke does not share words with either Matthew or Mark (p0A2 – p1A2 = 0.70). This indicates that the only parts of Luke that are homogeneous are those categories containing words unique to Luke, which in turn suggests that the words in Luke common to either Matthew or Mark did not come from the same source as the words unique to Luke.

The most interesting result here is the negative correlation between pAN2 and p2N2 (-0.48). Examining the scatter plots of these and other categories shows that this is mainly due to there being a number of words with above average frequencies in c1N2 (which is part of cAN2) that have below average frequencies in c2N2:

p1N2 – p2N2 = -0.47

As previously demonstrated, this ‘splitting’ of words between adjacent categories is caused by another person selecting (or not) these words to use in another text. Here c1N2 contains words in Luke that were not used when some text from Luke was edited for use in Matthew, while c2N2 contains the words that were re-used in Matthew, i.e. there is evidence of some copying/editing from Luke (or possibly an earlier version of Luke) to Matthew in passages common to all three synoptics (i.e. in the Triple Tradition).

Source of Identical Words in Matthew-Luke Parallels

The following tests check for correlations between the profiles of c2X2 (and variations c2N2, c2A2, etc.) and the equivalent Matthean and Lukan categories.

Unlike the equivalent tests for the ‘ownership’ of the identical parallels in Matthew and Mark, there are significant differences between the results of these two groups of tests, indicating that many of the words shared with Luke originated in Matthew. The evidence is strongest for the words in the Double Tradition:

p202 – p200 = 0.69
p202 – p002 = 0.18

However, the evidence is less strong for the words in the Triple Tradition. In particular, for the words that are identical in all three synoptics (c222), there may have been movement in both directions. However, because c222 is the only category that contains words common to all three synoptics, we should take account of results involving c222 and all three synoptics at the same time, and also take account of any relationships between c220, c022, and c202 (words shared between any pair of the synoptics):

p222 – p220 = 0.67     p220 – p022 = 0.56
p222 – p022 = 0.40     p022 – p202 = -0.09
p222 – p202 = 0.36     p202 – p220 = 0.29

These results show that p220 (words common to Matthew and Mark only) and p022 (words common to Mark and Luke only) are similar, indicating that both categories contain words mainly originating in the same source, i.e. Mark. This is the reason that p222 is similar to both p220 and p022, indicating in turn that c222 also contains words mainly originating in Mark.

Editing Choices in Matthew-Luke Parallels

The previous results indicate that the words in the Double Tradition (c202) common to both Matthew and Luke most likely originated in Matthew, while the origin of the words in Matthew-Luke agreements against Mark (c212) is uncertain. The following tests may help to clarify this.

There is little evidence of editing choices here, except in the case of the Double Tradition (c202), where the results support the previous evidence that indicates that the Double Tradition consists mainly of sentences originating in Matthew.

The Double Tradition result in this group of tests appears to contradict previous results, since p202 is similar to both p201 and p102, suggesting that c202 originated in both Matthew and Luke. The key indicators here are:

p200 – p201 = 0.69     p200 – p202 = 0.69     p201 – p202 = 0.66
p002 – p102 = 0.29     p002 – p202 = 0.18     p102 – p202 = 0.52

These results show that both p201 and p202 (Double Tradition words in Matthew) are very similar to p200 (Sondergut Matthew), but neither p102 nor p202 (Double Tradition words in Luke) is similar to p002 (Sondergut Luke).

As p102 is not similar to p002, but is similar to p202, and p202 is similar to p200, it is reasonable to suppose that p102 might also be similar to p200. However, this is not actually the case. Instead:

p200 – p102 = 0.29

Thus, although p102 is similar to p202, it is not similar to the profiles of either Sondergut Matthew or Sondergut Luke, suggesting that at least some of c102 might have come from another source (i.e. not Matthew, Mark, or Luke), which I will call S. However, if so, because p102 is similar to p202, then the whole of the Double Tradition has some relationship with S.
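That similarity does not carry across a chain of comparisons is a general property of correlation rather than something peculiar to this data. Assuming the figures are Pearson correlations computed over the same word list (an assumption; the function name is illustrative), two known correlations r_ab and r_bc only constrain the third to the range r_ab·r_bc ± sqrt((1 – r_ab²)(1 – r_bc²)). A short check using the values quoted above:

```python
import math

def third_correlation_range(r_ab, r_bc):
    """Range of r_ac that is consistent with given r_ab and r_bc."""
    centre = r_ab * r_bc
    spread = math.sqrt((1 - r_ab ** 2) * (1 - r_bc ** 2))
    return centre - spread, centre + spread

# Values quoted in the text: p102 - p202 = 0.52 and p200 - p202 = 0.69.
low, high = third_correlation_range(0.52, 0.69)
print(round(low, 2), round(high, 2))  # -0.26 0.98

observed = 0.29  # p200 - p102, as reported above
print(low <= observed <= high)  # True
```

So a value of 0.29 is entirely compatible with the two stronger correlations, which is why the relationship between p102 and p200 has to be measured directly rather than inferred.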

More specifically, some of the Double Tradition text may have originated in S (in addition to Matthew, as previously suggested). However, because p200 (Sondergut Matthew) is very similar to both p201 and p202, any text originating in S must have been edited by aMatthew before any of it was used within Luke. Then, after aMatthew had made his mark on the text, aLuke added his own changes, leaving p102 still similar to p202, but not similar to Sondergut Luke.

Mark - Luke

Homogeneity of Mark vs. Mark-Luke Parallels

The following tests all compare the profiles of categories representing passages in Mark that do not have parallels in Luke (cX20, cN20, etc.) with those representing passages in Mark that do have parallels in Luke (cX2N, cN2N, etc.).

On the assumption that aLuke copied/edited text from Mark (as suggested above), we would expect to see significant correlations in these tests. However, although some do exist, they do not form a clear pattern. In particular, there is some evidence that the ‘Mark-Luke double tradition’ (i.e. c021 and c022, where there is no sharing with Matthew) may have originated in Mark, and some evidence that the Triple Tradition words common to all three synoptics (c222) also originated in Mark, but little else. Overall, there are enough correlations greater than 0.4 to suggest that Mark was the source of the words common to Mark and Luke (with Luke changing many individual words), and nothing to suggest otherwise.

Homogeneity of Luke vs. Mark-Luke Parallels

The following tests all compare the profiles of categories representing passages in Luke that do not have parallels in Mark (cX02, cN02, etc.) with those representing passages in Luke that do have parallels in Mark (cXN2, cNN2, etc.).

There is little evidence here of any significant degree of correlation between categories in Luke. The exception is cA02 (Sondergut Luke + Double Tradition words only in Luke), where we have:

pA02 – pAN2 = 0.63
pA02 – pA12 = 0.71

Because cAN2 = cA12 + cA22, it appears as though the relationship between pA02 and pA12 is the main ‘driver’ behind these two correlations. However, as both these categories are themselves the combination of two other categories, we need to look at other comparisons between the various components of cA02 and cAN2 to uncover more detail. The components are:

cA02 = c002 + c102
cAN2 = c012 + c022 + c112 + c122 = cA12 + cA22 = c0N2 + c1N2

The detailed comparisons then are:

p002 – p012 = 0.32      p102 – p012 = -0.05     pA02 – p012 = 0.27
p002 – p022 = -0.33     p102 – p022 = -0.27     pA02 – p022 = -0.36
p002 – p112 = 0.74      p102 – p112 = 0.27      pA02 – p112 = 0.73
p002 – p122 = 0.08      p102 – p122 = -0.16     pA02 – p122 = 0.03
p002 – pA12 = 0.73      p102 – pA12 = 0.23      pA02 – pA12 = 0.71
p002 – pA22 = -0.08     p102 – pA22 = -0.25     pA02 – pA22 = -0.14
p002 – p0N2 = 0.07      p102 – p0N2 = -0.24     pA02 – p0N2 = 0.00
p002 – p1N2 = 0.68      p102 – p1N2 = 0.20      pA02 – p1N2 = 0.66

From this we can see that the key relationship is the strong correlation between p002 and p112, because all the other strong correlations shown involve categories that include both c002 and c112. This is a strong indication that the words in c002 (Sondergut Luke) and c112 (Triple Tradition words unique to Luke) came from the same source. In addition, because c002 (by definition) contains complete sentences, this indicates that c112 also consists largely of complete sentences rather than individual words or phrases.
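The observation that every strong correlation involves both c002 and c112 can be checked directly against the detailed comparisons listed above. In this sketch the correlation values are copied from that list, while the 0.6 cut-off for "strong" and the wildcard meanings (A = 0 or 1, N = 1 or 2, X = 0, 1, or 2) are assumptions made for illustration.

```python
# Wildcard meanings inferred from the category sums in the text (assumed).
WILDCARDS = {"A": "01", "N": "12", "X": "012"}

def covers(code, basic):
    """True if the basic category falls inside the (possibly grouped) code."""
    return all(b in WILDCARDS.get(c, c) for c, b in zip(code, basic))

# (left profile, right profile) -> correlation, copied from the comparisons above.
table = {
    ("002", "012"): 0.32, ("102", "012"): -0.05, ("A02", "012"): 0.27,
    ("002", "022"): -0.33, ("102", "022"): -0.27, ("A02", "022"): -0.36,
    ("002", "112"): 0.74, ("102", "112"): 0.27, ("A02", "112"): 0.73,
    ("002", "122"): 0.08, ("102", "122"): -0.16, ("A02", "122"): 0.03,
    ("002", "A12"): 0.73, ("102", "A12"): 0.23, ("A02", "A12"): 0.71,
    ("002", "A22"): -0.08, ("102", "A22"): -0.25, ("A02", "A22"): -0.14,
    ("002", "0N2"): 0.07, ("102", "0N2"): -0.24, ("A02", "0N2"): 0.00,
    ("002", "1N2"): 0.68, ("102", "1N2"): 0.20, ("A02", "1N2"): 0.66,
}

THRESHOLD = 0.6  # illustrative cut-off for "strong"; not a value given in the text

strong = {pair: r for pair, r in table.items() if r >= THRESHOLD}
for (left, right), r in sorted(strong.items(), key=lambda item: -item[1]):
    print(f"p{left} - p{right} = {r:.2f}")

# Does every strong pair have c002 on one side and c112 on the other?
all_involve_both = all(
    covers(left, "002") and covers(right, "112") for left, right in strong
)
print(all_involve_both)  # True
```

Six comparisons pass the cut-off, and in each of them the left-hand category includes c002 and the right-hand category includes c112.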

However, because p012 does not have a strong correlation with p002, again a little more investigation is needed:

p012 – p002 = 0.32
p012 – p112 = 0.42
p012 – p022 = -0.32

From these results we can see that c012 is most likely to come from the same source as c002 and c112. However, the correlations indicate that c012 contains a higher percentage of individual words, which is consistent with the view that aLuke significantly altered the Greek of the passages he took from Mark that are not in Matthew.

The lack of correlation between p002 and both p022 and p122 suggests that the words shared between Mark and Luke did not originate in Luke, but came from Mark instead.

Source of Identical Words in Mark-Luke Parallels

The following tests check for correlations between the profiles of cX22 (and variations cN22, cA22, etc.) and the equivalent Markan and Lukan categories.

These tests support previous results suggesting that the main source of the words common to all three synoptics was Mark. However, they do not provide any strong indication of the sources of the words common to Mark and Luke but not Matthew, i.e. those in c022 and c122.

Editing Choices in Mark-Luke Parallels

The lack of any significant correlations in the first group of tests above and the negative correlations in the second group suggest that the words common to both Mark and Luke could have originated in Luke. However, this appears to contradict previous results, and others (below), which suggest that the words in Mark that are common to either Matthew or Luke originated in Mark:

pX22 – p22A = 0.58
pX22 – p220 = 0.62
pN22 – p220 = 0.55
p022 – p220 = 0.56

The key to understanding this problem is knowledge of the differences between the Greek used in Mark and that used in Luke, in particular that Luke ‘corrects’ or ‘improves’ the Greek used in Mark. For example, cX22 contains above average use of EIPON, with below average use in cX12, while the converse is true of EIS. As previously mentioned, this causes a negative correlation between the profiles of many of the categories in Mark when compared with categories in Luke.

