# Facts about Erdös Numbers and the Collaboration Graph

The following interesting facts about the collaboration graph and Erdös numbers are mostly based on information in the database of the American Mathematical Society’s *Mathematical Reviews* (MR) as of July, 2004. Internet access to MR data is provided by the service MathSciNet. We gratefully acknowledge the assistance of the AMS in making this information available. An article with much of the information contained in this page appears in *Geographical Analysis*.

*A different standard is used for collaborations here than is used in constructing our Erdös-1 and Erdös-2 lists. First, for our lists we use sources in addition to Mathematical Reviews; the conclusion on this page are based just on the MR data. Second, we generally do not count articles that are not the result of research collaboration as establishing a link. For example, if Jack and Jill wrote a joint obituary article on Humpty Dumpty when he died, that article might appear in the MR database and establish a link between Jack and Jill for the conclusions reached on this page, whereas the traditional definition of the collaboration graph would not suggest putting an edge between them just on this basis. There are also a few author identification problems in the Math Reviews database (primarily prior to 1985), which make the conclusions here only approximate.*

**Data on the entire collaboration graph**

There are about **1.9 million authored items** in the Math Reviews database, by a total of about **401,000 different authors**. (This includes all books and papers in MR except those items, such as some conference proceedings, that do not have authors.) **Approximately 62.4% of these items are by a single author, 27.4% by two authors, 8.0% by three authors, 1.7% by four authors, 0.4% by five authors, and 0.1% by six or more authors.** The largest number of authors shown for a single item is in the 20s, but sometimes the author list includes “et al.”, whom we did not count as a real person. **The fraction of items authored by just one person has steadily decreased over time, starting out above 90% in the 1940s and currently standing at under 50%.**

Let B be the bipartite graph whose vertices are papers and authors, with an edge joining a paper with each author of that paper. Then B has about **2.9 million edges**. The **average number of authors per paper is 1.51**, and **the average number of papers per author is 7.21**. Here is the distribution of the number of papers per author. The median is 2, the mean is 7.21, and the standard deviation is 18.02. It is interesting (for tenure review committees?) to note that the 60th percentile is 3 papers, the 70th percentile is 4, the 80th percentile is 8, the 90th percentile is 18, and the 95th percentile is 32. Indeed, **over 42% of all authors in the database have just one paper.**

There are **four authors with more than 700 papers**: Paul Erdös with 1416 (he actually wrote more papers than that, but these are just the ones covered by Math Reviews), Drumi Bainov with 823, SAHARON SHELAH with 760, and Leonard Carlitz with 730. Bainov’s Erdös number is 4, SHELAH’s is 1, and Carlitz’s is 2. The other mathematicians with more than 500 papers listed in MathSciNet (and their Erdös numbers) are Hari M. Srivastava (2), Lucien Godeaux (infinite — actually he wrote only one joint paper), Ravi Agarwal (3), Edoardo Ballico (3), FRANK HARARY (1), Josip E. Pecaric (2), Shigeyoshi Owa (3), and Richard Bellman (2). The most prolific authors listed in the DBLP (dealing with computer science publications) can be found on a list at their website (DBLP), which is definitely worth exploring.

The **collaboration graph** C has the roughly 401,000 authors as its vertices, with an edge between every pair of people who have a joint publication (with or without other coauthors — but see below for a discussion of “Erdös number of the second kind”, where we restrict links to just two-author papers). Here is a picture of a small portion of this graph. The entire graph has about **676,000 edges**, so the **average number of collaborators per person is 3.36**. (If we were to view C as a multigraph, with one edge between two vertices *for each paper in which they collaborated*, then there would be about 1,300,000 edges, for an average of 6.55 collaborations per person.) In C there is one large component consisting of about 268,000 vertices. Of the remaining 133,000 authors, **84,000 of them have written no joint papers** (these are **isolated vertices** in C). The average number of collaborators for people who have collaborated is 4.25; the average number of collaborators for people in the large component is 4.73; and the average number of collaborators for people who have collaborated but are not in the large component is 1.65.

We also provide data on the number of collaborators per author (in other words, the **numbers of coauthors mathematicians have**). In graph-theoretical terms, this table shows the **degrees of the vertices in C**. The median is 1, the mean is 3.36, and the standard deviation is 6.61. (If we omit the isolated vertices, then the median degree is 2, and the mean is 5.37.) Recent research (see our Research on Collaboration in Research page) has indicated that we should expect the nonzero degrees to follow a power law: **the number of vertices with degree x should be proportional to x raised to a power, where the exponent is somewhere around –2 or –3**. Indeed, when we fit such a model to our data from May, 2000 (grouping the data in the tail), we find the exponent to be about –2.97, with a correlation coefficient for the model of r = 0.97. A slightly more accurate model throws in an exponential decay factor, and with this factor present, the exponent is –2.46, and r = 0.98. Apparently these models are appropriate for our data.

The five **people with more than 200 coauthors** are Paul Erdös (of course) with 509 (although the MR data actually show only 504, missing some coauthors of very minor works or works before 1940, when MR was started), FRANK HARARY (Erdös number 1) with 268, Yuri Alekseevich Mitropolskii (Erdös number 3) with 244, NOGA ALON (Erdös number 1) with 227, and Hari M. Srivastava (Erdös number 2) with 244.

We also display information on publication habits over time (1940 to 1999). It is clear from these data that collaboration has increased over the past 60-odd years, especially so recently. **By 2000, less than half of all mathematics papers were by a single author, about a third were by two authors, about an eighth by three authors, and 3% by four or more authors.** The table also indicates that the average number of papers per author in a decade has slowly increased over time, now standing at about 5 (although the variance is very large, and the median is only 2).

The **radius** of the large component of C (as it existed in Mathematical Reviews data as of July, 2004) is 12, and its **diameter** is 23. There are exactly two vertices with eccentricity 12 — Izrail M. Gelfand (Rutgers University) and Yakov Sinai (Princeton University), both of whom have Erdös number 3 — but not including Paul Erdös! (In other words, there is no one with Gelfand number or Sinai number greater than 12, whereas the maximum Erdös number is 13. In all, 1220 people have eccentricity 13.) Erdös does have the distinction of having the smallest mean distance to the other vertices, though: 4.65. There are five other people with means less than 5. In order of increasing mean, they are RONALD GRAHAM, ANDREW ODLYZKO, NOGA ALON, Larry Shepp, and FRANK HARARY. All of them have eccentricity 14 and Erdös number 1 except for Shepp, whose eccentricity is 13 and whose Erdös number is 2. The means for Gelfand and Sinai are slightly higher than 5.

Based on a sample of 100 pairs of vertices in this component, the average distance between two vertices is around 7.64 (between 7.41 and 7.87 with 95% confidence), with a standard deviation of about 1.19. The median of the sample was 7, with the quartiles at 6 and 8. The smallest and largest distances in the sample were 4 and 11, respectively. The appropriate phrase for C, then, is perhaps “eight degrees of separation”, if we wish to account for three quarters of all pairs of mathematicians.

To analyze this another way, we took a sample of 100 vertices in the large component and computed for each of them: the degree, the mean distance to all the other vertices, the standard deviation of the distances to all the other vertices, and the maximum distance to another vertex (the “eccentricity”). Here are the results **from the sample**. The mean distance to other vertices varied from 5.80 to 10.67, with an average of 7.37 and a standard deviation of 0.86. The standard deviation of the distances to all the other vertices was remarkably constant, with the numbers varying only between 1.14 and 1.28 (mean 1.19, standard deviation 0.03). So although the *average* “Jane Doe” number varies quite a bit, depending on who Jane Doe is, the *distribution* of these numbers has pretty much the same shape and spread for everyone. It’s as if those people further away from the heart of the graph may take longer to get to the heart, but once there, the fan-out pattern is the same. The eccentricities of the vertices in the sample ranged from 14 to 19, with a mean of 15.62 and a standard deviation of 1.04. We also looked at correlations among Erdös number (n), vertex degree (d), and average distance to the other vertices (l). The associations are as one might predict: The correlation coefficient between d and n is –0.46 (people with a lot of collaborators tend to have smaller Erdös number); the correlation coefficient between d and l is –0.56 (people with a lot of collaborators tend to have shorter paths to other people); and the correlation coefficient between n and l is 0.78 (people with a small Erdös number are closer to the heart of the graph and therefore have shorter paths to others, compared to those out in the fringes).

The “**clustering coefficient**” of a graph is equal to the fraction of ordered triples of vertices a,b,c in which edges ab and bc are present that have edge ac present. (In other words, how often are two neighbors of a vertex adjacent to each other?) The clustering coefficient of the collaboration graph of the first kind is 1308045/9125801 = 0.14. The high value of this figure, together with the fact that average path lengths are small, indicates that **this graph is a “small world” graph** (as defined by Duncan Watts — see our pages on Research on Collaboration in Research and Items of Interest Related to Erdös Numbers).

We also have some data on the portion of the collaboration graph **outside the “Erdös component”** (the one giant component). We are ignoring here the 84,000 isolated vertices and looking only at those authors who have collaborated but do not have a finite Erdös number. There are about 50,000 such vertices. There are about 41,000 edges in these components, so the average degree of these vertices is 1.65. In other words, a person who has collaborated but does not find herself in the Erdös component of C has on the average collaborated with only one or two people. In contrast, the average degree of vertices in the Erdös component is 4.73 (there are about 634,000 edges and 268,000 vertices). Look at the distribution of component sizes. As would be expected, **most of these roughly 18,000 other components are isolated edges** (64% of them, in fact). The largest component has 32 vertices. Its most collaborating author is Yu. A. Shevlyakov (Department of Applied Mathematics, Simferopol State University, Crimea, Ukraine), who has 13 coauthors. The person outside the Erdös component with the most coauthors is Gholam Reza Jahanshahloo (Department of Mathematics, University for Teacher Education, Tehran, Iran), who is in a component with 23 vertices (he has collaborated with all but two of them).

**Smaller collaboration graphs**

It would be interesting to see how much collaboration goes on within one department. In the Department of Mathematics and Statistics at Oakland University there seems to be quite a bit. Here are a pdf file of their collaboration graph in 2004 and 2012. If other departments produce such a graph, please send the link to me, and I will list them here.

**The distribution of Erdös numbers**

The following table shows the number of people with Erdös number 1, 2, 3, ..., according to the electronic data. Note that there are slightly fewer people shown here with Erdös numbers 1 and 2 than in our lists, since our lists are compiled by hand from various sources in addition to MathSciNet. In addition to these 268,000 people with finite Erdös number, there are about 50,000 published mathematicians who have collaborated but have an infinite Erdös number, and 84,000 who have never published joint works (and therefore of course also have an infinite Erdös number).

Erdös number 0 --- 1 person

Erdös number 1 --- 504 people

Erdös number 2 --- 6593 people

Erdös number 3 --- 33605 people

Erdös number 4 --- 83642 people

Erdös number 5 --- 87760 people

Erdös number 6 --- 40014 people

Erdös number 7 --- 11591 people

Erdös number 8 --- 3146 people

Erdös number 9 --- 819 people

Erdös number 10 --- 244 people

Erdös number 11 --- 68 people

Erdös number 12 --- 23 people

Erdös number 13 --- 5 people

Thus the **median Erdös number is 5**; the **mean is 4.65**, and the **standard deviation is 1.21**.

One of the five people with the **largest finite Erdös number** is Arturo Robles, and one shortest path goes like this (year of joint work in parenthese): Erdös to Daniel D. Bonar (1977) to Charles L. Belna (1979) to S. A. Obaid (1983) to Wadie A. Bassali (1981) to Ibrahim H. M. el-Sirafy (1976) to Konstantin Chernous (1977) to Jose Valdes (1980) to B. Dugnol (1980) to P. Suarez Rodriguez (1995) to A. E. Alvarez Vigil (1995) to C. Gonzalez Nicieza (1992) to Jose Angel Huidobro (1986) to Robles (1990).

Since Paul Erdös collaborated with so many people, one would expect this distribution for him to be shifted downward from that of a random mathematician. For example, “Jerry Grossman numbers” have a median of 6, a mean of 5.71 (standard deviation = 1.22), and range as high as 15; and “Arturo Robles numbers” have a median of 15, a mean of 15.06 (standard deviation = 1.21). It turns out that the standard deviation is almost exactly the same for almost everyone in the large component.

**Erdös numbers of the second kind**

The entire discussion so far has been based on linking two mathematicians if they have written a joint paper, whether or not other authors were involved. A purer definition of the collaboration graph (in fact, the one that Paul Erdös himself seemed to favor) would **put an edge between two vertices if the mathematicians have a joint paper by themselves, with no other authors**. Under this definition, for example, YOLANDA DEBOSE would not have an Erdös number of 1, since her only joint publication with Erdös was a three-author paper with ARTHUR M. HOBBS as well. (But HOBBS would still have Erdös number 1, since some of his joint works are with Paul alone.) Let C' denote the collaboration graph under this more restrictive definition, and let us call the associated path lengths “Erdös numbers of the second kind” (and therefore call traditional Erdös numbers “Erdös numbers of the first kind” when we need to make a distinction).

**Here is what we know about C' and Erdös numbers of the second kind.** This two-author-only collaboration graph has about 166,000 isolated vertices (including the 84,000 people who have written no joint papers, together with another 83,000 people who have written joint papers but only when three or more authors were involved — these numbers all rounded to the nearest thousand). The remaining 235,000 mathematicians in C' account for about 284,000 edges, so the average degree of a nonisolated vertex in C' is about 2.41 (as opposed to 4.25 for C). Here are data on the distribution of these degrees, i.e., the number of collaborators per author counting only dual works. The median is 1, the mean is 1.34, and the standard deviation is 2.84. (If we omit the isolated vertices, then the median degree is still 1, the mean is 2.41, and the standard deviation is 3.37.) As with the collaboration graph of the first kind, we should **expect the nonzero degrees to follow a power law**, and when we fit this a model to our data from May 2000 (again grouping the data in the tail), we find the exponent to be about –3.26, with a correlation coefficient for the model of r = 0.97. The model with an exponential decay factor present gives the exponent as –2.70, with r = 0.98.

The **three people with 100 or more coauthors of this type** are Paul Erdös (of course) with 230, FRANK HARARY with 124, and SAHARON SHELAH with 121. HARARY’s only papers with Erdös are 3-author works, so his Erdös number of the second kind is 2 (through BOLLOBAS, for example); SHELAH’s is 1.

**There are about 176,000 vertices in the large component of C' (versus 268,000 in C).** The average number of two-author-only collaborators for people in the large component is 2.82; and the average number of two-author-only collaborators for people who have written two-author papers but are not in the large component is 1.21.

The **radius** of the large component of C' (as it existed in Mathematical Reviews data as of July, 2004) is 14. The unique center is J. Bryce McLeod (whose Erdös numbers, of both kinds, are 3), and not Paul Erdös, whose eccentricity is 15, as is the eccentricity of 392 other people. The **diameter** of C' is 26 (this is the distance between the two people with Erdös number of the second kind equal to 15). As is the case with the collaboration graph of the first kind, Erdös has the distinction of having the smallest mean distance to the other vertices, 5.58, and no one else has a mean less than 6.

As in the case of C, we took a sample of vertices in the large component of C' and computed for each of them: the degree, the mean distance to all the other vertices, the standard deviation of the distances to all the other vertices, and the maximum distance to another vertex (the “eccentricity”). Here are the results **from the sample of 100 vertices**. The mean distance to other vertices varied from 6.87 to 11.99, with an average of 9.18 and a standard deviation of 1.19. (Thus a 95% confidence interval for the average distance between vertices is 8.95 to 9.42.) The standard deviation of the distances to all the other vertices was again remarkably constant, with the numbers varying only between 1.48 and 1.63 (mean 1.54, standard deviation 0.034). The eccentricities of the vertices in the sample ranged from 15 to 21, with a mean of 18.21 and a standard deviation of 1.32. As for the correlations among Erdös number (n), vertex degree (d), and average distance to the other vertices (l), the correlation coefficient between d and n is –0.41; the correlation coefficient between d and l is –0.48; and the correlation coefficient between n and l is 0.86.

The **clustering coefficient** of the collaboration graph of the second kind is 48132/1738599 = 0.028. This is actually a fairly high value (compared to a random graph with this density of edges, where the clustering coefficient is essentially 0), so again we have a “small world” graph. (The reason it is so much smaller than the clustering coefficient for the collaboration graph of the first kind is that the multi-author collaborations create a lot of triangles.) The three mathematicians with at least 25 two-author collaboration pairs among their collaborators whose collaborators most collaborate with each other are Masatoshi Fujii, Masahiro Nakamura, and Jian She Yu, each with about 30 two-author collaborators and local clustering coefficients in the 11% to 13% range — these are the only ones above 10%. (In other words, for these people, about 12% of the pairs of their two-author collaborators have themselves written a two-author paper. In fact, Fujii and Nakamura are adjacent in C'.)

We also have some data on the portion of the collaboration graph of the second kind **outside the “Erdös component” (the one giant component).** We are ignoring here the 166,000 isolated vertices and looking only at those authors who have written two-author papers but do not have a finite Erdös number of the second kind. There are about 59,000 such vertices. There are about 36,000 edges in these components, so the average degree of these vertices is 1.21. (In contrast, the average degree of vertices in the Erdös component is 2.82 (there are about 248,000 edges and 176,000 vertices). Here is the distribution of component sizes. As would be expected, **most of these roughly 23,000 other components are isolated edges (three fourths of them, in fact).** The largest component has 28 vertices.

**The distribution of Erdös numbers of the second kind**

The following table shows the number of people with Erdös number 1, 2, 3, ..., according to the electronic data but counting only coauthorships on papers with just two authors. In addition to these 176,000 people with finite Erdös number of the second kind, there are about 59,000 mathematicians who have collaborated but have an infinite Erdös number of the second kind (this is about 9,000 greater than the corresponding number for Erdös numbers of the first kind).

these are Erdös numbers of the second kind

Erdös number 0 --- 1 person

Erdös number 1 --- 230 people

Erdös number 2 --- 2153 people

Erdös number 3 --- 10118 people

Erdös number 4 --- 28559 people

Erdös number 5 --- 47430 people

Erdös number 6 --- 44102 people

Erdös number 7 --- 25348 people

Erdös number 8 --- 11265 people

Erdös number 9 --- 4299 people

Erdös number 10 --- 1570 people

Erdös number 11 --- 533 people

Erdös number 12 --- 206 people

Erdös number 13 --- 61 people

Erdös number 14 --- 25 people

Erdös number 15 --- 2 people

Thus the **median Erdös number of the second kind is 5**; the **mean is 5.58**, and the **standard deviation is 1.55**, a little higher than the corresponding statistics for Erdös numbers of the first kind, as would be expected. The two people with maximum Erdös number of the second kind Sunil Kumar-2 and N. V. Silenok.

**Paul Erdös asked the following question: Is the collaboration graph of the second kind planar? Our guess was that surely it was not, and we now have a proof**. If we can find a homeomorphic copy of the complete graph on five vertices in C', or a copy of the complete bipartite graph with three vertices in each part, then we know that the graph cannot be imbedded in a plane. A natural place to look for such subgraphs would be in a portion of the graph where there are lots of edges. The following concept, apparently introduced not by graph theorists but by sociologist, proved fruitful.

The “*k*-core” of a graph is the (unique) largest subgraph all of whose vertices have degree at least *k*. (See the article in *Social Networks* discussed on the Research on Collaboration in Research page for references to the notion of core.) It is easy to find the *k*-core: just remove all vertices of degree less than *k*, then repeat again and again until no such vertices remain. If any vertices remain, then they form the *k*-core. It is clear that the 1-core contains the 2-core, which contains the 3-core, etc. The smallest nonempty *k*-core (i.e., the one for largest *k*) is called the “main core”. **For the collaboration graph of the second kind, we found (using electronic data) that the main core is the 5-core, and it has 70 vertices (including Erdös, not surprisingly, with degree 30) and 272 edges**. These most social mathematicians all have Erdös number of the first kind at most 2, and 50 of them are Erdös coauthors. Here is the adjacency matrix of this graph .

It turns out that the main core of the collaboration graph of the second kind has four complete graphs on five vertices: ALON-FUREDI-KLEITMAN-WEST-E R D O S, COLBOURN-Hartman-Mendelson-PHELPS-ROSA, COLBOURN-Lindner-Mendelson-PHELPS-ROSA, and Lindner-MULLIN-ROSA-STINSON-Wallis. It also has 125 copies of the complete bipartite graph with three vertices in each part (the other canonical nonplanar graph), such as (FAN CHUNG, RODL, SZEMEREDI)-(RON GRAHAM, TROTTER, E R D O S). So this graph is certainly nonplanar.

Actually, these are not the only complete graphs on five vertices in the collaboration graph of the second kind. For example, Gerald Ludden (Michigan State University) has only four collaborators of the second kind, but each of them has two-author collaborated with each of the others (Koichi Ogiue, Masafumi Okumura, Bang-Yen Chen, and David E. Blair).

### Statistical summaries of Erdos1 and Erdos2 lists (numbers of the first kind)

Here is a statistical summary of the number of Erdös number 1 coauthors for people with Erdös number 2, the number of Erdös number 1 coauthors for people with Erdös number 1, the total number of coauthors for people with Erdös number 1, the number of papers that Erdös’s coauthors have with him, and the number of new coauthors Paul Erdös added each year. These data are based on the 2020 update.

Here is a textfile giving the adjacency lists for the induced subgraph of the collaboration graph on all Erdös coauthors. These data are based on the 2007 update.

Here are Erdös number record holders (for example, which person with Erdös number 2 has the most coauthors with Erdös number 1?). These data are based on the 2015 update.

MORE INFORMATION: Here is a paper summarizing some of what is on this page. It appears in the Proceedings of 33rd Southeastern Conference on Combinatorics (Congressus Numerantium, Vol. 158, 2002, pp. 201-212). An abbreviated version appears in SIAM News 35:9. Another article, which also looks at the publication patterns as a function of area of mathematics, appears in the January 2005 issue of the Notices of the American Mathematical Society. There is a recent paper on collaboration based on the MathSciNet database in the May 2022 issue of the Notices. Finally, there is a file of slides from a talk about the collaboration graph of papers rather than the collaboration graph of people.

This page was last updated on April 18, 2022.