Measuring Differences in Language

Differences in language between countries are measured using three scales:

L1 is a 5-point scale which quantifies the difference between the dominant languages of any two countries, i and j,

L2 is a 5-point scale based on the incidence of country i's dominant language(s) in country j, and

L3 is a 5-point scale based on the incidence of country j's dominant language(s) in country i.

The scores for each of these indicators and the resultant factor (see below concerning the confirmatory factor analysis) can be found in an Excel spreadsheet attached at the bottom of this page. This spreadsheet contains the values for 14,280 ordered country pairs (i.e. n × (n − 1) for n = 120 countries). The precise coding for these variables is explained below.
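The pair count can be checked with a short sketch. The country labels here are hypothetical placeholders, not the countries in the spreadsheet; the point is only that pairs are ordered, so (i, j) and (j, i) are separate rows, which is why L2 and L3 can differ within a pair.

```python
from itertools import permutations

n_countries = 120

# Hypothetical placeholder labels (assumption); the real data uses country names.
countries = [f"C{k:03d}" for k in range(n_countries)]

# Ordered pairs (i, j) with i != j: both (i, j) and (j, i) are included.
pairs = list(permutations(countries, 2))

print(len(pairs))  # 14280, i.e. 120 * 119
```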

Source:

The primary source for these estimates was:

Gordon, R. G. (ed), Ethnologue: Languages of the World, 2005

Time Period:

The precise dates for the language population data vary depending on the country and the language in question; however, the vast majority of the estimates fall between 1990 and 2000. While this is a fairly dated and rather broad time frame, population trends in spoken language change very slowly (the changes are primarily generational), and our scales are intentionally coarse-grained; our indicators of differences in language should therefore not be adversely affected.

L1 - Distance Between Major Languages:

The distance between the two closest major languages for each pair of countries is based on the hierarchy of language families, branches and sub-branches (see Defining Major Languages) and is coded as follows:

5 - Different families

4 - Same family but different branches

3 - Same branch but different at the 1st sub-branch level

2 - Same sub-branch at the 1st level but different at the 2nd level

1 - Same language
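As a minimal sketch of how this coding could be applied, assume each language is described by a tuple (family, branch, 1st-level sub-branch, 2nd-level sub-branch, language). The function names and the placeholder classifications are hypothetical and are not taken from the attached files. The scale above lists no code for two different languages that share a 2nd-level sub-branch, so the sketch raises an error in that case rather than guessing.

```python
from itertools import product

def l1_pair(a, b):
    """Code two classification tuples on the 5-point L1 scale."""
    if a == b:
        return 1  # same language
    if a[0] != b[0]:
        return 5  # different families
    if a[1] != b[1]:
        return 4  # same family, different branches
    if a[2] != b[2]:
        return 3  # same branch, different 1st-level sub-branch
    if a[3] != b[3]:
        return 2  # same 1st-level sub-branch, different 2nd-level sub-branch
    # Not covered by the scale as stated; left unresolved on purpose.
    raise ValueError("same 2nd-level sub-branch but different languages")

def l1_score(majors_i, majors_j):
    """L1 for a country pair: the distance between the two closest major languages."""
    return min(l1_pair(a, b) for a, b in product(majors_i, majors_j))

# Placeholder classifications (assumption, for illustration only).
lang1 = ("FamA", "Br1", "S1", "T1", "Lang1")
lang2 = ("FamA", "Br1", "S1", "T2", "Lang2")
lang3 = ("FamA", "Br2", "S9", "T9", "Lang3")
lang4 = ("FamB", "Br7", "S7", "T7", "Lang4")

print(l1_score([lang1], [lang2]))         # 2
print(l1_score([lang1], [lang3, lang4]))  # 4
print(l1_score([lang1, lang4], [lang4]))  # 1
```

Taking the minimum over all combinations of major languages reflects the "two closest major languages" wording above.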

L2 & L3 - Incidence of One Country’s Major Language(s) in Other Countries:

The second and third language indicators measure the proportion of the population in one country that is able to speak the major language(s) of another country. L2 concerns the incidence of country i’s major language(s) in country j, and L3 concerns the incidence of country j’s major language(s) in country i. The indicators are coded as follows:

5 - Less than 1%

4 - Greater than or equal to 1% but less than 5%

3 - Greater than or equal to 5% but less than 50%

2 - Greater than or equal to 50% but less than 90%

1 - Greater than or equal to 90%

Where a country has more than one major language, a weighted average is calculated.
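The thresholds above can be sketched as a simple lookup. The function name and the example percentages are hypothetical. The sketch covers a single major language only, because the weights used in the weighted average are not specified here.

```python
def incidence_score(pct):
    """Code a percentage of speakers (0 to 100) on the 5-point L2/L3 scale."""
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    if pct < 1:
        return 5
    if pct < 5:
        return 4
    if pct < 50:
        return 3
    if pct < 90:
        return 2
    return 1

# Made-up figures for one country pair (i, j), for illustration only.
pct_of_j_speaking_i_language = 0.3   # feeds L2
pct_of_i_speaking_j_language = 62.0  # feeds L3

l2 = incidence_score(pct_of_j_speaking_i_language)
l3 = incidence_score(pct_of_i_speaking_j_language)
print(l2, l3)  # 5 2
```

The two indicators need not agree: a language can be widely spoken in one direction of the pair and almost absent in the other.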

Defining Major Languages:

A major language for a given country is defined as any language which can be spoken by more than 20% of the population, or a language which holds a special official status within the country (e.g. an official second language such as English in India and several African nations). If no single 'first' language exceeds 20% of the population (as was the case in more than a dozen countries, including India and Kenya), a threshold of 10% is employed. In a very few countries, most notably Papua New Guinea (PNG), no single first language exceeds even 10% of the population. In those cases, the most populous first language is deemed major.
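The selection rule can be sketched as a cascade of thresholds. This is an illustration under assumptions, not the procedure actually used to build the attached files: the function name, the input format (one percentage per first language plus a set of languages with official status) and the sample figures are all hypothetical.

```python
def major_languages(first_lang_share, official=frozenset()):
    """Return the set of major languages for one country.

    first_lang_share: dict mapping language -> % of population (assumed input).
    official: languages with special official status, always included.
    """
    majors = set(official)
    # Try the 20% threshold first, then fall back to 10%.
    for threshold in (20, 10):
        above = {lang for lang, pct in first_lang_share.items() if pct > threshold}
        if above:
            return majors | above
    # No first language exceeds 10%: take the most populous one.
    top = max(first_lang_share, key=first_lang_share.get)
    return majors | {top}

# Made-up shares, for illustration only.
print(sorted(major_languages({"A": 60, "B": 15})))                         # ['A']
print(sorted(major_languages({"A": 15, "B": 12, "C": 8}, official={"E"}))) # ['A', 'B', 'E']
print(sorted(major_languages({"A": 6, "B": 4})))                           # ['A']
```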

For the countries used in our analyses, 113 languages qualified as a major language for at least one of the 120 countries. These languages have been grouped into a hierarchy of families, branches, 1st level sub-branches and 2nd level sub-branches based on Gordon's (2005) more substantial classification of 6,912 languages. Two Excel files documenting the hierarchy of language families and the major languages for each country are attached at the bottom of this page.

Lang f - Differences in Language Factor:

The preceding three indicators have been reduced to a single factor using confirmatory factor analysis (CFA). This factor score has been estimated using all 14,280 country pairs. The individual factor loadings and Cronbach's alpha are reported below.