Lexical diversity: describing the lexicons of young sign systems
Homesign systems are manual-gestural systems developed by deaf individuals who do not have access to spoken language or to linguistic input in the form of an established sign language. These systems provide a window into the earliest stages of language emergence. A central question about this process concerns the similarity and convergence of different homesign systems created within the same community. However, quantifying similarity across a constellation of homesign systems is a particular challenge in the study of young sign systems. Studies documenting young sign languages, rural sign languages, and homesign systems have emphasized that there is considerable variation between signers, even in languages with a longer history of use (de Vos & Nyst, 2018; Sandler et al., 2011). These studies often compare sign forms directly. For example, Sandler et al. (2011) compare sign forms from signers of Al-Sayyid Bedouin Sign Language (ABSL) and note several issues, including significant variation in form between signers, even those from the same family, and the fact that signers often produce multiple signs to describe a single photo.
In this study, I describe an alternative strategy for evaluating similarity between lexicons that does not involve comparing sign forms directly, but rather considers the distribution of signs within a given signer’s lexicon. I present a measure that I call lexical diversity to characterize the frequency of sign forms within homesigner lexicons. This measure draws on one of the most durable findings from statistical linguistics: the frequency distribution of words. Across diverse corpora, word frequencies seem to follow a fairly simple distribution, termed Zipf’s law (Zipf, 1936; Mandelbrot, 1953). This distribution is characterized by a small set of high-frequency words that together account for the majority of all words produced, and a large set of low-frequency words that are produced rarely (Piantadosi, 2014).
The lexical diversity measure draws on two aspects of a Zipfian distribution: (1) the size of the set of signs that are least frequent, that is, signs produced only once, and (2) the frequency of the sign that is produced most often. These two indices correspond to two concepts from Kirby et al. (2015): expressivity and compressibility. Within the lexical diversity measure, signs that are produced only once are maximally expressive because they have only one referent. Signs that are repeated across multiple referents, however, may provide some degree of compressibility in the lexicon and may reflect emergent structure in the system, in the form of classifiers or compounds that reflect categories in the lexicon.
I apply the lexical diversity measure to a dataset of signs elicited from ten child homesigners and their communication partners, including deaf adult relatives and peers. All participants live in Nebaj, a town in the central highlands of Guatemala. There is no standard sign language in use in Nebaj, so deaf people in the community develop homesign systems. Many of the deaf people in this sample are in contact with each other. There are several families with multiple generations of deafness, so some deaf children have a deaf adult relative as a communicative model, and there is a local school for special education, where deaf students attend class together. Each participant described a set of 62 photos using their homesign system. The set of signs that each homesigner produced is treated as a mini-lexicon, and each sign was glossed with a “conceptual component” based on its iconic form properties (similar to Richie et al., 2014). Some conceptual components were repeated for multiple photos. For example, a sign that iconically resembled driving a car, glossed DRIVE, was produced by many signers for several different vehicles in the set of photos. Other conceptual components were produced for only one photo (see Fig. 1).
For each signer’s lexicon, the lexical diversity index consists of two values: the proportion of signs that were produced only once and the proportion accounted for by the most-repeated sign. I plot these values and find that the distributional patterns of the lexical diversity measure correspond to shared socio-communicative experiences (see Fig. 2). Homesigners who interact with other homesigners at school (their peers) have a larger proportion of signs used for only one photo. Homesigners with deaf family members, who interact with deaf adult homesigners, have a balanced proportion of signs used for only one photo and repeated signs. These systems share properties with what Kirby et al. (2015) describe as structured languages: languages that have some compressibility (in the form of repeated signs) while maximizing expressivity (signs used for only one photo).
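As an illustration only, the two proportions in the lexical diversity index could be computed from a list of glosses roughly as follows. This is a minimal sketch, not the procedure used in the study: the function name `lexical_diversity`, the example glosses other than DRIVE, and the choice to divide both counts by the total number of sign tokens are all assumptions made for the example.

```python
from collections import Counter


def lexical_diversity(glosses):
    """Return (proportion_single_use, proportion_most_repeated) for one signer.

    `glosses` is a list of conceptual-component glosses, one per sign token.
    ASSUMPTION: both proportions use the total number of tokens as the
    denominator; the abstract does not specify the denominator.
    """
    if not glosses:
        raise ValueError("need at least one gloss")
    counts = Counter(glosses)  # gloss -> number of times it was produced
    total = len(glosses)
    # Number of glosses that occur exactly once in this mini-lexicon.
    single_use = sum(1 for c in counts.values() if c == 1)
    # Token count of the single most frequently produced gloss.
    most_repeated = max(counts.values())
    return single_use / total, most_repeated / total


# Hypothetical mini-lexicon: invented data, not taken from the study.
example = ["DRIVE", "DRIVE", "DRIVE", "EAT", "SLEEP",
           "FLY", "DRIVE", "ROUND", "ROUND", "DRINK"]
single, top = lexical_diversity(example)
print(f"single-use: {single:.2f}, most-repeated: {top:.2f}")
```

Under these assumptions, a lexicon in which every gloss is unique yields a single-use proportion of 1.0, while a lexicon dominated by one repeated gloss yields a high most-repeated proportion.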