Original transcriptions in notebook form
Alan Rumsey, field assistant John Onga (with notebook), and Wapi Onga, ca. 2004
The Ku Waru Child Language Socialization study features five children acquiring Ku Waru as their first language, recorded between 2013 and 2016. These children are also acquiring Tok Pisin as an additional language. The corpus is focused on longitudinal recordings of one hour per month. The longest run for a single child is from the ages of 2 years and 2 months to 4 years and 9 months.
Additionally, the corpus contains recordings of children's interaction made using GoPro cameras, as well as older recordings facilitated by Rumsey and Merlan from 2004 to 2006. Some of these earlier recordings feature children's speech in Ku Waru from as early as 1 year and 8 months.
The corpus was recorded by field assistants John Onga and Andrew Noma, on audio cassettes during 2004-2006 and on digital audio and video recorders during 2013-2016. Onga (right-hand image above; middle) and Noma (see 'Funding' page) transcribed the recordings by hand into notebooks (left-hand image above). In doing so, they translated the Ku Waru into their own idiolectal varieties of English. In the case of children's speech, they transcribed the children's utterances exactly as produced, and added a translation into adult Ku Waru speech if possible. The notebooks were scanned and typed by Appen Language Services. The typed transcripts have been cleaned by the research team. The size of the corpus is as follows:
Number of individual sessions: 364
Raw number of lines across all files: 2,729,620
Number of lines post processing (removing empty lines and comments): 1,423,718
Number of Ku Waru/Tok Pisin lines (excluding English translations): 785,658
Number of all individual Ku Waru/Tok Pisin words: 2,569,063
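As a rough illustration of how counts of this kind can be derived, the following minimal sketch tallies raw lines, lines kept after dropping empty lines and comments, and whitespace-separated words. It is not the research team's actual processing script. The comment marker `#`, the function name `corpus_counts`, and the sample lines are all hypothetical, and the sample tokens are placeholders rather than real Ku Waru or Tok Pisin.

```python
from typing import Dict, Iterable

# Hypothetical comment marker: the transcripts' real convention is not stated here.
COMMENT_PREFIX = "#"


def corpus_counts(lines: Iterable[str], comment_prefix: str = COMMENT_PREFIX) -> Dict[str, int]:
    """Count raw lines, lines kept after filtering, and words on the kept lines."""
    raw = 0
    kept = 0
    words = 0
    for line in lines:
        raw += 1
        text = line.strip()
        # Skip empty (or whitespace-only) lines and comment lines.
        if not text or text.startswith(comment_prefix):
            continue
        kept += 1
        # Words are approximated as whitespace-separated tokens.
        words += len(text.split())
    return {"raw_lines": raw, "kept_lines": kept, "words": words}


# Placeholder input, not real corpus data.
sample = ["# session header", "", "w1 w2", "   ", "w3 w4 w5"]
counts = corpus_counts(sample)
print(counts)
```

Separating out English translation lines would require knowing how the transcripts mark them, so that step is left out of the sketch.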
At present, work continues on forced alignment of the video and audio of the corpus at the utterance level, and on development of a web-based interface (code-named "Ku Waru Shiny") which will allow easy interrogation of the corpus.
The size and diversity of the data set provide a unique opportunity for in-depth investigation of linguistic and anthropological issues. For example, Rumsey, Reed & Merlan (2020) coded 32,760 lines in order to investigate the acquisition of certain complex syntactic constructions by Ku Waru children.
Likewise, Rumsey (2013) drew on a close analysis of 17 hours of Ku Waru parent-child interaction, compared with a parallel corpus from the CHILDES archive involving children in the US. This provided both qualitative and quantitative evidence for an otherwise anecdotal observation often made in the ethnography of the region: that the possibility of deception is a ubiquitous theme in everyday life there. References to deception by the children’s adult interlocutors were 13 times as frequent in the Ku Waru samples as in the US samples, and such references by the children themselves were 62 times as frequent. Moreover, such references by the children begin to show up at a much younger age in the Ku Waru samples than in the American ones: 2½ years vs 3½.
The availability of both audio and video for all of the 2013-2016 recordings allows the interactions to be studied not only for their linguistic content, but also from the viewpoint of gaze direction, gesture and other dimensions of bodily comportment which figure centrally in the kind of interpersonal engagement that goes on in them, as demonstrated for example in Rumsey (2019).
Corpora and archives the project has contributed to include:
ACQDIV (Language, ACQuisition, DIVersity) - KWCLSS corpus finalised, awaiting accession
PARADISEC (Pacific and Regional Archive for Digital Sources in Endangered Cultures)
Rumsey, Alan. (2013). Intersubjectivity, deception and the ‘opacity of other minds’: Perspectives from Highland New Guinea and beyond. Language and Communication 33(3):326-343. https://doi.org/10.1016/j.langcom.2013.06.003
Rumsey, Alan. (2019). Intersubjectivity and engagement in Ku Waru. Open Linguistics 5(1):49-68. https://doi.org/10.1515/opli-2019-0003
Rumsey, Alan; Reed, Lauren W.; & Merlan, Francesca. (2020). Ku Waru clause chaining and the acquisition of complex syntax. Frontiers in Communication 5, 19. https://doi.org/10.3389/fcomm.2020.00019