Panel Corpora

Corpus-based approaches to Chinese linguistics: towards increased empiricism

Conveners:

Anna Morbiato, Ca’ Foscari University of Venice and the University of Sydney

Bianca Basciano, Ca’ Foscari University of Venice

In recent decades, corpus-based research has been gaining momentum in contemporary linguistics. While corpora, understood as large collections of naturally occurring texts, have always existed, rapid advances in computation and technology have provided tools for faster and more effective corpus construction and consultation. Chinese is no exception: corpus data are now considered among the main resources for many linguists, while large-scale surveys are beginning to be taken seriously as an important tool for linguistic investigation (Jing-Schmidt 2013: 2). Among the reasons behind the increasing number of corpus-based studies is the availability of “a myriad of large and publicly available Chinese corpora” (Xu 2015), which include general-purpose corpora, such as the CCL (Centre for Chinese Linguistics, Peking University) corpus or the BCC (Beijing Language and Culture University Chinese Corpus), interlanguage corpora, such as the BLCU International Corpus of Learner Chinese, and specialized corpora, such as the ZHTenTen simplified Chinese corpus hosted on Sketch Engine or those distributed by the LDC (Linguistic Data Consortium at UPenn) and the ELRA (European Language Resources Association). The great advantage of corpora lies in the fact that they offer access to large amounts of authentic, naturally occurring language data produced by a variety of speakers or writers, thus providing a more robust, statistically significant foundation for linguistic accounts and analyses. There is now considerable emphasis on the reliability of linguistic data, and many scholars stress the need for a shift to a more empirical mode of investigation: such an approach “energizes theoretical endeavors in the field, as rigorous theoretical advances are grounded in solid empirical data” (Jing-Schmidt 2013).

While the number of corpus-based Chinese studies is steadily increasing, scholars note that most are oriented toward applied linguistics, with the compilation of character/word frequency lists and interlanguage Chinese studies being the most popular types of research (Xu 2015). Among the latest lexical frequency and word list projects are the most recent national Chinese character list, i.e., the 通用规范汉字表 ‘A General Service List of Chinese Characters’ (released in 2013), and Xiao et al. (2009), A frequency dictionary of Mandarin Chinese. Corpus-based research on second language acquisition and language pedagogy has also been increasing over the last couple of decades, with early projects at BLCU now developed into the BLCU International Corpus of Learner Chinese, alongside other studies (Tao 2008, 2009; Xiao 2007; inter alia). On the other hand, scholars agree that corpus-based research at the sentential/grammatical level is practically negligible compared with lexical studies. There have been some innovative corpus studies on morphological aspects of Chinese (Sproat & Shih 1996), grammar (Xiao & McEnery 2008, 2010; Tao 2004), discourse/pragmatics (Jing-Schmidt & Kapatsinski 2012), and historical linguistics (Ji 2010; Cook 2011). However, according to Xu (2015), apart from these notable exceptions, corpus-based theoretical linguistic studies of Chinese are scarce and by no means mainstream, partly due to the technological and methodological limitations connected with corpus interrogation. McEnery and Xiao (2016) also hold that research in corpus-based descriptive grammar of Chinese is rather sporadic and fragmentary, and has focused on specific linguistic features of interest to individual researchers.

This panel aims to explore this promising, yet relatively underdeveloped, area of inquiry: it welcomes proposals that integrate corpus tools with theoretical investigation of Chinese grammar in its various components, including:

· Morphology and semantics

· Event structure and argument alternations

· Syntax and information structure

· Pragmatics and discourse

· Diachronic studies

Studies may be quantitative or qualitative, and synchronic or diachronic. Specifically, the panel aims to address the following questions:

· How can corpora improve current theoretical accounts of Chinese grammar in general?

· What do corpora reveal about the statistical relevance of linguistic phenomena and constructions?

· What are the limitations and the drawbacks of using corpora to investigate Chinese?

References:

Cook, A. (2011). Recent developments in the use of the plural marker men in Modern Standard Chinese in Taiwan. Chinese Language and Discourse 2(1), 80–98.

Ji, M. (2010). A corpus-based study of lexical periodization in historical Chinese. Literary and Linguistic Computing 25(2), 199–213.

Jing-Schmidt, Z. (2013). Increased Empiricism: Recent Advances in Chinese Linguistics. John Benjamins Publishing Company.

Jing-Schmidt, Z. and V. Kapatsinski. (2012). The apprehensive: Fear as endophoric evidence and its pragmatics in English, Mandarin, and Russian. Journal of Pragmatics 44, 346–373.

McEnery, T., and R. Xiao. (2016). Corpus-Based Study of Chinese. In The Routledge Encyclopedia of the Chinese Language, edited by S. Chan, 438–51. New York: Routledge.

Sproat, R. and C. Shih. (1996). A Corpus-Based Analysis of Mandarin Nominal Root Compounds. Journal of East Asian Linguistics 5, 49–71.

Tao, H. (Ed.) (2004). Special Issue: Corpora, Language Use, and Grammar. Journal of Chinese Language and Computing 14(2).

Tao, H. (2008). The Role of Corpora in Chinese Language Teaching and Teacher Education. In Issues in Chinese Language Education and Teacher Development, edited by P. Duff & P. Lester, 90–102. University of British Columbia.

Tao, H. (2009). Core Vocabulary in Spoken Mandarin and the Integration of Corpus-Based Findings into Language Pedagogy. In Proceedings of the 21st North American Conference on Chinese Linguistics, edited by Y. Xiao, 13–27. Smithfield, Rhode Island: Bryant University.

Xiao, R. (2007). What can SLA learn from contrastive corpus linguistics? The case of passive constructions in Chinese learner English. Indonesian Journal of English Language Teaching 3(2), 1–19.

Xiao, R. and T. McEnery. (2008). Negation in Chinese: A corpus-based study. Journal of Chinese Linguistics 36(2), 274–330.

Xiao, R. and T. McEnery. (2010). Corpus-based Contrastive Studies of English and Chinese. London/New York: Routledge.

Xu, J. (2015). Corpus-Based Chinese Studies. Chinese Language and Discourse 6(2), 218–244.