Position paper: Austroasiatic syntax in diachronic and areal perspective [draft]

Mathias Jenny, Paul Sidwell, Mark Alves

July 2016

1. Background

The Austroasiatic (AA) language family has received considerable attention from linguists since the late 19th century. The main methods employed in respect of language classification have always been heavily lexical, and to a lesser extent typological, but a specific focus on syntactic features has been missing. The unity of the family was contested for decades, resulting at times in the separation of the Munda group and Vietnamese from the rest of the family, and the Nicobarese varieties also posed special problems in terms of figuring out how they fit into the family. In 1904, Schmidt, relying on numerals, placed Munda with the Indo-China AA languages. Serious divisions of opinion arose in the 1930s, and it was in 1942 that Sebeok outlined a general case against a connection between Munda and other AA, asserting that the lexical parallels are unconvincing, and ought to be subordinate to consideration of structural features.

Until the 1960s, while the arguments remained largely lexical, scholars had begun increasingly to point out the structural differences between the rather consistent verb-final word order of the Munda languages of India and the rest of the family spoken in Southeast Asia (termed “Mon-Khmer” by many authors) with flexible verb-medial clause structures, which also allow for pragmatic variation. So, while in 1959 Pinnow firmly reestablished the credibility of Munda within AA, again relying on lexical (and phonological) evidence, he also soon went on (1963) to raise the issue of “structural classification” and he characterised the problem as follows:

“On the basis of their syntactical framework the Austroasiatic languages may be classified into two groups: (a) the largely co-ordinating and analytic Khmer-Nicobar languages, and (b) the largely subordinating and synthetic Munda languages.”

(Pinnow 1963:145)

This framework was further elaborated upon by Donegan and Stampe at the 1978 SICAL meeting and in various subsequent publications in 1983 and 1993, drawing in a new dimension in terms of the phonology/syntax interface. Yet, we can say that, so far, when AA syntax has been considered, the facts have typically been characterised very broadly, and the discussion has not dealt with specifics that would allow for confident reconstructions of syntactic structures.

Much progress has been made in the late 20th and early 21st century in refining (and rearranging) the classification and development of the AA languages (see Sidwell 2009 for an overview), and we now have a much clearer idea of the overall shape of the AA language family. It has become evident that many of the differences between Munda and the rest of AA are due to areal influences, although it is extremely problematic to distinguish between contact-driven and accidental convergence within the MSEA Linguistic Area (Sidwell 2015a). Thus, Munda has converged structurally in many respects with dominant neighboring Indo-Aryan and Dravidian languages, but not without leaving traces of older structures in several points. On the other hand, no shared innovations suggest that “Mon-Khmer” is actually a subgroup at the same level as Munda, but the evidence rather points towards a flat tree structure, with Munda splitting off at the same level as the other branches (Sidwell & Blench 2011).

While the classification and reconstruction of AA has been seriously pursued by scholars and delivered real progress in understanding, similar energy has not been invested in typological, especially syntactic, studies in AA languages. For a long time the linguistic material available for comparative studies was restricted to wordlists of varying quality and (phonetic/orthographic) accuracy, or hidden in texts without much in terms of annotation and analysis (one laudable exception is Costello and Sulavan’s (1993) collection of Katu folktales, complete with transcription and word-by-word glosses). Fortunately, the tide has begun to turn and the 21st century has seen the publication of a number of excellent full grammars and grammar sketches of AA languages written in modern linguistic terminology (e.g. Burenhult 2005, Kruspe 2004, Peterson 2011, Haiman 2011, Alves 2006, Jenny 2005, among others), as well as the two-volume Handbook of Austroasiatic Languages (Jenny & Sidwell 2015). With the availability of good descriptive material, it has become possible to look at the AA languages also from a syntactic typological perspective (e.g. a typological overview of AA by Jenny, Weber, and Weymouth 2015 and AA typological morphology by Alves 2014 and 2015). The family has potentially much to offer in different areas, from insights into the working of language contact to internal syntactic change and the areality and spread of features. The superficial typological differences among AA languages often reflect the history of migrations and contact with unrelated peoples and languages, which have left traces in the profile of the documented varieties. Behind these traces of contact-induced features lies the core of proto-AA syntax and morphology. This original core can be worked out in much the same way as we strive to separate indigenous vocabulary from loans which entered all languages at different times and from different sources.

Teasing apart inherited and contact-induced traits has to rely upon knowledge of:

a. what is common to all or most languages within the family, and

b. corresponding structures in potential contact languages, both past and present.

Beyond the linguistic data themselves, syntactic reconstruction, like lexical comparison, must be reconcilable with insights from history, geography, and population genetics.

2. Questions and methodology

The basic questions to be answered seem simple enough: What is where, and why? What categories, structures and patterns are found in which languages of the family and in which areas within specific languages (clauses, phrases, subordinate clauses, etc.)? How can we explain the occurrence of these structures exactly where we find them? Are there genealogical and/or areal patterns in the distribution? Can the structures be explained by biases in language processing and language use?

Simple as the questions ‘what, where, why?’ appear, there are a number of methodological and practical obstacles in answering them. How can we get around the lack of certain historical data in most cases when it comes to AA languages, their speakers and migrations? How do we get past the lexical and phonological reconstructions of sub-branches and look at syntactic features? While sound changes are typically relied upon to be regular and systematic, it is generally regarded as problematic to expect syntactic changes to follow regular or predictable paths (Harris & Campbell 1995). The general challenge is to ask the right questions in the right context and find a methodology or methodologies to answer them.

The following points should be considered when looking at the development of AA syntax:

  • Generally: What are the best methods to reconstruct AA syntax? How much can we say about the syntax of Proto-AA or of later periods? What are the limits of what can be explained?

  • What features are found across the family? Special importance is to be given to features that are found in widely dispersed languages and are neither explainable by language contact (areal convergence) nor typologically very common. This leads to the question “What were the syntactic traits of Proto-AA and of later stages of AA in various sub-branches?”

  • (To what extent) are typological exceptions or unexpected structural features evidence of events in the history of the language?

  • When did changes occur, under what circumstances, and due to what factors?

  • How can we distinguish shared original structures from shared changes due to spread of features (e.g. the general SV/AVP structure in MSEA)? How can we distinguish external factors (language contact) from internal factors (change by language use, pragmatic variation, etc.)?

  • To what extent can the timing of stages in the historical syntactic development of AA be established?

Some questions and challenges in the undertaking of the syntactic reconstruction of proto-AA:

  • There are many gaps in available data, and the available sources of descriptive materials vary greatly in quality and comparability. How can this situation be overcome? How can we weight the reliability of inferences and analyses based on these data?

  • Which features are retentions? Which are changes? Of changes, are they internally or externally motivated? It may be difficult to find reliable criteria by which to answer these questions in many or most cases, but this shouldn’t preclude the attempt to find answers.

  • How can we reconstruct syntactic structures when the status of some lexical categories and subcategories is uncertain? Alternatively, if we take the view that word classes are emergent properties of syntactic structures, rather than the latter being constrained by them, do we gain more useful traction in modeling historical syntax?

  • How can we distinguish SV/AVP from topic-comment constructions? Or more generally, how can we distinguish syntactic from pragmatic features in AA grammar?

Time Depth to Consider within the Phylum

Other questions to consider involve historical timing. Can we assign time depths to the different genealogical levels of the AA languages, as in the sample below? How can we correlate the linguistic periods with historical periods? Do we actually need this information for a successful reconstruction of AA syntax?

  • Proto-AA
  • Main branch (e.g., Vietic, before and into early contact with Chinese)
  • Sub-branch (e.g., Viet-Muong, possibly post-Chinese independence)
  • Modern (e.g., Muong varieties)

Questions of Geography

The data should answer questions about the timing but also historical geography of the different AA groups and the proto-language.

  • What can we say about the approximate center of dispersal of AA?
  • Is (the knowledge of) a geographic center important or not? Should it not be possible to reconstruct without explicit reference to dispersal area?
  • What does the morphosyntactic data suggest regarding the separation of earlier AA speech communities and issues such as community size, language shift, bottlenecking of populations, and so on?

Questions of Society

In many cases peripheral societies and peoples (‘residual zones’) tend to be more conservative than centralized societies (‘spread zones’) (Nichols 1996). Language change is often triggered by L2 speakers, which suggests that peripheral languages with no or few L2 speakers retain archaic structures better than central languages (Næss & Jenny 2011).

  • How can we use information about the social structure of a people to assess the status of their language?
  • Are peripheral languages really more conservative than central languages?

3. Data sources

The sources of data include modern, ancient, and comparative data. An important point to keep in mind is that in many cases grammatical descriptions are not sufficient as reliable sources of data. Often one has to resort to texts to find certain structures, which may be ‘minor use patterns’ (Heine & Kuteva 2005) in a language, but go back to older stages and may be retentions of the proto-language. Various sources are available, and the number of good quality linguistic descriptions and annotated and glossed texts in AA languages is increasing steadily. The following can be considered the main types of data sources for the comparative study of AA syntax, to which data gathered through fieldwork may be added, both elicitation and recordings of natural speech.

1. Grammatical descriptions of individual languages and features

2. Texts of individual languages (ideally glossed)

3. Historical documents, inscriptions (e.g. Mon, Khmer, Vietnamese)

4. Reconstructions of individual branches and sub-branches

5. Typological comparison within and outside the family

4. Syntactic aspects to consider

Diachronic syntactic studies can include not only core clause and phrase structures but also functional categories, lexical categories, and the etymological sources and development of grammatical vocabulary. The following aspects may be worthy of more in-depth investigation in AA languages:

4.1 Constituent order in clauses (dependent, independent)

Different word order types are found in the AA languages, at least to some extent coinciding with general areal patterns.

The bulk of AA languages on the SEA mainland between Myanmar and Vietnam are verb-medial, with the main word orders SV and AVP. In all or most of these languages, arguments can be freely dropped if they are known or retrievable from the linguistic or extra-linguistic context. For pragmatic reasons, arguments and peripheral elements can be fronted or placed after the clause as afterthoughts (antitopics). This leads to great variation in the possible constituent orders in the eastern AA languages. The variation is in most cases restricted to independent (main) clauses, with dependent clauses generally exhibiting the basic word orders SV and AVP. This can be explained by the fact that subordinate clauses are often presupposed and thus less (or not) accessible to pragmatic processes.

The Munda languages of India are rather consistently verb-final, or more generally head-final, a characteristic which can be considered a case of convergence towards the neighboring Indo-Aryan and Dravidian languages.

The Nicobarese varieties, heavily affected by language contacts despite being geographically separated from the mainland, show generally verb-initial structures in main clauses and verb-medial in dependent clauses.

The picture gets more complex when considering more peripheral constructions and geographically more peripheral languages. In mainland SEA, verb-medial is the dominant word order in all major languages (apart from pragmatically triggered fronted constituents and afterthoughts). Languages without scripts or standardized grammars show more variation, though, and verb-initial clauses are not uncommon, e.g. in Katu (Costello 1993). In Rumai (Palaungic) most types of subordinate clauses are verb-initial, and in Wa verb-initial structures are found mainly in subordinate clauses, but also in main clauses. While Standard Khasi is mostly verb-medial, the less standardized varieties such as Pnar and War are mostly verb-initial. Even in generally verb-final Munda languages, agreement patterns and noun-incorporation show verb-initial structures which can be taken as more archaic than the verb-final clause structures. In Aslian, verb-initial clauses are found in different contexts, and in Nicobarese main clauses are generally verb-initial and subordinate clauses verb-medial. While the verb-final patterns in Munda and verb-medial structures in mainland SEA can be explained as areal features, language contact cannot straightforwardly account for the verb-initial patterns found throughout the family (with the possible exception of Nicobarese and Aslian, where Austronesian influence may have played a role).

In light of this kind of variation, what can we conclude from the distribution of word order types in AA languages? What can be attributed to areal convergence? One possible explanation for the widespread verb-initial patterns is that the proto-language at least allowed verb-initial order, possibly with pragmatically triggered variation.

4.2 Information structure

The encoding of information structure is closely linked to the word order variation found in the family, though not restricted to it. Semelai (Aslian) is reported to have a tendency towards “new information before old information” (Kruspe 2004), which also seems to be the case for Nicobarese. Eastern AA languages show topic-comment structure, which is also true if the topical element is the verb and the predicate a nominal (e.g. ‘left is only rice, the curry is all eaten’). Some languages use explicit markers indicating the information structural status/function of constituents (e.g. Mon, see Jenny 2006, 2009).

Thus, one must ask: What means are found in AA languages to encode information structure? What can we say about the changes and development of information structure encoding? To what extent is the encoding of information structure mirrored in neighboring non-AA languages? Can we draw any conclusions for the proto-language (e.g. basically comment-topic, which later changed to topic-comment in some languages)?

4.3 Phrase structure: components of verb phrases and noun phrases

Different elements can be combined to make up noun phrases and verb phrases. While highly isolating Vietnamese does not seem to make a distinction between the word level and the phrase level, the situation is less clear in other languages. Khasi, Nicobarese and Munda make use of noun incorporation or compounding, which results in word forms syntactically distinct from phrases. This is not clearly the case in Mon and Palaungic, for example. Verb phrases can consist of a number of verbal and non-verbal elements.

Questions to consider include: Which elements occur in different languages of the family (form and/or function)? Which elements in which languages can be attributed to language contact and areal convergence? Are there any regularities in the ordering of elements? Can we conclusively say anything about elements making up an NP and a VP in proto-AA?

4.4 Subordinate clauses - form and functions

Subordinate clauses are cross-linguistically not a unified category, neither in forms employed nor in functions covered. Adverbial, complement, and attributive clauses can behave very differently, and take different markers within language families and individual languages. A subordinate clause may or may not take an overt subordinator, which may occur in clause initial or clause final position, or within the subordinate clause (like the relativizer in Middle and Modern Mon). Subordinate clauses have been said to be more conservative than independent clauses (Bybee 2001), so they may reflect older stages of the languages. Pragmatic word order variation is often restricted in subordinate clauses.

Questions to consider include: What forms of subordinate clauses are found where in the family? What functions are covered by which forms? Are there overt subordinators? Where are they placed in the clause (and why)?

4.5 Complex verbal predicates (secondary verbs, serial verbs, TAM)

In many AA languages, a verbal predicate can consist of more than one lexical verb. Secondary verbs can appear in different forms, positions, and functions. Especially common are verbs of movement expressing aspectual and directional functions, such as ‘stay’ for ongoing events. More specific functions are often found expressed by verbs such as ‘eat’, ‘throw, discard’, ‘keep’, and many others. These constructions are typical of SEA languages in general, so it is not easy to attribute individual secondary verb constructions to AA. But the presence of common constructions can be telling in terms of past contact scenarios, from which contact/migrations can be deduced.

Questions to consider include: Which V2s are found in AA and surrounding languages? Where can contact influence be detected? Which constructions can be attributed to proto-AA? Are there common sources of the expressions of specific functions?

4.6 Negation

Negation is in most AA languages achieved by pre-verbal negators, but some languages have post-verbal negators. These can be free or bound morphemes, as in Vietnamese and Mon, respectively. Some negators have lexical origins as verbs (e.g. Car and Munda ʔət, Khmer ʔɒt), while others have no known etymology as anything but negators. In many or most AA languages, negators can be used only with verbal elements, the negation of non-verbal constituents requiring special constructions, often involving a copula or a dummy verb. The position of the negator in complex verbal predicates is either determined by its scope (usually over the immediately following verb) or by structural restrictions. In some languages, negation can be reinforced by clause final particles, which in turn can be reanalyzed as principal negators, leading to the loss of the original negation marker in some cases. Besides general negators, a number of AA languages (e.g. Rumai, Khasi, Kammu, Vietnamese) have a special ‘perfect/past tense negator’ meaning ‘not yet’.

In a number of AA languages, negative imperative expressions make use of a special prohibitive marker, which may be a free or a bound morpheme.

Questions to consider include: Are there any common forms or shared developments of negators across the family? Is basic negation generally restricted to verbal elements? Is narrow scope of negation over the immediately following verb the norm? Can we postulate bound negators for the proto-language, or is all we see development from free (possibly verbal) negators to clitics or affixes? Is prohibitive a separate category? Is there a pattern in the development of prohibitive markers?

4.7 Nominal classification (classifiers, noun classes)

A number of AA languages make use of classifiers when common nouns are combined with numerals (Adams 1991). The use of classifiers is not (or barely) attested in the early documented languages Old Mon and Old Khmer. Modern Mon and Khmer do not use classifiers either, but the structure of their quantifier phrases is reminiscent of classifier constructions: in Mon, measure words like ‘day’, ‘mile’, ‘kilo’, etc. may follow a numeral, while common nouns precede the numeral. There are thus the different constructions ‘two day’ vs. ‘friend two’, similar to languages with classifiers such as Thai, where the same expressions would be ‘two day’ and ‘friend two CLF’, respectively. Classifiers are usually seen as a typical feature of mainland SEA and EA languages (Chinese, Thai, Vietnamese, etc.), but they are also regularly and obligatorily used in non-core SEA languages, such as Khasian (which also has a grammatical gender system) and Nicobarese (although the set of classifiers in these languages is very small). At the same time, classifiers are all but absent from the core SEA languages Mon and Khmer.

Questions to consider include: Are there common patterns in classifier constructions across the family? Can we reconstruct classifiers for the proto-language? If proto-AA had no classifiers, how can the occurrence of classifiers in languages like Khasi and Nicobarese be explained?

4.8 General linkers (proto-AA *ta and similar)

A grammatical morpheme (called here ‘linker’) covering a range of functions, including marking of oblique objects, attributive/relative expressions, adverbials, and others is found in a number of AA languages, often with a similar form ta or ti (e.g. Old Mon, Old Khmer, Car). Not all functions are present in all languages where the marker occurs, and in some languages the form of the marker is slightly different (e.g. Palaungic), but the range of functions remains similar. Interestingly, similar markers with partly similar functions are found in non-related languages of the area, like Chinese de and Thai tʰîː. The connections, which may be pure coincidence, are not clear, both within AA and with neighboring languages.

Questions to consider include: What are the functions (and forms) we can reconstruct for the ‘linker’ in proto-AA? How are the same functions expressed in the languages that have lost the linker or restricted its functions? Is there outside influence from non-related languages in the development of the linker?

4.9 Grammatical relations, argument coding and alignment

Most AA languages show accusative alignment in most or all construction types, that is, intransitive subjects are treated the same as transitive subjects (Agents), but differently from objects (Patients). Grammatical relations (GRs) can be marked by position (e.g. S/A preverbal, P postverbal), cross-referencing on the verb (e.g. agreement with S and/or A), case marking (affixes, adpositions), or others. Although claims have been made that ergative constructions are widespread in AA (Diffloth n.d.), these are actually not found in many languages of the family. Aslian languages have special marking for Agents (as opposed to intransitive S), but the alignment is rather tripartite, treating S, A, and P differently. Some intransitive subjects in Mon occur after the verb, making them syntactically similar to Patients, but this is restricted to a small class of (topical) verbs (Jenny forthc.).

Questions to consider include: What alignment patterns are found where in the family? Can external influences be seen as responsible for these patterns? What can we reconstruct for proto-AA? What grammatical relations can be found in what construction types? Are there any patterns in GRs across the family? What means are there to overtly mark arguments?

4.10 Questions - polar and content

Many AA languages have different means to form polar and content questions. Intonation may be used by some languages for the expression of the former, while frequently sentence particles are used, either alone or in combination with intonation. Content questions typically contain an interrogative element, which may occur in situ or in a high-focus position in the clause, often in immediate preverbal or clause initial position. A sentence particle may or may not be present in content questions. If there is a question particle, it is typically different from the polar question marker.

Questions to consider include: What means are common in the expression of questions across the family? Are there common forms of interrogatives that can be traced back to the proto-language? Is there any regularity in the placement of interrogatives and question particles?

4.11 Sentence and clause particles

Sentence particles are used in many languages in SEA and beyond to express a wide range of functions, including speaker’s attitude, illocutionary force, and others, usually in sentence final position. Some of these particles in some languages can be traced to lexical origins, but in many cases they have no obvious lexical source. On the other hand, a number of sentence particles have lookalikes in non-related neighboring languages (e.g. the emphatic particle Mon nah and Thai náʔ). Mutual influence may well play a role here, but the widespread use of sentence and clause particles in AA languages suggests that this was a feature of the proto-language.

Questions to consider include: Are there any common forms and functions of sentence particles across the family? Can we separate areal influences from inherited traits?

4.12 Prosodic structure and syntax

The prosody of an utterance is of eminent importance, but the connection between intonation and syntax in AA languages has not received much attention in the past. Claims have been made that the major change that occurred in Munda languages was from rising to falling intonation patterns (Donegan & Stampe 2004), and that all other changes naturally followed from this. The prosody of at least some Munda languages (e.g. Kharia, see Peterson 2007) shows falling intonation on the clause and phrase level, but rising on the word level. Similar patterns are found in Mon (also in close contact with a consistent verb-final/head-final language), though with much less effect on the syntactic structure (verb-final patterns are, however, increasingly found, see Jenny 2011).

Questions to consider include: What can we say about the prosody of proto-AA based on patterns found in the modern languages? Is there a connection between posited or observed syntactic changes and changes in prosody? Is the prosody more likely to converge with neighboring languages than other aspects of a language?

4.13 Grammaticalization

Recurring grammaticalization clines are seen in the region, such as those noted in Matisoff 1991. Reconstructing grammaticalized sources in Proto-AA is unlikely to be possible, but within AA sub-branches, such a goal is entirely necessary. Many of these instances of grammaticalization of words and structures (e.g., deriving the comparative from ‘over’) are likely the result of language contact, but contact-induced patterns of grammaticalization of words and structures are still part of the linguistic history in the language family and must be described and, as far as possible, explained in terms of the paths and their likely sources, whether direct borrowing, more regional contact, or language internal innovations that are typologically common (e.g. those included in Heine and Kuteva 2002). The matter of grammaticalization is significant as it is relevant to almost all of the previously mentioned categories.

Questions to consider include: What, if any, grammaticalized words can be reconstructed to the proto-AA level? How many of such words are the result of regional or sub-branch developments? Are there places in time or geography in which certain forms emerged and spread in AA? Which are the result of contact with AA or other language groups? How can we determine the difference between preservations, innovations, and borrowings?

4.14 Other topics to consider

A number of other morphosyntactic topics promise insight into the structure of the AA proto-language. The following list is by no means exhaustive, and as more material becomes available, more aspects can be studied in more depth and detail.

  • Syntactic categories
  • Pronominal systems
  • Reference tracking (anaphoric and logophoric elements)
  • Numeral systems
  • Agreement (verbal and nominal)
  • Indication of plurality (verbal and nominal)
  • Indication of aspect, tense, or state
  • Voice alternations (morphological and periphrastic)

5. Conclusion

The study of the history of AA syntax is both necessary and long overdue, and the opportunity to pursue it now exists. There is more data on more AA languages than ever before, more typological context to consider, and recent decades of ideas and insights about language contact and change to take into account. It is, however, a tremendous task that will require coordinated effort and multiple stages, and we can expect varying degrees of success and failure. There are many more questions than answers at this point, but insights are likely to come and will hopefully contribute to understanding of human history in the greater Southeast Asian region.


