Project description

In recent years, the availability of large digitized corpora has made it possible to investigate the frequency and distribution of lexical items thoroughly. The project aims to combine quantitative, corpus-based research with qualitative research on language typology. In particular, the project plans to provide a thorough description of verbal valency and of valency-changing categories in Old Indo-Aryan (OIA). Early OIA verbs will be stored in an electronic database in which attested verbal formations (token occurrences) are accompanied by information on transitivity and other syntactic properties, along with grammatical information on the verbal roots (types). Information on the token and type frequency of verbs and verb forms will also be incorporated into the database, which will be the first of its kind for Early OIA. A further step of the research will be a diachronic investigation of developments from Old to Middle and New Indo-Aryan (MIA and NIA). The Early OIA database will be supplemented with diachronic information for a representative selection of verbs, covering the span from the earliest stage of the language up to the modern varieties. This selection of verbs will serve as a template for a future comprehensive diachronic database of Indo-Aryan verbs and will point to new ways of bridging the gap between synchronic and diachronic typological research on valency-changing categories. For OIA, available digitized texts will be used, in particular the digitized version of the Rigveda at the Titus project. The database will serve as the basis for the further steps of the research.
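As a minimal sketch of how token and type frequency might be derived from such a database, the following Python example assumes a hypothetical record layout (the Token fields, category labels and data are illustrative placeholders, not the project's actual schema). Each attested form counts as one token linked to its root, and the type frequency of a category is taken here as the number of distinct roots attested in it.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record: one attested verb form (token) linked to its root (type).
@dataclass(frozen=True)
class Token:
    passage: str      # text and verse reference (placeholder values below)
    form: str         # form as attested in the text
    root: str         # verbal root
    category: str     # e.g. "causative", "passive" (illustrative labels)
    transitive: bool  # syntactic use in this passage


def frequencies(tokens):
    """Return {category: (token_frequency, type_frequency)}.

    Token frequency counts attestations; type frequency counts
    the distinct roots attested in the category.
    """
    token_freq = defaultdict(int)
    roots = defaultdict(set)
    for t in tokens:
        token_freq[t.category] += 1
        roots[t.category].add(t.root)
    return {cat: (n, len(roots[cat])) for cat, n in token_freq.items()}


# Placeholder data, not real attestations.
sample = [
    Token("P1", "f1", "root_a", "causative", True),
    Token("P2", "f2", "root_a", "causative", True),
    Token("P3", "f3", "root_b", "causative", True),
    Token("P4", "f4", "root_b", "passive", False),
]
print(frequencies(sample))  # {'causative': (3, 2), 'passive': (1, 1)}
```

Separating the two counts matters for usage-based analysis: a category attested very often with only a few roots behaves differently from one spread thinly over many roots.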

Typological research on valency-changing categories shows an imbalance between their synchronic and diachronic study. On the one hand, we know a great deal about the synchronic morphological, syntactic and semantic properties of causatives, passives and related categories. Valency as a theoretical concept is firmly established and treated in detail in several elaborate frameworks and syntactic theories. Furthermore, a detailed cross-linguistic typology of valency-changing categories is available, relying on evidence from a large number of languages of different structural types. On the other hand, a systematic treatment of these categories from a diachronic perspective is lacking. Nor is there a detailed diachronic typology of possible changes and developments in the system of valency-changing categories (although some relevant data are scattered across studies on grammaticalization and historical grammars). This imbalance is partly due to objective reasons: there are relatively few languages for which textual evidence covers a period long enough to observe essential changes in the morphological system and in syntactic types, which unavoidably limits the typological diversity of the data. For this reason, the most appropriate strategy is to collect evidence from languages that are well documented in texts over a sufficiently long time span. For such a task, the Indo-Aryan group of the Indo-European languages is one of the best candidates:

(i) Indo-Aryan is documented for an uninterrupted period of more than 3,000 years, starting with Old Indo-Aryan (OIA) (1200–600 BC), which can be roughly identified with Vedic Sanskrit. This makes a prospective diachronic analysis of the valency-changing categories possible: all categories attested in Vedic can be traced forward to their reflexes in Middle Indo-Aryan (MIA) (600 BC–1000 AD) and New Indo-Aryan (NIA) (1000 AD to the present).

(ii) The rich evidence collected by Indo-European comparative linguistics provides a good basis for hypotheses about the origin of the categories attested in OIA, and thus for a retrospective diachronic typology (largely based on the reconstruction of Proto-Indo-European): what the sources of these categories are and which processes gave rise to them.

The research aims at a systematic description and analysis of the main valency-changing categories (causative, passive, intransitive), along with case-marking patterns, in OIA, complemented by a partial diachronic analysis. The diachronic analysis will concentrate on a representative selection of about 50 verbs that exemplify the main syntactic types and valency alternations, and will describe their evolution over time.

The description of valency-changing categories based on the database will build on the solid foundation laid by earlier studies of the OIA verb:

• the OIA verbal corpus can be searched using concordances (the complete Vedic word-concordance by Vishva Bandhu, as well as separate concordances for the Rigveda (Lubotsky 1997) and some other Vedic texts) and a constantly growing number of digital corpora of Vedic texts (besides the texts of the Titus project already cited, see the collections at The Sanskrit Library, http://sanskritlibrary.org/textsList.html);

• a great deal of information on syntactic features (transitivity, case-marking patterns, expression of the passive, causative and other valency-changing categories) is scattered, mostly without systematization by syntactic feature, across a number of studies on OIA morphological verbal categories: thematic root presents, perfects, passives and -ya-presents, medio-passive -i-aorists and statives, transitivity and causative oppositions, intensives, desideratives, and the middle diathesis;

• several studies (mostly of a rather preliminary character) have concentrated on OIA syntactic verbal categories: causatives and causative/anticausative (inchoative) alternations, passives, labile verbal forms (i.e. forms that can function both transitively and intransitively), and reflexive and reciprocal constructions;

• the first steps towards systematizing the syntactic information for the most ancient Vedic text, the Rigveda (in particular, the case-marking patterns of individual verbs), are H. Grassmann's semantico-syntactic dictionary, which is, however, out of date, and the new dictionary by T. Krisch.

The project builds on the principal investigator's previous research on Vedic passives and intransitive presents, and on a preliminary study of a number of voice and valency-changing categories in Vedic (Kulikov 2014a, 2014b). A quantitative dimension, based on the extraction of verbs from the digitized corpus and providing data on the frequency and distribution of verbs and verb forms, will offer a further innovative tool for understanding the meaning and function of specific valency-changing categories in Early OIA. All semantic and syntactic properties of verb roots and verb forms will be stored in the database, which will provide a template for the annotation of Indo-Aryan verbs. This will be done in collaboration with other ongoing projects that are creating annotated electronic resources for Sanskrit, first and foremost The Sanskrit Library project.

The electronic synchronic database of Early OIA verbs will be supplemented with diachronic data for a representative selection of 50 OIA verbs occurring in younger texts and, where appropriate, with information about their reflexes in the following periods, that is, in MIA and NIA. In this way, the database will provide partial information on the whole 3,000-year history of the Indo-Aryan languages, illustrating the most salient tendencies in the development of the Indo-Aryan verb. The information resulting from the diachronic study will be stored in a separate part of the database.
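One way to keep the diachronic information in a separate part of the database is sketched below with Python's built-in sqlite3 module. This is an illustration under stated assumptions, not the project's design: the table and column names are hypothetical, the synchronic token table is omitted, and the single entry (the root kṛ 'make, do' with Vedic kṛṇóti, Pali karoti and Hindi karnā) only shows how one root can be followed across the three periods.

```python
import sqlite3

# Hypothetical schema: a root table shared with the synchronic database
# (whose token table is omitted here) and a separate diachronic table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE root (
    root_id INTEGER PRIMARY KEY,
    root    TEXT NOT NULL UNIQUE,
    gloss   TEXT
);
CREATE TABLE reflex (
    reflex_id INTEGER PRIMARY KEY,
    root_id   INTEGER NOT NULL REFERENCES root(root_id),
    period    TEXT NOT NULL CHECK (period IN ('OIA', 'MIA', 'NIA')),
    language  TEXT NOT NULL,
    form      TEXT NOT NULL
);
""")

# Illustrative entry: the root kṛ 'make, do' with one form per period.
conn.execute("INSERT INTO root (root_id, root, gloss) VALUES (1, 'kṛ', 'make, do')")
conn.executemany(
    "INSERT INTO reflex (root_id, period, language, form) VALUES (?, ?, ?, ?)",
    [
        (1, "OIA", "Vedic", "kṛṇóti"),
        (1, "MIA", "Pali", "karoti"),
        (1, "NIA", "Hindi", "karnā"),
    ],
)


def history(root):
    """Return (period, language, form) rows for a root, ordered by reflex_id."""
    return conn.execute(
        """SELECT x.period, x.language, x.form
           FROM reflex AS x JOIN root AS r USING (root_id)
           WHERE r.root = ?
           ORDER BY x.reflex_id""",
        (root,),
    ).fetchall()


for row in history("kṛ"):
    print(row)
```

Linking the reflexes to roots only through root_id means the synchronic part can be built first and the diachronic rows for the 50 selected verbs added later without altering it.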

The diachronic survey will aim at the following results:

(i) a description of the diachronic development and balance of the different markers of transitivity (passive, causative etc.);

(ii) a systematic analysis of the -áya-causatives in middle and late Vedic and the development of their reflexes in MIA; analysis of other causative oppositions in middle Vedic and their decay by the end of OIA;

(iii) a description of the valency-changing function of the middle diathesis and the loss of relevance of this category;

(iv) a systematic analysis of the labile syntactic type in early Vedic; its decay and disappearance in later periods.
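For goal (iv), candidate labile forms could be retrieved from the transitivity annotation along the lines of the following Python sketch. It assumes a hypothetical export of (root, form class, transitive) triples, one per attestation, and the data are placeholders. Attestations are grouped by root and form class together, since a root whose active forms are transitive and whose middle forms are intransitive shows a voice opposition rather than lability in the sense of a single form functioning both transitively and intransitively. The output is only a list of candidates for philological checking.

```python
from collections import defaultdict


def labile_candidates(annotations):
    """Return sorted (root, form_class) pairs attested both transitively
    and intransitively.

    `annotations` is an iterable of (root, form_class, transitive) triples,
    one per attestation (a hypothetical export of the database).
    """
    uses = defaultdict(set)
    for root, form_class, transitive in annotations:
        uses[(root, form_class)].add(transitive)
    # Keep only form classes seen with both values of the transitivity flag.
    return sorted(key for key, seen in uses.items() if seen == {True, False})


# Placeholder data, not real attestations.
data = [
    ("root_a", "present active", True),
    ("root_a", "present active", False),
    ("root_b", "present active", True),
    ("root_b", "present middle", False),
]
print(labile_candidates(data))  # [('root_a', 'present active')]
```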

A further outcome will be a systematic test of several theoretical claims about valency (made within elaborate frameworks for the description of valency-changing categories such as Construction Grammar and Optimality Theory) against diachronic evidence.

Research methodology and approach

The research will be based both on a synchronic analysis of OIA and on a diachronic analysis of the historical evidence, within the documented history of OIA (1200–600 BC) as well as in the later stages of the Indo-Aryan languages. The Early OIA verbs to be stored in the database will be extracted from the existing online corpora mentioned above, supplemented by concordances (such as the complete Vedic word-concordance by Vishva Bandhu or A Rigvedic word-concordance by Lubotsky 1997). These corpora and concordances allow a quick search for every verbal form attested in the Rigveda. Specifically, the forms will be collected systematically by marking up the verb forms in XML in digital editions of Vedic texts and then retrieving the relevant XML records programmatically. Much of the initial markup can be done automatically, using verb-form generation software and search software. Note, however, that irregular Vedic forms may be missed (under-extension) and that nominal forms homophonous with verbal forms may be wrongly included (over-extension); both kinds of error will have to be corrected by hand with reference to the Vedic concordances mentioned above. Finally, the extracted verb forms will be stored in the database with their syntactic context and an English translation, accompanied by grammatical information (tense, active/middle diathesis, etc.), a syntactic characterization (transitive/intransitive; classification in terms of voice and valency-changing categories: passive, causative, reflexive, etc.) and information on their frequency. As noted above, it is impossible to extract the material for the database exhaustively from the enormous Vedic corpus within a two-year project. Therefore, alongside the Rigveda (the earliest and most important Vedic text), which will provide the core data for the database, I will extract the forms of a representative selection of 50 verbs from the Atharvaveda and from a limited number of Middle and Late Vedic texts (Taittirīya-Saṃhitā, Aitareya-Brāhmaṇa, Śatapatha-Brāhmaṇa and Chāndogya-Upaniṣad), which will ensure a diachronic perspective in the analysis of verbs and their syntactic features. Further diachronic information on developments in MIA and NIA will be added in the final phase of the research.
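The markup-and-extraction step described above might look roughly like the following Python sketch, which uses the standard-library xml.etree.ElementTree module. The markup scheme (w elements with pos and root attributes inside verse elements) and the function name are hypothetical, and the sample contains only the opening words of RV 1.1.1 (agním īḷe puróhitam, with īḷe tagged under the root īḍ). Forms in a review list, such as possible nominal homophones, are returned separately for manual checking; missed irregular forms (under-extension) cannot be detected this way and still have to be found with the concordances.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: verb forms are <w pos="verb" root="..."> elements.
SAMPLE = """
<text id="RV">
  <verse n="1.1.1">
    <w>agním</w>
    <w pos="verb" root="īḍ">īḷe</w>
    <w>puróhitam</w>
  </verse>
</text>
"""


def extract_verbs(xml_text, review_forms=frozenset()):
    """Collect tagged verb forms together with their verse context.

    Forms listed in `review_forms` (e.g. possible nominal homophones) are
    also returned separately for manual checking against the concordances.
    """
    text = ET.fromstring(xml_text.strip())
    records, to_check = [], []
    for verse in text.iter("verse"):
        context = " ".join(w.text or "" for w in verse.iter("w"))
        for w in verse.iter("w"):
            if w.get("pos") != "verb":
                continue  # skip untagged (non-verbal) words
            record = {
                "text": text.get("id"),
                "verse": verse.get("n"),
                "form": w.text,
                "root": w.get("root"),
                "context": context,
            }
            records.append(record)
            if w.text in review_forms:
                to_check.append(record)
    return records, to_check


records, to_check = extract_verbs(SAMPLE, review_forms={"īḷe"})
print(records[0]["form"], records[0]["root"], records[0]["context"])
```

Each record already carries the verse reference and context needed for the philological step; the translation and syntactic characterization would then be added to it by hand.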

The methodological basis of this analysis will thus comprise three components: (i) corpus-based search and extraction of forms within their syntactic context; (ii) classical philological analysis of the corresponding text passages, which is necessary for their correct translation and interpretation; and (iii) a typologically oriented linguistic analysis of the extracted forms, recording both grammatical functions and syntactic types (transitivity oppositions), in the framework of the usage-based construction grammar approach. This process can in many cases be sped up by using the grammatical and syntactic analyses of relevant forms and passages provided in existing studies on the Vedic verb (including, among others, many studies by the applicant). Additional information about the syntax of the reflexes of the relevant OIA forms in MIA and NIA can be extracted, where possible, from existing grammatical studies of the corresponding languages, such as the grammar of Pali by T. Oberlies or the numerous studies on Hindi and other NIA causatives, as well as from available digital corpora and electronic dictionaries of these languages (online Pāli text resources and dictionaries; NIA text corpora such as The EMILLE Corpus of NIA languages compiled by a research group at Lancaster University, which includes Assamese, Bengali, Gujarati, Hindi, Punjabi, Sinhala and some other NIA languages; the Indian Languages Corpora Initiative (ILCI) at Jawaharlal Nehru University; etc.).