PaVeDa Project

Link to the PaVeDa Database

Project description

PaVeDa (Pavia Verbs Database) is an open-source relational database for investigating verb argument structure across languages (Zanchi et al. 2022). It aims to expand and enhance the ValPaL database (valpal.info; Hartmann et al. 2013; Haspelmath/Hartmann 2015) with more languages and further features. PaVeDa innovates on ValPaL in the following major respects.

New languages: We aim to expand the number of languages by adding data from language families that are currently underrepresented or not represented at all in ValPaL, notably Afro-Asiatic, Uralic and Turkic. The first languages to be added are Modern Hebrew and Berber (Afro-Asiatic), Finnish and Hungarian (Uralic), and Turkish (Turkic). This is in line with the original aim of ValPaL, namely to provide typologically diverse data. In addition, and as its most notable new feature, PaVeDa includes data from ancient languages, thus enabling diachronic research, which has hardly been pursued in the past, especially from a comparative perspective. For the time being, the diachronic part of our research concentrates on Indo-European languages, starting with those whose modern stages are already included in ValPaL, such as Italian (as a representative of the Romance languages), English, Armenian and Icelandic. It also covers ancient languages that have no direct modern descendants, such as Gothic, and ancient languages whose modern stages are not yet included in the database but which we plan to add in the near future (Greek, Irish) (Giuliani 2021, Roma 2021, Olgiati 2021, Zanchi/Tarsi 2021, Giarda 2022, Zanchi/Inglese 2022, Giuliani/Zanchi forthc.).

Corpus data: Research on ancient languages has forced us to abandon native speakers’ intuition as the basis for selecting the basic verbs that lexicalize each verb meaning, and to lay out new selection criteria based on frequency in corpora, morphological complexity, and continuity of attestation. As an innovation with respect to the elicitation method by which the ValPaL database was created, we plan to add corpus data for modern languages as well, both those that are being added and those that are already in the database. Adding corpus data will help avoid idiosyncratic verb selection and the overlooking of attested alternations, and will make it possible to draw a frequency-based distinction between regular and marginal alternations and to identify basic valency patterns. The latter point has made us reconsider our overall view of valency: instead of establishing basic valency patterns solely by combining the semantics of the verb with its maximal argument structure, PaVeDa also considers frequency to establish the basicness of each valency pattern. Consistent with this usage-based turn (Perek 2015), PaVeDa will be linked to external corpora, which will serve as sources of usage-based examples of the stored patterns.
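As a rough illustration of what a frequency-based notion of basicness could look like in practice, the sketch below counts how often each valency pattern is attested for a single verb in a corpus sample. It treats the most frequent pattern as the basic one and labels the others as regular or marginal. The function name `rank_patterns`, the 10% threshold, the pattern labels and the sample counts are all assumptions made for this example; they are not PaVeDa's actual criteria or data.

```python
from collections import Counter


def rank_patterns(observations, marginal_threshold=0.1):
    """Summarize the valency patterns attested for one verb.

    observations: one pattern label per corpus hit.
    marginal_threshold: hypothetical cut-off; patterns whose relative
    frequency falls below it are labelled "marginal".
    Returns (basic_pattern, {pattern: (relative_frequency, label)}).
    """
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no corpus observations")

    # The most frequent pattern is taken as the basic one (an assumption).
    basic, _ = counts.most_common(1)[0]

    summary = {}
    for pattern, n in counts.items():
        share = n / total
        label = "regular" if share >= marginal_threshold else "marginal"
        summary[pattern] = (share, label)
    return basic, summary


# Invented counts for a single verb, for illustration only.
hits = ["NOM-ACC"] * 14 + ["NOM-DAT"] * 5 + ["NOM-GEN"] * 1
basic, summary = rank_patterns(hits)

for pattern, (share, label) in sorted(summary.items()):
    print(f"{pattern}: {share:.2f} ({label})")
print("basic pattern:", basic)
```

A single fixed threshold is the simplest possible design. A real implementation would probably also need to weigh corpus size, genre and period, and the other selection criteria mentioned above.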

New features: PaVeDa introduces a new cross-linguistic layer of annotation containing comparative concepts for valency patterns and alternations, which allows for contrastive data visualization. The tagset for this layer is inspired by Haspelmath’s (2022) terminological overview, and its values are based on a bottom-up generalization over the language-specific valency patterns and alternations already stored in ValPaL and PaVeDa. This comparative layer is cross-referenced to the language-specific valency pattern and alternation layers in the PaVeDa relational database. So far, we have been able to reproduce the ValPaL database, pre-process new data on ancient languages, and add them to PaVeDa. Over the next few months, we will continue collecting, pre-processing, and adding new data, and will work on linking external corpora, adding the comparative concept layer of annotation, and designing a new user interface that allows for comparative visualization.
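The cross-referencing between the comparative layer and the language-specific layers described above can be pictured with a minimal relational sketch. The example below uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names, concept label, verbs and coding frames are hypothetical placeholders chosen for illustration; they do not reflect the actual PaVeDa schema or tagset.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: each language-specific valency pattern points to
# one comparative concept through a foreign key.
conn.executescript("""
CREATE TABLE comparative_concept (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL UNIQUE
);
CREATE TABLE valency_pattern (
    id           INTEGER PRIMARY KEY,
    language     TEXT NOT NULL,
    verb         TEXT NOT NULL,
    coding_frame TEXT NOT NULL,
    concept_id   INTEGER REFERENCES comparative_concept(id)
);
""")

# Placeholder rows, not real PaVeDa data.
conn.execute(
    "INSERT INTO comparative_concept (id, label) VALUES (1, 'concept_x')"
)
conn.executemany(
    "INSERT INTO valency_pattern (language, verb, coding_frame, concept_id) "
    "VALUES (?, ?, ?, ?)",
    [
        ("Italian", "verb_a", "frame_1", 1),
        ("Gothic", "verb_b", "frame_2", 1),
    ],
)

# Contrastive query: all language-specific patterns that share a concept.
rows = conn.execute(
    """
    SELECT p.language, p.verb
    FROM valency_pattern AS p
    JOIN comparative_concept AS c ON c.id = p.concept_id
    WHERE c.label = ?
    ORDER BY p.language
    """,
    ("concept_x",),
).fetchall()

for language, verb in rows:
    print(language, verb)
```

Keeping the comparative concepts in their own table means that a single tag can be shared by patterns from any number of languages. This kind of join is what would make side-by-side comparison possible.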