Verbal Multiwords in Indian Languages

Current Status and Future Directions

December 21st, 2019

ICON 2019, IIIT Hyderabad

About the workshop

Venue for the workshop is updated: Himalaya 105, IIIT Hyderabad

Registration for the workshop can be done on ICON's main conference page.


This workshop is focused on assessing the current status of research on verbal multiword expressions (MWE) in Indian languages and creating a strategy for future directions, particularly in terms of lexical resources. This workshop will be organized as part of ICON 2019 in Hyderabad, India.

Motivation

Verbal multiword expressions, which include phenomena such as complex predicates, support verb/light verb constructions, and causatives, among others, are prevalent in South Asian languages (Masica, 1993). Linguistic research across theoretical frameworks has examined such expressions closely over the years (e.g., Jayaseelan, 1988; Abbi and Gopalakrishnan, 1991; Butt, 1997; Mohanan, 1997; Davison, 2005; Ramchand, 2008, among others).

These expressions have also received some attention in the computational linguistics literature, with the development of automatic multiword detection systems for Indian language MWEs (Sinha, 2009; Das et al., 2010; Chakrabarti et al., 2008; Begum et al., 2011; Vaidya et al., 2016) and computational grammar representations (Poornima and Koenig, 2008; Ahmed et al., 2012; Vaidya et al., 2019).

Over the years, psycholinguistic research on the processing of these expressions has also begun to emerge, using a variety of methodologies (Vaidya and Wittenberg, forthcoming; Dasgupta et al., 2015).

Despite this rich and growing body of work, only a few lexical resources have been developed for multiwords in South Asian languages, although such expressions are annotated in the Indian language treebanks (Bharati et al., 2002; Palmer et al., 2009). Lexical resources that cater specifically to verbal multiwords have been developed for several European languages through the PARSEME project (Savary et al., 2017) and in corpora such as STREUSLE (Schneider and Smith, 2015).

Lexical Resources

This workshop aims to open a discussion on designing such a resource, or a shared task, for verbal multiwords in South Asian languages that would be of interest to theoretical linguists, psycholinguists, and computational linguists alike. The discussion will center on whether existing annotation designs can be adapted or need to be changed to address the particular challenges of South Asian languages, where such expressions are highly productive.

Accordingly, the program will consist of three talks that examine the problem of complex predicates from different points of view: theory, typology, psycholinguistics and computational linguistics. The panel discussion will focus on some of the questions about the design of lexical resources. We hope to use this workshop as a launchpad for bringing together researchers in this area and to use it as a platform to discuss future developments.

References

Abbi, Anvita and Devi Gopalakrishnan. 1991. Semantics of Explicator Compound Verbs in South Asian Languages. Language Sciences 13(2):161–180.
Ahmed, Tafseer, Miriam Butt, Annette Hautli, and Sebastian Sulger. 2012. A reference dependency bank for analyzing complex predicates. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). http://www.lrec-conf.org/proceedings/lrec2012/pdf/474_Paper.pdf.
Begum, Rafiya, Karan Jindal, Ashish Jain, Samar Husain, and Dipti Misra Sharma. 2011. Identification of Conjunct Verbs in Hindi and their effect on Parsing Accuracy. In Proceedings of the 12th CICLing, Tokyo, Japan.
Bharati, Akshar, Rajeev Sangal, Vineet Chaitanya, Amba Kulkarni, Dipti Misra Sharma, and K.V. Ramakrishnamacharyulu. 2002. Anncorra: Building treebanks in Indian languages. In Proceedings of the 3rd Workshop on Asian Language Resources and International Standardization at COLING 2002.
Briem, Daniela, Britta Balliel, Brigitte Rockstroh, Miriam Butt, Sabine Schulte im Walde, and Ramin Assadollahi. 2009. Distinct processing of function verb categories in the human brain. Brain Research 1249:173–180.
Butt, Miriam. 1997. Complex Predicates in Urdu. In A. Alsina, J. Bresnan, and P. Sells, eds., Complex Predicates. CSLI Publications, Stanford.
Chakrabarti, Debasri, Hemang Mandalia, Ritwik Priya, Vaijayanthi Sarma, and Pushpak Bhattacharyya. 2008. Hindi Compound Verbs and their Automatic Extraction. In Proceedings of Coling 2008: Companion volume – Posters and Demonstrations, pages 27–30.
Das, Dipankar, Santanu Pal, Tapabrata Mondal, Tanmoy Chakraborty, and Sivaji Bandopadhyay. 2010. Automatic extraction of complex predicates in Bengali. In Proceedings of the Workshop on Multiword Expressions: from Theory to Applications (MWE 2010).
Dasgupta, Tirthankar, Manjira Sinha, and Anupam Basu. 2015. Computational Models of the Representation of Bangla Compound Words in the Mental Lexicon. Journal of Psycholinguistic Research 45(4):833–855.
Davison, Alice. 2005. Phrasal predicates: How N combines with V in Hindi/Urdu. In T. Bhattacharya, ed., Yearbook of South Asian Languages and Linguistics, pages 83–116. Mouton de Gruyter. https://doi.org/10.1515/9783110186185.83.
Jayaseelan, K. A. 1988. Complex predicates and theta theory. In W. Wilkins, ed., Syntax and Semantics, vol. 21. Academic Press, Inc.
Masica, Colin. 1993. The Indo-Aryan Languages. Cambridge University Press.
Mohanan, Tara. 1997. Multidimensionality of representation: NV complex predicates in Hindi. In A. Alsina, J. Bresnan, and P. Sells, eds., Complex Predicates. CSLI Publications, Stanford.
Palmer, Martha, Rajesh Bhatt, Bhuvana Narasimhan, Owen Rambow, Dipti Misra Sharma, and Fei Xia. 2009. Hindi Syntax: Annotating Dependency, Lexical Predicate-Argument Structure, and Phrase Structure. In Proceedings of ICON-2009: 7th International Conference on Natural Language Processing. Hyderabad.
Poornima, Shakthi and Jean-Pierre Koenig. 2008. Reverse Complex Predicates in Hindi. In Proceedings of the 24th NWLC, Seattle WA.
Ramchand, Gillian. 2008. Verb Meaning and the Lexicon: A First Phase Syntax. Cambridge University Press.
Savary, Agata, Carlos Ramisch, Silvio Cordeiro, Federico Sangati, Veronika Vincze, Behrang QasemiZadeh, Marie Candito, Fabienne Cap, Voula Giouli, Ivelina Stoyanova, and Antoine Doucet. 2017. The PARSEME shared task on automatic identification of verbal multiword expressions. In Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pages 31–47. Valencia, Spain: Association for Computational Linguistics.
Schneider, Nathan and Noah A. Smith. 2015. A corpus and model integrating multiword expressions and supersenses. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1537–1547. Denver, Colorado: Association for Computational Linguistics.
Sinha, R. Mahesh K. 2009. Mining Complex Predicates in Hindi Using a Parallel Hindi-English Corpus. In Proceedings of Workshop on Multiword Expressions, ACL-IJCNLP 2009, pages 40–46.
Vaidya, Ashwini, Sumeet Agarwal, and Martha Palmer. 2016. Linguistic features for Hindi light verb construction identification. In Proceedings of COLING 2016.
Vaidya, Ashwini, Owen Rambow, and Martha Palmer. 2019. Syntactic composition and selectional preferences in Hindi Light Verb constructions. Linguistic Issues in Language Technology 17(1):1–30.