I work broadly at the intersection of Natural Language Processing and Machine Learning.
The research questions I have worked on to date span:
Dataset Creation in Low-Resource Scenarios: What are the challenges in creating labeled datasets in low-resource languages and specialized domains? Can LLMs be leveraged to aid annotation in such scenarios?
Constrained Generation: Can we generate text conditioned on target constraints? Can we leverage explicit auxiliary rules to improve generation? How can we effectively annotate structures?
Cross-lingual Transfer Learning: How can a low-resource language benefit from its high-resource counterparts? Can we discover systemic similarities and dissimilarities to transfer effectively?
Crowdsourcing for Natural Language Tasks: What are the guidelines for a successful annotation project? How can we effectively handle annotation artifacts? What are the best practices to ensure performance that generalizes beyond a single dataset?
I maintain a Wiki page for Gujarati, a regional language of western India. The wiki aims to provide a curated list of resources to facilitate research on this widely spoken yet low-resource language. Contributions are welcome.
Peer-Reviewed Publications
Promptly Predicting Structures: The Return of Inference
Mehta, M., Pyatkin, V. and Srikumar, V.
In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (To Appear) (NAACL'24)
[Paper][Code][Data]
A Universal Dependencies Treebank for Gujarati
Jobanputra, M.*, Mehta, M. and Çöltekin, Ç.
In Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (To Appear) (MWE-UD'24)
[Paper][Data]
Verifying Annotation Agreement without Multiple Experts: A Case Study with Gujarati SNACS
Mehta, M. and Srikumar, V.
In Findings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL'23 Findings)
[Paper][Data][Code]
Psychotherapy is Not One Thing: Simultaneous Modeling of Different Therapeutic Approaches
Mehta, M., Caperton, D. D., Axford, K., Weitzman, L., Atkins, D., Srikumar, V. and Imel, Z. E.
In Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology: Mental Health in the Face of Change (CLPsych'22)
[Paper]
Learning Constraints for Structured Prediction Using Rectifier Networks
Pan, X., Mehta, M. and Srikumar, V.
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL'20)
[Paper][Code]
InfoTabS: Inference on Tables as Semi-structured Data
Gupta, V., Mehta, M., Nokhiz, P. and Srikumar, V.
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL'20)
[Paper][Code][Dataset][Project Site]
A Logic-Driven Framework for Consistency of Neural Models
Li, T., Gupta, V., Mehta, M. and Srikumar, V.
In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP'19)
[Paper][Code]
A Service-Oriented Architecture for Human Capital Management System
Mehta, M., Derasari, R., Patel, S., Kakadiya, A., Gandhi, R., Chaudhary, S. and Goswami, R.
In Proceedings of the 2019 Annual IEEE Systems Conference (SysCon'19)
[Paper]
Permanently ArXived
Correlated Data Generation Using GAN and its Application for Skill Recommendation
Patel, S., Kakadiya, A., Mehta, M., Derasari, R., Patel, R. and Gandhi, R.
Appeared in the 2nd Workshop on Data Science for Human Capital Management, co-located with ECML-PKDD (2018)
[Paper]
Extended Abstracts
Classifying Impaired Awareness of Hypoglycemia with Convolutional Neural Networks
Mehta, M., Groat, D., Lin, Y., Gouripeddi, R. and Facelli, J.
In 2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI)