Research Projects
1. Discourse Integrated Dravidian Language to Dravidian Language Machine Translation (2022-2025)
Develop and deploy Tamil-Telugu neural MT incorporating discourse analysis, and integrate it into speech-to-speech MT
Create annotated data for POS, syntactic and semantic parsing, and discourse markers
Use machine learning techniques to build tools such as a POS tagger, parser and discourse analyser, and integrate them into the neural MT system
Software used: Python, Perl, C, Web-based tools
Funded by MEITY, Govt. of India (Rs. 140.19 Lakhs). Collaborators: AUKBC-Chennai, IIIT-Hyd, ICFOSS Trivandrum, MIT-Manipal, C-DAC Noida, DAIICT Gandhinagar
2. Sanskrit to Indian Language Machine Translation cum Language Accessor (2022-2024)
Develop Sanskrit-Telugu and Sanskrit-Marathi transfer-based machine translation
Give readers access to linguistic analysis to help them decipher the meaning of the source language
Funded by Indian Knowledge System, Department of Higher Education, Govt. of India (Rs. 10 Lakhs)
3. Syntactic Parser for Tamil: A Data-driven Approach (2023-2024)
Annotate 100K Tamil tokens for POS, morphology and dependency relations
Build a machine learning model for Tamil parsing
Funded by Tamil Virtual Academy, Govt. of Tamil Nadu
4. Sanskrit Knowledge Accessor
Develop and deploy Sanskrit-Tamil hybrid (transfer+neural) MT
Build high-quality, linguistically rich annotated data for machine learning
Integrate MT into dialogue modelling
Funded by MEITY, Govt. of India (Rs. 72.012 Lakhs). Collaborators: UoH-DSS, IIT-KGP, IIT-Kanpur, MAHE, CVV, IGDTUW
5. Speech to Speech Machine Translation (SSMT) Pilot System (March 2020 - July 2021)
Built a web-based SSMT API for Tamil-Telugu MT
Compiled large parallel corpora and domain dictionaries for building neural MT
6. Indian Language to Indian Language Machine Translation (IL-IL MT) (2010-2016)
IL-IL MT is a bidirectional Indian language to Indian language machine translation project. The project aimed to develop 18 MT systems covering 9 Indian language pairs using a transfer-based approach. It was developed in consortium mode by the International Institute of Information Technology Hyderabad, University of Hyderabad, Indian Institute of Technology (IIT) Bombay, IIT Kharagpur, Indian Institute of Science Bangalore, Anna University and the Centre for Development of Advanced Computing Pune. It was funded under the Technology Development for Indian Languages (TDIL) Programme of the Department of Electronics and Information Technology (DeitY), Ministry of Communications and Information Technology (MCIT), Government of India.
The Language Technology Laboratory at the Centre for Applied Linguistics and Translation Studies (CALTS) carried out the research and development for the Telugu-Hindi, Hindi-Telugu, Telugu-Tamil and Tamil-Telugu MT systems. Prof. G. Uma Maheshwar Rao coordinated the project as principal investigator, and as an investigator I (Dr. K. Parameswari) carried out research and development on the Telugu-Tamil and Tamil-Telugu MT systems with a research team of linguists and computer scientists.
My contributions to this project were mainly in building the transfer grammars, the Tamil morphological analyser and generator, and the POS tagger, chunker, dependency parser, multi-word expression and agreement generation modules for Tamil and Telugu.
Link to IL-IL MT (Telugu-Tamil & Tamil-Telugu): http://sampark.iiit.ac.in/sampark/web/index.php/content