Research

My research focuses on Natural Language Processing (NLP). In particular, I develop NLP tools for linguistically rich Indian languages using deep learning algorithms. These tools can be used in Neural Machine Translation, Indian-language spoken dialogue systems, robotics, and social media text analytics, among other applications. Alongside these machine learning and deep learning tools, I have also prepared datasets for NLP tasks such as Machine Translation, morphological analysis, Sandhi splitting, Part-of-Speech tagging, and Named Entity Recognition in Indian languages.

My research aims to bring the vast body of available knowledge within reach of ordinary people and let them learn in their spoken language or mother tongue.

Some topics of interest are:

Neural Machine Translation
Models for linguistically rich languages such as Sanskrit, Malayalam, Tamil, Hindi, and Telugu, and also for Arabic
- Morphological analyzer and Sandhi splitter
- Named Entity Recognition (NER) tagger
- Part-of-Speech (POS) tagger
- Word Sense Disambiguation (WSD)
Social media text analytics
- Factuality identification
- Sentiment analysis
- Hate speech identification
- Emotion detection
- Fake news detection
Biomedical text mining
Kernel methods and explicit random feature mapping algorithms
Anomaly detection and intrusion detection

Publications

Chowdary, D. E., Ganesan, R., Dabbara, H., Jyothish Lal, G., & Premjith, B. (2024). Transformer‐Based Multilingual Automatic Speech Recognition (ASR) Model for Dravidian Languages. Automatic Speech Recognition and Translation for Low Resource Languages, 259-273.
Radhakrishnan, V., Aadharsh Aadhithya, A., Mohan, J., Visweswaran, M., Jyothish Lal, G., & Premjith, B. (2024). Voice Cloning for Low‐Resource Languages: Investigating the Prospects for Tamil. Automatic Speech Recognition and Translation for Low Resource Languages, 243-257.
Sivasubramanian, A., Devisetty, M., & Premjith, B. (2024). Feature Extraction and Anomaly Detection Using Different Autoencoders for Modeling Intrusion Detection Systems. Arabian Journal for Science and Engineering, 1-13.
Premjith, B., Chakravarthi, B. R., Kumaresan, P. K., Rajiakodi, S., Karnati, S., Mangamuru, S., & Janakiram, C. (2024, March). Findings of the Shared Task on Hate and Offensive Language Detection in Telugu Codemixed Text (HOLD-Telugu)@ DravidianLangTech 2024. In Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages (pp. 49-55).
Subramanian, M., Chakravarthi, B.R., Shanmugavadivel, K., Pandiyan, S., Kumaresan, P.K., Palani, B., Premjith, B., Vanaja, K., Mithunja, S., Devika, K. and Haripriya, B., (2024, March). Overview of the Second Shared Task on Fake News Detection in Dravidian Languages: DravidianLangTech@ EACL 2024. In Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages (pp. 71-78).
Devika, K., Haripriya, B., Vigneshwar, E., Premjith, B., & Chakravarthi, B. R. (2024, March). From Dataset to Detection: A Comprehensive Approach to Combating Malayalam Fake News. In Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages (pp. 16-23).
Jairam, R., Jyothish, G., & Premjith, B. (2024, March). A Few-Shot Multi-Accented Speech Classification for Indian Languages using Transformers and LLM’s Fine-Tuning Approaches. In Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages (pp. 1-9).
Premjith, B., Jyothish, G., Sowmya, V., Chakravarthi, B. R., Nandhini, K., Natarajan, R., ... & Reddy, M. (2024, March). Findings of the Shared Task on Multimodal Social Media Data Analysis in Dravidian Languages (MSMDA-DL)@ DravidianLangTech 2024. In Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages (pp. 56-61).
Jairam, R., Jyothish, G., Premjith, B., & Viswa, M. (2024, March). CEN_Amrita@ LT-EDI 2024: A Transformer based Speech Recognition System for Vulnerable Individuals in Tamil. In Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion (pp. 190-195).
Sreelakshmi, K., Premjith, B., Chakravarthi, B. R., & Soman, K. P. (2024). Detection of Hate Speech and Offensive Language CodeMix Text in Dravidian Languages using Cost-Sensitive Learning Approach. IEEE Access.
Raphel, M., Premjith, B., Sreelakshmi, K., & Chakravarthi, B. R. (2023, September). Hate and Offensive Keyword Extraction from CodeMix Malayalam Social Media Text Using Contextual Embedding. In Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages (pp. 10-18).
Premjith, B., Sowmya, V., Chakravarthi, B. R., Natarajan, R., Nandhini, K., Murugappan, A., ... & Sn, P. (2023, September). Findings of the Shared Task on Multimodal Abusive Language Detection and Sentiment Analysis in Tamil and Malayalam. In Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages (pp. 72-79).
Priyadharshini, Ruba, Bharathi Raja Chakravarthi, S. Malliga, Subalalitha Cn, S. V. Kogilavani, B. Premjith, Abirami Murugappan, and Prasanna Kumar Kumaresan. "Overview of Shared-Task on Abusive Comment Detection in Tamil and Telugu." In Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages, pp. 80-87. 2023.
Arul Goutham, R., Premjith, B., Nimal Madhu, M., & Gopalakrishnan, E. A. (2023, May). Forecasting Intraday Stock Price Using Attention Mechanism and Variational Mode Decomposition. In International Conference on Information, Communication and Computing Technology (pp. 525-538). Singapore: Springer Nature Singapore.
Yazhini, V., Nimal Madhu, M., Premjith, B., & Gopalakrishnan, E. A. (2023, May). Deep Learning with Attention Mechanism for Cryptocurrency Price Forecasting. In International Conference on Information, Communication and Computing Technology (pp. 471-484). Singapore: Springer Nature Singapore.
Keshav, S., Jyothish Lal, G., & Premjith, B. (2023). Multimodal approach for code-mixed speech sentiment classification. In Advances in Signal Processing, Embedded Systems and IoT: Proceedings of Seventh ICMEET-2022 (pp. 553-563). Singapore: Springer Nature Singapore.
Anilkumar, A., Yadukrishnan, V., Nimal Madhu, M., Hareesh, V., & Premjith, B. (2023, April). Deep Learning-Based Time Series Forecasting for CO2 Emission. In International Conference on Intelligent Computing & Optimization (pp. 294-303). Cham: Springer Nature Switzerland.
Sreelakshmi, K., Premjith, B., Gopalakrishnan, E. A., & Soman, K. P. (2022). Study of Markov Chains for the Identification of the Hate Contents in Hinglish. In Data Engineering and Intelligent Computing (pp. 215-224). Springer, Singapore.
George, B., Adarsh, S., Prajapati, N., Premjith, B., & Kp, S. (2022, July). Amrita_CEN at SemEval-2022 Task 4: Oversampling-based Machine Learning Approach for Detecting Patronizing and Condescending Language. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) (pp. 515-518).
Ajayan, A. K., Mohanan, K., Anugraha, S., Premjith, B., & Kp, S. (2022, July). Amrita_CEN at SemEval-2022 Task 6: A Machine Learning Approach for Detecting Intended Sarcasm using Oversampling. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) (pp. 834-839).
Kumar, C. A., Maharana, A., Murali, S., Premjith, B., & Kp, S. (2022, May). BERT-Based Sequence Labelling Approach for Dependency Parsing in Tamil. In Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages (pp. 1-8).
Prasanth, S. N., Raj, R. A., Adhithan, P., Premjith, B., & Kp, S. (2022, May). CEN-Tamil@ DravidianLangTech-ACL2022: Abusive Comment detection in Tamil using TF-IDF and Random Kitchen Sink Algorithm. In Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages (pp. 70-74).
Premjith, B., Bharathi Raja Chakravarthi, Malliga Subramanian, B. Bharathi, Soman Kp, V. Dhanalakshmi, K. Sreelakshmi, Arunaggiri Pandian, & Prasanna Kumaresan (2022, May). Findings of the Shared Task on Multimodal Sentiment Analysis and Troll Meme Classification in Dravidian Languages. In Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages (pp. 254-260).
Darshana, S., Theivaprakasham, H., Lal, G. J., Premjith, B., Sowmya, V., & Soman, K. (2022, March). MARS: A Hybrid Deep CNN-based Multi-Accent Recognition System for English Language. In 2022 First International Conference on Artificial Intelligence Trends and Pattern Recognition (ICAITPR) (pp. 1-6). IEEE.
Krishna, U. V., Premjith, B., & Soman, K. P. (2022). A Comparative Study of Pre-trained Gene Embeddings for COVID-19 mRNA Vaccine Degradation Prediction. In Proceedings of the Seventh International Conference on Mathematics and Computing (pp. 301-308). Springer, Singapore.
Sai Kesav, R., Barathi Ganesh, H. B., Premjith, B., & Soman, K. P. (2022). Ink Recognition Using TDNN and Bi-LSTM. In High Performance Computing and Networking (pp. 35-45). Springer, Singapore.
Priyamvada, R., Govind, D., Menon, V. K., Premjith, B., & Soman, K. P. (2022). Grapheme to Phoneme Conversion for Malayalam Speech Using Encoder-Decoder Architecture. In Intelligent Data Engineering and Analytics (pp. 41-49). Springer, Singapore.
Premjith, B., & Soman, K. P. (2021). Deep Learning Approach for the Morphological Synthesis in Malayalam and Tamil at the Character Level. Transactions on Asian and Low-Resource Language Information Processing, 20(6), 1-17.
Isha Indhu S., Kumar, K. S., Karthikeyan, L., Premjith, B., & Kp, S. (2021, June). Amrita_CEN_NLP@ SDP2021 Task A and B. In Proceedings of the Second Workshop on Scholarly Document Processing (pp. 146-149).
Kesav, R. S., Premjith, B., & Soman, K. P. (2021, April). Dependency Parser for Hindi Using Integer Linear Programming. In International Conference on Advances in Computing and Data Sciences (pp. 42-51). Springer, Cham.
K Sreelakshmi, B Premjith, Soman KP (2021), Amrita_CEN_NLP@ DravidianLangTech-EACL2021: Deep Learning-based Offensive Language Identification in Malayalam, Tamil and Kannada, Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, 249-254
Sasidhar, T. T., Premjith, B., Sreelakshmi, K., & Soman, K. P. (2021). Sentiment Analysis on Hindi–English Code-Mixed Social Media Text. In Innovations in Computer Science and Engineering (pp. 615-622). Springer, Singapore.
Aindriya Barua, Thara Sasidharan, Premjith B., Dr. Soman K. P. (2021). Analysis of Contextual and Non-contextual Word Embedding Models for Hindi NER with Web Application for Data Collection. International Advanced Computing Conference, Volume 1367
Aparna, T. S., Simran, K., Premjith, B., & Soman, K. P. (2021). Aspect-Based Sentiment Analysis in Hindi: Comparison of Machine/Deep Learning Algorithms. In Inventive Computation and Information Technologies (pp. 81-91). Springer, Singapore.
Bharathi Raja Chakravarthi, Anand Kumar M, John P McCrae, B Premjith, KP Soman, Thomas Mandl (2020). Overview of the track on HASOC-Offensive Language Identification-DravidianCodeMix. In FIRE (Working Notes) (pp. 112-120)
DJ Ratnam, KP Soman, TK Bijimol, MG Priya, B Premjith (2020), Hybrid Machine Translation System for the Translation of Simple English Prepositions and Periphrastic Causative Constructions from English to Hindi, Applications in Ubiquitous Computing. Springer, Cham 247-263.
JP Sanjanasri, B Premjith, Vijay Krishna Menon, KP Soman (2020). cEnTam: Creation and Validation of a New English-Tamil Bilingual Corpus, Proceedings of the 13th Workshop on Building and Using Comparable Corpora, 61-64
K Sreelakshmi, B Premjith, KP Soman (2020). Detection of Hate Speech Text in Hindi-English Code-mixed Data, Procedia Computer Science, Elsevier, 171, 737-744.
TT Sasidhar, B Premjith, KP Soman (2020). Emotion Detection in Hinglish (Hindi+ English) Code-Mixed Social Media Text, Procedia Computer Science, Elsevier, 171, 1346-1352.
Sreelakshmi K, Premjith.B, Soman K P (2019). Amrita CEN at HASOC 2019: Hate Speech Detection in Roman and Devanagiri Scripted Text. Working Notes of FIRE 2019 - Forum for Information Retrieval Evaluation. 2019 Dec 12-15; 366-369
Chandni M, Priyanga V T, Premjith B, Soman K.P (2019). Amrita CEN CIQ: Classification of Insincere Questions. Working Notes of FIRE 2019 - Forum for Information Retrieval Evaluation. 2019 Dec 12-15; 456-462
Premjith B, Chandni Chandran V, Shriganesh Bhat and Soman KP (2019). A Machine Learning Approach for Identifying Compound Words from a Sanskrit Text. Proceedings of the 6th International Sanskrit Computational Linguistics Symposium, Association for Computational Linguistics. 2019 Oct 23-25;45-51.
Premjith B, Soman K.P, Prabaharan P (2019). Amrita CEN@ FACT: Factuality Identification in Spanish Text. Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019). CEUR Workshop Proceedings, CEUR-WS, Bilbao, Spain (9 2019).
M Anand Kumar, B Premjith, Shivkaran Singh, S Rajendran, KP Soman (2019). An overview of the shared task on machine translation in Indian languages (MTIL)–2017. Journal of Intelligent Systems. 2019 Jul 26;28(3):455-64.
Athira Gopalakrishnan, KP Soman, B Premjith (2019). A Deep Learning-Based Named Entity Recognition in Biomedical Domain. Emerging Research in Electronics, Computer Science and Technology, Springer, Singapore, 517-526.
Premjith B, M Anand Kumar, Soman KP, D Jyothi Ratnam (2019). Embedding Linguistic Features in Word Embedding for Preposition Sense Disambiguation in English—Malayalam Machine Translation Context. Recent Advances in Computational Intelligence, Springer, 341-370.
Premjith B, M Anand Kumar, Soman KP (2019). Neural Machine Translation System for English to Indian Language Translation Using MTIL Parallel Corpus: Special Issue on Natural Language Processing. Journal of Intelligent Systems.
Greeshma Prabha, PV Jyothsna, KK Shahina, B Premjith, KP Soman (2019). A Deep Learning Approach for Part-of-Speech Tagging in Nepali Language. 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI).
Premjith B, Soman K.P, Prabaharan Poornachandran (2018). A deep learning based Part-of-Speech (POS) tagger for Sanskrit language by embedding character level features. In 10th annual meeting of the Forum for Information Retrieval Evaluation.
Premjith, B., Soman, K. P., & Kumar, M. A. (2018). A deep learning approach for Malayalam morphological analysis at character level. Procedia Computer Science, 132, 47-54.
Premjith B, Soman K.P, M Anand Kumar (2018). Deep learning based morphological analysis of Tamil nouns and verbs. In Research Conference on Data and Decision Science (RCDDS' 18).
D Jyothi Ratnam, KP Soman, B Premjith, MG Priya (2018). Transfer of Simple English Prepositions ‘to’ and ‘with’ Into Hindi Utilizing Linguistic Features of the Predicative Part of a Sentence with Machine Learning Approach in an English to Hindi MT Context. Journal of Advanced Research in Dynamical & Control Systems. 2018;10:240-64
Dhanya Sathyan, Kalpathy Balakrishnan Anand, Aravind Jaya Prakash, and Bhavukam Premjith (2018). Modeling the Fresh and Hardened Stage Properties of Self-Compacting Concrete using Random Kitchen Sink Algorithm. International Journal of Concrete Structures and Materials 12.1: 24.
Ratnam, D. J., Kumar, M. A., Premjith, B., Soman, K. P., & Rajendran, S. (2018). Sense Disambiguation of English Simple Prepositions in the Context of English–Hindi Machine Translation System. In Knowledge Computing and Its Applications (pp. 245-268). Springer, Singapore.
R. Vinayakumar, S. Sachin Kumar, B. Premjith, Prabaharan Poornachandran, & K. P. Soman (2017). Deep Stance and Gender Detection in Tweets on Catalan Independence@Ibereval 2017. In Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017).
R Vinayakumar, Premjith B, Sachin Kumar S, Prabaharan Poornachandran (2017). deepCybErNet at EmoInt-2017: Deep Emotion Intensities in Tweets. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (pp. 259-263).
Aravind J Prakash, Dhanya Sathyan, K B Anand, Premjith B (2017). Prediction of rheological properties of self compacting concrete: Regularized least square approach. International Journal of Earth Sciences and Engineering.
Vinayakumar, R., Kumar, S., Premjith, B., Prabaharan, P., & Soman, K. P. (2017). DEFT 2017-Texts Search@ TALN/RECITAL 2017: Deep Analysis of Opinion and Figurative language on Tweets in French. In 24e Conférence sur le Traitement Automatique des Langues Naturelles (TALN) (p. 99).
Premjith, B., Kumar, S. S., Shyam, R., Kumar, M. A., & Soman, K. P. (2016). A Fast and Efficient Framework for Creating Parallel Corpus. Indian Journal of Science and Technology, 9(45).
Soman K.P, Prabaharan Poornachandran, Premjith B (2016). A distributed approach for predicting malicious activities in a network from a streaming data with support vector machine and explicit random feature mapping. The IIOAB Journal.
Kumar, S. S., Premjith, B., Kumar, M. A., & Soman, K. P. (2015). AMRITA_CEN-NLP@ SAIL2015: Sentiment analysis in Indian Language using regularized least square approach with randomized feature learning. In International Conference on Mining Intelligence and Knowledge Exploration (pp. 671-683). Springer, Cham.
Premjith, B., & Soman, K. P. Computational Experiment of One Class SVM in Excel. International Journal of Applied Engineering Research, 10(20), 19356-19360.
Premjith, B., Mohan, N., Poornachandran, P., & Soman, K. P. (2015). Audio Data Authentication with PMU Data and EWT. Procedia Technology, 21, 596-603.
Premjith, B., Kumar, S. S., Manikkoth, A., Bijeesh, T. V., & Soman, K. P. (2013). Insight into Primal Augmented Lagrangian Multilplier Method. arXiv preprint arXiv:1312.7637.
Premjith B., Vidya M., Poornima S. V., and K. P. Soman. A Level Set Methodology for Sanskrit Document Binarization and Character Segmentation. (Best paper award)