Research
Interests:
Biomedical Data Science
Information Extraction from Unstructured Text
Machine Learning
Natural Language Processing (NLP)
Social Media and Scientific Literature Mining
Topics:
Natural Language Processing (NLP) uses computer algorithms to identify, analyze and derive key elements from unstructured text. With the widespread use of online social media and electronic health records, unstructured text is a veritable gold mine, and NLP is a principal means of extracting value from these resources. I have been working on the following research areas and want to explore them further.
Biomedical Literature Mining:
Online biomedical literature is complex, with domain-specific terminology and language structures. The free text, metadata etc. extracted from published biomedical articles are very useful for deriving knowledge using NLP and machine learning techniques, e.g., identifying biomarkers and their influence on certain diseases from the relevant literature. Extracting semantic information from the text requires domain-specific knowledge graphs. Hence we need a large amount of data from freely available resources like PubMed, processed with transformer-based models, to develop these knowledge graphs. Furthermore, the aim is to develop a generic NLP framework for biomedical literature mining that takes minimal input from the end user, e.g., domain-specific keywords to identify relevant articles, which are then processed further using the knowledge graph to identify significant information.
Knowledge Discovery in Social Media for Better Care:
Data available from social media, e.g., Twitter, Facebook etc., make it possible to get information about the demographics, languages, locations and social interactions of users. Knowledge discovery in social media is an emerging research interest in public health, as it presents new opportunities in epidemiological surveillance and monitoring. NLP can gather and find meaning in geo-tagged data collected from social media to improve the quality of decision making on various public health issues, e.g., early prediction of community health hazards or of signs of mental illness. This area has not been explored much, but it has the potential to provide meaningful information to domain experts. Hence I want to focus on this area, which requires a high volume of data. Besides crawling data from freely available resources like Reddit, we need to buy some data from Facebook and Twitter to access spatial information as well. The aim is to develop an NLP framework to identify significant information from these texts, with annotations regarding possible outcomes, e.g., adverse effects of drugs or diseases, panic, scarcity of health-care needs, poor awareness etc.
Clinical Text Mining:
Clinical texts, also called clinical notes, e.g., discharge summaries, prescriptions, radiology reports etc., contain the family history, lifestyle, diagnoses, medications, treatment plans and various other medical information of patients. Clinical notes are increasingly being used all over the world, but remain a vast, underused resource for biomedical research. There are many applications of NLP for knowledge discovery in clinical notes to improve the quality of treatment plans and biomedical research. NLP can investigate the association between drugs and possible adverse events and correlations between diseases (comorbidities), and it can be used for early prediction of signs of different diseases, e.g., cancer, from clinical notes. Furthermore, NLP can identify the discriminative image features hidden in radiology reports and can support better diagnostic conclusions. In the next few years, I want to focus mostly on the radiology domain, and the plans are as follows:
Collect and process radiology images and reports from different sources, primarily government hospitals in India such as AIIMS.
De-identify protected health information such as the patient’s name, age, gender etc. to make the reports available for research.
Develop novel NLP frameworks for early prediction of signs of cancers and other diseases. Combine image and text features, extracted respectively from radiology images such as CT scans and from the reports, for effective information extraction from these resources.
Evaluate the performance of these frameworks with the help of experts, using past records as ground truth.
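As a rough illustration of the de-identification step in the plan above, the following minimal sketch masks the patient's name, age and gender in a report. It assumes these appear as labelled header fields such as "Patient Name:", "Age:" and "Sex:". The patterns, the placeholder tokens and the function name deidentify are hypothetical choices for this example, not a description of any existing pipeline; real reports would need a validated de-identification tool and manual review.

```python
import re

# Hypothetical patterns for labelled header fields (an assumption for this
# sketch); real radiology reports vary widely in layout.
PHI_PATTERNS = [
    # "Patient Name: Jane Doe" or "Name: Jane Doe" -> keep the label, mask the value
    (re.compile(r"(?i)\b(patient\s+name|name)\s*:\s*[^\n,;]+"), r"\1: [NAME]"),
    # "Age: 54", "Age: 54 years", "Age: 54 yrs"
    (re.compile(r"(?i)\bage\s*:\s*\d{1,3}(\s*(years?|yrs?|y))?"), "Age: [AGE]"),
    # "Sex: F", "Gender: Male"
    (re.compile(r"(?i)\b(gender|sex)\s*:\s*(male|female|m|f)\b"), r"\1: [GENDER]"),
]


def deidentify(report: str) -> str:
    """Replace labelled name, age and gender fields with placeholder tokens."""
    for pattern, replacement in PHI_PATTERNS:
        report = pattern.sub(replacement, report)
    return report


if __name__ == "__main__":
    sample = (
        "Patient Name: Jane Doe\n"
        "Age: 54 years\n"
        "Sex: F\n"
        "Findings: No focal lesion in the liver."
    )
    print(deidentify(sample))
```

Rule-based masking like this only catches fields that follow the assumed labels; names or ages mentioned inside the free-text findings would need a named entity recognition model or a dedicated de-identification system.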
Publications:
Journal:
Arnab Roy and Tanmay Basu, Postimpact Similarity: A Similarity Measure for Effective Grouping of Unlabelled Text Using Spectral Clustering. Knowledge and Information Systems, vol. 64, pp. 723-742, Springer, 2022.
Tanmay Basu, Simon Goldsworthy and Georgios V. Gkoutos. A Sentence Classification Framework to Identify Geometric Errors in Radiation Therapy from Relevant Literature. Information, MDPI, vol. 12(4), 139, 2021. DOI: 10.3390/info12040139.
Avideep Mukherjee and Tanmay Basu. A Medoid Based Weighting Scheme for Nearest Neighbors Decision Rule Towards Effective Text Categorization. Springer Nature Applied Sciences, vol. 2, 1009, 2020. DOI: 10.1007/s42452-020-2738-8.
Swarup Chattopadhyay, Tanmay Basu, Asit K. Das, Kuntal Ghosh and C. A. Murthy. Towards Effective Discovery of Natural Communities in Complex Networks and Implications in E-commerce. Electronic Commerce Research, Springer, 2020. DOI: 10.1007/s10660-019-09395-y.
Swarup Chattopadhyay, Tanmay Basu, Asit K. Das, Kuntal Ghosh and C. A. Murthy. A Similarity Based Generalized Modularity Measure Towards Effective Community Discovery in Complex Networks. Physica A: Statistical Mechanics and its Applications, 2019. DOI: 10.1016/j.physa.2019.121338.
Tim Guetterman, Tammy Chang, Melissa DeJonckheere, Tanmay Basu, Elizabeth Scrubs and VG Vinod Vydiswaran. Augmenting Qualitative Text Analysis with Natural Language Processing: Methodological Study. Journal of Medical Internet Research, vol. 20(6):e231, 2018.
Tanmay Basu and C. A. Murthy. A Supervised Feature Selection Technique for Effective Text Categorization, International Journal of Machine Learning and Cybernetics, Springer, vol. 7(5), pp. 877-892, 2016.
Tanmay Basu and C. A. Murthy. A Similarity Assessment Technique for Effective Grouping of Documents, Information Sciences, Elsevier, vol. 311, pp. 149-162, 2015.
Tanmay Basu and C. A. Murthy. A Similarity based Supervised Decision Rule for Qualitative Improvement of Text Categorization, Fundamenta Informaticae, IOS Press, vol. 141(4), pp. 275-295, 2015.
Tanmay Basu and C. A. Murthy. Towards Enriching the Quality of k-Nearest Neighbor Decision Rule for Document Classification, International Journal of Machine Learning and Cybernetics, Springer, vol. 5(6), pp. 897-905, 2014.
Tanmay Basu and C. A. Murthy. CUES: A New Approach for Document Clustering, Journal of Pattern Recognition Research, vol. 8(1), pp. 66-84, 2013.
Conference:
Arkapal Panda, Tanmay Basu, Vaibhav Kumar. An Ensemble Learning Framework For Visibility Prediction In Indo-Gangetic Region, published in Proceedings of ICLR 2023 Tiny Papers, Kigali, Rwanda.
Sruthi S, Tanmay Basu. Identification of the Relevance of Comments in Codes Using Bag of Words and Transformer Based Models, published in Proceedings of FIRE 2022 Working Notes, pp. 60-65, Kolkata, India.
Harshvardhan Srivastava, Lijin N. S., Sruthi S and Tanmay Basu. NLP-IISERB@eRisk2022: Exploring the Potential of Bag of Words, Document Embeddings and Transformer Based Framework for Early Prediction of Eating Disorder, Depression and Pathological Gambling Over Social Media, published in Proceedings of CLEF 2022 Working Notes, pp. 987-994, Bologna, Italy.
Sourav Saha, Dwaipayan Roy, B Yuvaraj Goud, Chethan S Reddy and Tanmay Basu. NLP-IISERB@Simpletext2022: To Explore the Performance of BM25 and Transformer Based Frameworks for Automatic Simplification of Scientific Texts, published in Proceedings of CLEF 2022 Working Notes, pp. 2852-2857, Bologna, Italy.
Tanmay Basu. IISERB@LT-EDI-ACL2022: A Bag of Words and Document Embeddings Based Framework to Identify Severity of Depression Over Social Media, published in Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, pp. 234-238, ACL 2022.
Tanmay Basu and Georgios V. Gkoutos. Exploring the Performance of Baseline Text Mining Frameworks for Early Prediction of Self Harm Over Social Media, published in Proceedings of CLEF 2021 Working Notes, pp. 928-937, Romania, 2021.
Shubhaditya Goswami, Sukanya Pal, Simon Goldsworthy and Tanmay Basu. An Effective Machine Learning Framework for Data Elements Extraction from the Literature of Anxiety Outcome Measures to Build Systematic Review, published in Proceedings of Business Information Systems, pp. 247-258, Sevilla, Spain, 2019.
Tim Guetterman, Melissa DeJonckheere, Tammy Chang, Tanmay Basu, Elizabeth Scrubs and VG Vinod Vydiswaran. Integrating Natural Language Processing with Qualitative Text Analysis in Mixed Methods Studies with a Large Qualitative Strand. In proceedings of third Mixed Methods International Research Association (MMIRA) International Conference, 2018.
Sayanta Paul, Sree Kalyani Jandhyala and Tanmay Basu. Early Detection of Signs of Anorexia and Depression Over Social Media using Effective Machine Learning Frameworks, published in Proceedings of CLEF 2018 Working Notes, Avignon, France.
Avideep Mukherjee and Tanmay Basu. An Effective Nearest Neighbor Classification Technique Using Medoid Based Weighting Scheme, published in Proceedings of the Fourteenth International Conference on Data Science, pp. 231-234, Las Vegas, USA, 2018.
Arnab Roy and Tanmay Basu. Effective Grouping of Unlabeled Texts using A New Similarity Measure for Spectral Clustering, published in Proceedings of the Fourteenth International Conference on Data Science, pp. 181-184, Las Vegas, USA, 2018.
Ritam Majumder and Tanmay Basu. Towards Developing Effective Machine Learning Frameworks to Identify Toxic Conversations Over Social Media, published in Proceedings of the Fourteenth International Conference on Data Science, pp. 239-240, Las Vegas, USA, 2018.
Anurag Banerjee and Tanmay Basu. Yet Another Weighting Scheme for Collaborative Filtering Towards Effective Movie Recommendation, published in Proceedings of the Fourteenth International Conference on Data Science, pp. 237-238, Las Vegas, USA, 2018.
Tanmay Basu and C. A. Murthy. Effective Text Classification by a Supervised Feature Selection Approach, in Proceedings of the IEEE International Conference on Data Mining (ICDM), pp. 918-925, Belgium, 2012.
Tanmay Basu, C. A. Murthy and H. Chakraborty, A Tweak on K Nearest Neighbour Decision Rule, in Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV), pp. 929-935, USA, 2012.
Tanmay Basu and C. A. Murthy. A Feature Selection Method for Improved Document Classification, in Proceedings of the International Conference on Advanced Data Mining and Applications (ADMA), LNCS vol. 7713, pp. 296-305, China, 2012. DOI: 10.1007/978-3-642-35527-1_25
Tanmay Basu and C. A. Murthy. Semantic Relation between Words with the Web as Information Source, in Proceedings of the International Conference on Pattern Recognition and Machine Intelligence (PReMI), LNCS vol. 5909, pp. 267-272, India, 2009. DOI: 10.1007/978-3-642-11164-8_43