Research

Interests:

Topics:

Natural Language Processing (NLP) is the process of using computer algorithms to identify, analyze and derive key elements in unstructured text in a smart and effective way. With the widespread use of online social media and electronic health records, unstructured text is a veritable gold mine, and NLP is the best way to extract value from these resources. I have been working on the following research  areas and want to explore further insights in these directions. 



Biomedical Literature Mining

Online biomedical literature is complex, with its domain-specific terminologies and language structures. The free text, meta-data etc. extracted from published biomedical articles are very useful to derive knowledge using NLP and machine learning techniques, e.g., identification of bio-markers and their influence for certain diseases from relevant literature. We need to develop domain specific knowledge graphs in order to extract semantic information from the text. Hence we need to have a large amount of data from freely available resources like PubMED and process them to develop the knowledge graphs following transformer based models. Furthermore, the aim is develop a generic NLP framework for biomedical literature mining which will take minimum input from the end-user e.g., domain specific keywords to identify relevant articles, which will be processed further based on the knowledge graph to identify significant information. 



Knowledge Discovery in Social Media for Better Care: 

Data available from social media e.g., Twitter, Facebook etc. make it possible to get information about demographics, languages used, locations and social interactions of the users. Knowledge discovery in social media is an upcoming research interest in the field of public health, as it presents new opportunities in epidemiological surveillance and monitoring. NLP has the ability to gather and find meaning in data collected from social media with geo-tagged location to improve the quality of decision making in various public health issues e.g., early prediction of community health hazards over social media or early prediction of signs of mental illness. However, this area has not been explored much, but has the potential to provide meaningful information to the domain experts. Hence I want to focus on this area and high volume of data is needed in order to do so. Besides crawling data from freely available resources like Reddit, we need to buy some amount of data from Facebook and Twitter to access spatial information as well. The aim is to develop a NLP framework to identify significant information from these texts with annotations regarding possible outcomes e.g., adverse effects of drugs or diseases, panic, scarcity of health-care needs, poor awareness etc.



Clinical Text Mining

Clinical texts also called clinical notes e.g., discharge summary, prescriptions, radiology reports etc. contain family history, lifestyle, diagnoses, medications, treatment plans and various other medical information of the patients. Clinical notes are increasingly being used all over the world, but represent a vast, underused resource for biomedical research. There are many applications of NLP for knowledge discovery in clinical notes to improve the quality of treatment plans and biomedical research. NLP can investigate the association between drugs and possible adverse events, correlations between diseases (comorbidities) and it can be used for early prediction of signs of different diseases e.g., cancer using the clinical notes. Furthermore, NLP can identify the discriminative image features hidden in radiology reports and can support better diagnostic conclusions. In the next few years, I want to focus mostly on the Radiology domain and the plans are as follow:



Publications:

Journal:

Conference: