We developed the ConvGeN algorithm for data enrichment in the context of imbalanced classification.
We have long experience in enriching data to improve the performance of machine learning models when data is scarce. With the rise of machine-assisted decision-making, clinicians often turn to machine learning to understand patterns in the data collected at their clinics and to build predictive models. However, a single clinic rarely has much data. Moreover, important subgroups in such data often have skewed distributions. To make things harder, privacy policies prevent clinicians from sharing their data with data scientists. We develop data enrichment and synthetic data generation strategies to counter such problems.
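As a rough illustration of what synthetic data generation for a skewed subgroup can look like, the sketch below creates new minority-class rows by interpolating between pairs of existing minority rows. This is a generic interpolation baseline, not the ConvGeN algorithm; the function name `oversample_minority` and the toy data are hypothetical and chosen only for this example.

```python
import random


def oversample_minority(minority, n_new, seed=0):
    """Return n_new synthetic rows, each a random convex combination
    of two distinct minority rows (a simple interpolation baseline)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)  # pick two distinct minority rows
        t = rng.random()                # interpolation weight in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


# Toy data (assumed): 8 majority rows and 3 minority rows, two features each.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 1.0], [11.0, 2.0], [12.0, 1.5]]

# Generate just enough synthetic rows to match the majority class size.
new_rows = oversample_minority(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_rows
```

Because every synthetic row lies on a line segment between two real minority rows, the generated points stay inside the region already covered by the minority class. More elaborate strategies differ mainly in how they choose which rows to combine and with what weights.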
In this project, we maintain a close association with the group of Prof. Olaf Wolkenhauer at the University of Rostock, Germany.
Compared with the UMAP algorithm, the FDC workflow shows distinct patient populations on the publicly available liver cirrhosis dataset.
Currently, we are investigating several strategies for finding patterns in patient data using unsupervised or semi-supervised approaches. We developed the Feature-type Distributed Clustering (FDC) workflow, which has proved more effective than the state-of-the-art dimension reduction algorithm UMAP for clinical and biomedical datasets. The workflow is finding applications in diverse clinical and epidemiological problems (see our publication in Nutrition and Diabetes, published by Springer Nature).
We are extending this research towards the more general problem of multi-modal data integration in clinical data science. The aim is to integrate diverse modes of clinical data for patient stratification. Smart dimension reduction approaches for short time-series data (longitudinal patient data representing multiple patient visits to a clinic) are also a topic of interest.
Although these are the current broad themes of our research, we have experience in diverse applications of deep learning. One of our NLP projects, for example, develops an attention-based model to extract relationships among molecules from biomedical publications (see published work). We also have experience in providing machine learning support for work in bioinformatics and systems biology (see published work). At IISER-TVM we believe in breaking boundaries among scientific disciplines. Thus, as long as there is interesting data, we are interested in a collaboration, whatever the domain.