Research
My research focuses on multi-modal data processing, especially of visual and textual data, for intelligent decision making. My main application area currently involves multi-modal biomedical data, spanning modalities such as CT scans, radiology reports, and longitudinal electronic medical records. Major applications of my current research include intelligent clinical diagnosis, treatment strategy recommendation, and optimal healthcare resource management. Multi-modal data is part and parcel of many other applications, including social media analysis (which involves modalities like textual status updates, images and videos, location signals, 'likes' and link counts, etc.), virtual assistants, assistive technology for visually impaired persons, and cyber-physical-social systems (CPSS) for modern smart cities. I am broadly interested in bringing together cross-disciplinary expertise to understand and utilize multi-modal data for improving the quality of human life.
I have a broad skill set spanning techniques from a variety of fields, including Machine Learning, Artificial Intelligence, Computer Vision, Image Processing, Natural Language Processing, and Biomedical Informatics. I am currently collaborating with the following.
The following is a list of some of my current and past projects.
Critical Event Prediction through Predictive Modeling of EMR Data
We are developing multi-modal temporal fusion models to predict critical healthcare events, such as hospitalization, from information coded in electronic medical records (EMR). Such prediction is crucial for effective management of healthcare resources. The current focus of this project is data generated by patients with COVID-19. Our model makes effective long-term predictions of hospitalization at the time of the RT-PCR test. Innovative temporal feature engineering handles the extreme sparsity of the data, and we comprehensively evaluate fusion techniques to identify the optimal one given the unusual nature of data generated during the pandemic.
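As a rough illustration of one fusion strategy we compare (late fusion by concatenation, with explicit missingness indicators to cope with sparse records), here is a minimal sketch; the modality names, feature dimensions, and weights are hypothetical, not the project's actual model:

```python
import math

def fuse_modalities(lab_feats, visit_feats, demo_feats):
    """Toy late fusion by concatenation. A missing modality (None) is
    replaced by zeros plus a missingness indicator -- one simple way to
    handle the extreme sparsity of EMR data. All feature names and
    dimensions here are illustrative."""
    fused = []
    for feats, dim in ((lab_feats, 3), (visit_feats, 2), (demo_feats, 2)):
        if feats is None:
            fused.extend([0.0] * dim + [1.0])   # zeros + "missing" flag
        else:
            fused.extend(list(feats) + [0.0])   # observed + "present" flag
    return fused

def hospitalization_score(fused, weights, bias=0.0):
    """Logistic risk score in (0, 1) from the fused feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, fused))
    return 1.0 / (1.0 + math.exp(-z))

# toy record at RT-PCR time: labs entirely missing (a sparse record)
fused = fuse_modalities(None, (1.0, 0.5), (0.63, 1.0))
score = hospitalization_score(fused, [0.2] * len(fused))
```

The missingness flags let a downstream model distinguish "value is zero" from "value was never recorded", which matters when most entries are absent.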
Bridging the 'Gap' between Structured and Unstructured Radiology Reports
We are developing intelligent NLP models that automatically interpret the free-flowing text of radiology reports for clinical diagnosis. Coronary Computed Tomography Angiography (CCTA) studies come with long radiology reports rich in linguistic nuances, variations in writing style, and differing choices of template (or no template at all). Our model effectively assigns a standardized score to both templated and untemplated CCTA reports. Although it was trained without any untemplated reports, it generalizes well to them thanks to our domain-specific text standardization techniques.
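To give a flavor of what domain-specific text standardization can look like, here is a minimal sketch mapping abbreviations and formatting variants onto a shared vocabulary before any scoring; the substitution rules are illustrative examples, not the project's actual rule set:

```python
import re

# Illustrative normalization rules; real rules are curated per domain.
SUBSTITUTIONS = [
    (r"\bLAD\b", "left anterior descending artery"),
    (r"\bRCA\b", "right coronary artery"),
    (r"\bmod\b\.?", "moderate"),
    (r"\s+", " "),          # collapse whitespace from templated layouts
]

def standardize(report_text):
    """Lower-case the report and apply substitution rules so that
    templated and untemplated reports share one vocabulary."""
    text = report_text.lower()
    for pattern, repl in SUBSTITUTIONS:
        text = re.sub(pattern, repl, text, flags=re.IGNORECASE)
    return text.strip()
```

Normalizing surface variation this way is one reason a model trained only on templated reports can still transfer to free-form ones.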
Context-driven Framework for Generating Realistic Image Descriptions
We developed a context-sensitive generative model for automatically producing realistic descriptions of images. Standard image annotation benchmarks often contain artificial captions written by annotators with no background knowledge about the images. We therefore evaluated our framework on image-caption pairs collected from news websites. Context was extracted from heterogeneous sources: scene characteristics of the images, the corresponding news articles, and the metadata associated with those articles.
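A minimal sketch of pooling context from two of the heterogeneous sources mentioned above (article text and metadata tags); the stopword list, weighting, and function names are hypothetical, and the real framework additionally uses image scene characteristics:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "in", "and", "to", "at"}

def extract_context(article_text, metadata_tags, top_k=5):
    """Illustrative context extraction: pool salient terms from the
    news article and its metadata into a single bag-of-words context
    that can condition a caption generator."""
    words = [w for w in re.findall(r"[a-z]+", article_text.lower())
             if w not in STOPWORDS]
    counts = Counter(words)
    counts.update({tag.lower(): 2 for tag in metadata_tags})  # weight metadata
    return [w for w, _ in counts.most_common(top_k)]
```

Conditioning generation on such a context vocabulary is what steers output toward realistic, article-grounded captions rather than generic scene descriptions.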
Sample network of named entities - edge width indicates relation strength
Semantic Network of Named Entities
Named entities (names of people, places, and organizations) often characterize the most important parts of free-flowing text, and most common search queries include named entities. Understanding the relationships between named entities is therefore important for understanding, summarizing, visualizing, or answering queries about text. We developed an unsupervised framework that quantifies relationship strength and characterizes relationship type between named entities through semantic analysis of the text surrounding those entities.
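One common unsupervised way to quantify relationship strength is pointwise mutual information (PMI) over sentence-level co-occurrence; the sketch below is an illustrative stand-in for the semantic analysis in our framework, not the framework itself:

```python
import math
from collections import Counter
from itertools import combinations

def relation_strengths(sentences_entities):
    """Given a list of per-sentence entity lists, score each entity
    pair by PMI: how much more often they co-occur than chance.
    Higher scores suggest stronger relationships (edge widths in a
    semantic network visualization)."""
    ent_counts = Counter()
    pair_counts = Counter()
    n = len(sentences_entities)
    for ents in sentences_entities:
        uniq = sorted(set(ents))
        ent_counts.update(uniq)
        pair_counts.update(combinations(uniq, 2))
    strengths = {}
    for (a, b), c in pair_counts.items():
        p_ab = c / n
        p_a, p_b = ent_counts[a] / n, ent_counts[b] / n
        strengths[(a, b)] = math.log(p_ab / (p_a * p_b))
    return strengths
```

Characterizing relationship *type* (e.g., employer vs. family) requires analyzing the words between the entities, which this frequency-only sketch omits.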
Feature-independent Context Estimation for Image Annotation
Context plays an important role in predicting suitable textual annotations for images. We devised a tensor decomposition based model that extracts context from raw images, making our context estimation completely feature-independent. The estimated context proved highly useful for the automatic annotation process.
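As a loose sketch of the idea of factorizing raw pixel data directly (no hand-crafted features), one can stack images into a 3-way tensor and take leading factors of its mode-1 unfolding via SVD; this is only an illustrative proxy for the tensor decomposition used in the project:

```python
import numpy as np

def estimate_context(image_batch, rank=2):
    """Illustrative feature-independent context estimation: stack raw
    images into a tensor (image x height x width), unfold along the
    image mode, and use leading singular vectors as per-image context
    coordinates. A stand-in for the project's actual decomposition."""
    tensor = np.stack(image_batch)                  # shape (n, h, w)
    unfolded = tensor.reshape(tensor.shape[0], -1)  # mode-1 unfolding
    u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u[:, :rank] * s[:rank]                   # (n, rank) context

rng = np.random.default_rng(0)
ctx = estimate_context([rng.random((4, 4)) for _ in range(3)], rank=2)
```

Because the factors are computed from raw pixels alone, the same pipeline applies regardless of which visual features a downstream annotator uses.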
High frequency - high precision trend of a sample annotation system
Stable precision across frequencies for our MultiSC image annotation system
Symmetric Classifier for Automatic Image Annotation
Image annotation systems often achieve high precision only for highly frequent tags/labels, whereas the text mining literature indicates that moderately frequent and relatively rare words matter more for search and retrieval than highly frequent ones. Image annotation can be posed as a multi-label classification problem with a very large number of class labels. We developed a symmetric classifier using multiple layers of sparse coding that achieves high precision and recall for image tags/labels across a wide range of frequencies.
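A simple way to expose the frequency bias described above is to bucket tags by training frequency and report per-bucket precision; the sketch below does exactly that on toy data (tag names hypothetical), and is the kind of diagnostic under which a frequency-robust system should show flat precision:

```python
from collections import Counter

def precision_by_frequency(true_tags, pred_tags, n_bins=3):
    """Bucket tags by ground-truth frequency (most to least frequent)
    and compute precision within each bucket. Systems precise only on
    frequent tags show precision dropping across buckets."""
    freq = Counter(t for tags in true_tags for t in tags)
    ordered = [t for t, _ in freq.most_common()]
    bin_size = max(1, -(-len(ordered) // n_bins))  # ceiling division
    bins = [set(ordered[i:i + bin_size])
            for i in range(0, len(ordered), bin_size)]
    results = []
    for b in bins:
        tp = fp = 0
        for truth, pred in zip(true_tags, pred_tags):
            for tag in pred:
                if tag in b:
                    tp += tag in truth
                    fp += tag not in truth
        results.append(tp / (tp + fp) if tp + fp else 0.0)
    return results  # precision per bucket, most to least frequent
```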
Social Data Processing for Cyber-Physical-Social Systems (CPSS)
Traditional cyber-physical systems are rapidly turning into cyber-physical-social systems (CPSS) due to the connectivity between human users and infrastructure. Processing social data (users' GPS information, social media posts, etc.) is crucial for effective management of CPSS and for securing them against cyber attacks. We are developing attack-resilient models for CPSS that exploit the involved social data.
Review Services
IEEE Transactions on Image Processing (TIP)
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Pattern Recognition Letters
IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2020
Image and Vision Computing (IVC), Elsevier