Medical Visual Question Answering

Adversarial Learning for Medical Visual Question Answering

Medical Visual Question Answering (VQA) is a computer vision task in which a system is given a medical image and a natural-language question relevant to that image, and must predict the answer. Medical VQA is gaining attention in the medical field because it has the potential to augment clinical decision-making and help patients better understand their health conditions through medical imagery. Medical professionals can also save time by focusing only on the images that a VQA system identifies as critical.

VQA systems produce a joint representation of the question-image pair and use it to predict the answer. Learning this joint representation is the most important task in a VQA system. To learn it, the best-known approaches model the multi-modal correlation between the question and the image; for example, most such methods attend more strongly to the image regions that the question refers to.
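To make the attention idea concrete, below is a minimal sketch of question-guided attention (PyTorch is assumed here; the module name and layer sizes are illustrative, not this project's actual architecture): the question embedding scores each image region, and the attended visual features are fused with the question into a joint embedding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedAttention(nn.Module):
    """Illustrative question-guided attention over image region features."""

    def __init__(self, region_dim=512, question_dim=512, joint_dim=512):
        super().__init__()
        self.score = nn.Linear(region_dim + question_dim, 1)  # attention logits
        self.fuse = nn.Linear(region_dim + question_dim, joint_dim)

    def forward(self, regions, question):
        # regions:  (batch, num_regions, region_dim), e.g. CNN feature-map cells
        # question: (batch, question_dim), e.g. final RNN/transformer state
        q = question.unsqueeze(1).expand(-1, regions.size(1), -1)
        logits = self.score(torch.cat([regions, q], dim=-1)).squeeze(-1)
        alpha = F.softmax(logits, dim=1)                 # weights over regions
        attended = (alpha.unsqueeze(-1) * regions).sum(dim=1)
        joint = torch.tanh(self.fuse(torch.cat([attended, question], dim=-1)))
        return joint, alpha  # joint embedding fed to the answer classifier
```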

However, these methods may fail to capture answer-related information when the question gives no clue about the object or region the answer concerns, and such questions are common in medical VQA. As a result, answer-related information can be missing from the image-question joint embedding, hurting performance on the medical VQA task. Our work incorporates adversarial learning to improve the question-answer embedding for better answer inference.
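As one way to make this concrete, the sketch below shows a common adversarial-alignment setup (the discriminator, losses, and dimensions are our assumptions, not necessarily the exact formulation used here): a discriminator tries to distinguish the image-question joint embedding from an embedding of the ground-truth answer, while the fusion network is trained to fool it, which pushes answer-related information into the joint embedding.

```python
import torch
import torch.nn as nn

# Illustrative discriminator; 512 matches the joint/answer embedding size above.
disc = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 1))
bce = nn.BCEWithLogitsLoss()

def adversarial_losses(joint_emb, answer_emb):
    # joint_emb:  (batch, 512) from the image-question fusion network
    # answer_emb: (batch, 512) from an answer encoder (same dim assumed)
    # Discriminator loss: tell real answer embeddings from joint embeddings.
    real = disc(answer_emb.detach())
    fake = disc(joint_emb.detach())
    d_loss = bce(real, torch.ones_like(real)) + bce(fake, torch.zeros_like(fake))
    # Fusion ("generator") loss: make the joint embedding indistinguishable
    # from an answer embedding, so it must carry answer-related information.
    fooled = disc(joint_emb)
    g_loss = bce(fooled, torch.ones_like(fooled))
    # In training, d_loss updates only disc and g_loss updates only the fusion
    # network (separate optimizers, or freeze disc during the g_loss step).
    return d_loss, g_loss
```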

The main objective of this project is therefore to build a visual question answering system for biomedical images. This involves designing and training a machine learning model that combines techniques from computer vision and natural language processing, so that the final model can give a meaningful answer to a question asked about a given biomedical image.

Dataset: PathVQA
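For illustration, here is a hedged sketch of iterating over PathVQA question-answer pairs. The file layout and field names below describe a hypothetical local mirror, not the dataset's official distribution format.

```python
import json
from PIL import Image

# Hypothetical local layout: a JSON file of QA records plus an image folder.
with open("pathvqa/qas.json") as f:
    qas = json.load(f)  # assumed: list of {"image", "question", "answer"} dicts

for qa in qas[:3]:
    img = Image.open(f"pathvqa/images/{qa['image']}").convert("RGB")
    print(qa["question"], "->", qa["answer"], img.size)
```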

Contributors:

Thanuja Maheepala (Email)

Kaveesha Silva (Email)

Kasun Tharaka (Email)

Principal investigator: Dr. Thanuja Ambegoda (Email)