Ongoing Research Projects
I am currently working on a project that integrates quantum circuits into multimodal learning frameworks to advance visual-text representation learning. The project combines computer vision, natural language processing (NLP), and quantum machine learning in a QuantumCLIP framework to address real-world tasks, such as medical image-text pairing in noisy environments.
The key idea, supported by the literature I have reviewed, is that the benefits of quantum computing can be leveraged for noisy medical images.
The primary goal is to enhance traditional multimodal frameworks, such as CLIP (Contrastive Language-Image Pretraining), by replacing the classical visual encoder with a quantum circuit-based encoder. The quantum-enhanced model is designed to leverage the unique capabilities of quantum systems, such as access to exponentially larger feature spaces, to improve performance on multimodal tasks in noisy settings.
The project is divided into three main phases:
Research and Related Work Exploration:
I am conducting an in-depth literature review focusing on quantum circuits, quantum neural networks (QNNs), and their application to multimodal learning. Key references include:
Jaderberg et al., "Quantum Self-Supervised Learning" (2022).
Wang et al., "A Hybrid Quantum–Classical Neural Network for Learning Transferable Visual Representation" (2023).
This step ensures a solid understanding of state-of-the-art quantum-enhanced learning frameworks.
Framework Implementation:
I am re-implementing the CLIP framework in PyTorch, using a Hugging Face model for the text encoder.
The visual encoder is replaced with a Quantum Circuit (QC) implemented using Pennylane, which is integrated seamlessly into PyTorch as a differentiable layer.
This hybrid classical-quantum model will bridge the gap between classical machine learning methods and quantum computing.
Evaluation on MedMNIST Dataset:
The framework will be evaluated on the MedMNIST dataset, a benchmark for medical image classification.
I will create image-text pairs by associating medical images with their corresponding label descriptions and evaluate the model’s performance using appropriate metrics, such as accuracy and F1-score.
The evaluation will also involve analyzing the quantum circuit's contributions and justifying the choice of performance metrics.
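The pairing and evaluation steps above can be sketched as follows. The label texts are hypothetical placeholders (the real descriptions depend on the MedMNIST subset), and the predictions are toy values used only to show the metric computation:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical label-to-text mapping (actual texts depend on the MedMNIST subset).
label_texts = {0: "This is adipose", 1: "This is background", 2: "This is debris"}

def make_pairs(images, labels):
    """Pair each image with the text description of its label."""
    return [(img, label_texts[int(y)]) for img, y in zip(images, labels)]

# Evaluation: compare predicted label indices against ground truth.
y_true = [0, 1, 2, 1, 0]
y_pred = [0, 1, 1, 1, 0]
acc = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
print(round(acc, 2), round(macro_f1, 2))
```

Macro-averaged F1 is a reasonable choice here because it weights every class equally, which matters when some medical classes are rare.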
Configurations and Results:
Using the MedMNIST dataset, corresponding text descriptions were created for each image label (e.g., "This is adipose") to serve as inputs to a text encoder, enabling a multimodal learning approach. The model aligns image features with their semantic textual representations, using quantum circuits for enhanced expressiveness. The main metric considered here is cosine similarity, which measures the alignment between image and text embeddings.
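Since cosine similarity is the headline metric across the configurations below, here is a minimal sketch of how it is computed between an image embedding and a text embedding (the vectors are toy values, not real encoder outputs):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors: dot product over norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy image/text embeddings standing in for encoder outputs.
img_emb = np.array([1.0, 0.0, 1.0])
txt_emb = np.array([1.0, 1.0, 0.0])
sim = cosine_similarity(img_emb, txt_emb)
print(round(sim, 3))  # 0.5
```

A value near 1 indicates well-aligned embeddings, near 0 indicates unrelated directions, which is why the configuration scores below (roughly 0.49 to 0.57) indicate partial alignment.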
Configuration 1 (Hybrid ResNet-Quantum Circuit):
ResNet-18 served as the image encoder, followed by a quantum circuit utilizing Hadamard gates, RX rotations, and RZ transformations. Variants of this configuration (1.1–1.5) yielded average results, with validation accuracies around 31%, average F1-scores around 21.87%, and average precisions up to 20.08%. The best-performing variant (1.5) incorporated refinements such as layer normalization, RY rotations, and CRX gates in a ring topology, achieving a cosine similarity of 0.5727. This highlighted improved alignment between image and text embeddings but did not significantly enhance classification performance.
GitHub Code Repo: Link
Configuration 2 (Quantum Circuit as Visual Encoder):
The ResNet encoder was fully replaced by an 8-qubit quantum circuit featuring AngleEmbedding and BasicEntanglerLayers for increased entanglement and expressiveness. Despite these advancements, the model struggled to learn meaningful features, achieving a validation accuracy of 16.20% and an F1-score of 12.75%, though the image and text embeddings remained moderately aligned, with a cosine similarity of around 0.485.
GitHub Code Repo: Link
Configuration 3 (QuantumCLIP Hybrid Model):
The model incorporated the vision encoder from the CLIP (ViT-based) model and applied a quantum circuit for additional feature extraction. This configuration demonstrated modest improvements, achieving a validation accuracy of 17%, an F1-score of 9.17%, and a precision of 6.28%, with improved text-image alignment as indicated by better cosine similarity scores compared to previous configurations.
Conclusion
This project explored integrating computer vision, NLP, and quantum machine learning through a QuantumCLIP framework. While initial results show modest performance, I am working to improve accuracy by optimizing quantum circuit designs. As a next step, I plan to apply this approach to the Quan dataset of noisy medical images to fully leverage quantum benefits in handling larger feature spaces and complex patterns.
I am currently working with research collaborators on a project focused on understanding and detecting ambiguity in paraphrased questions generated by Large Language Models (LLMs). This research addresses a growing challenge: while LLMs are capable of generating diverse paraphrases, they often introduce subtle ambiguities that alter the original meaning or create confusion, especially in sensitive domains like healthcare and education.
This project aims to systematically evaluate, detect, and mitigate ambiguity in LLM outputs, leveraging advanced evaluation techniques, dataset curation, and model development.
The primary goal is to quantitatively and qualitatively assess ambiguity in LLM-generated paraphrases and to develop reliable detection frameworks.
The research also seeks to build a benchmark dataset for ambiguity detection and propose guidelines for robust prompt engineering to minimize ambiguity in downstream applications.
That means that when a user gives a prompt to an LLM, the responses may vary depending on the phrasing.
The project is structured into several key phases:
A subset (~30,000 samples) of the MedRedQA dataset was curated, focusing on longer, complex, and potentially ambiguous questions.
Filtering strategies prioritized questions that could trigger varying interpretations when paraphrased.
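The filtering step above can be sketched with a simple heuristic. This is an illustrative proxy (the thresholds and the clause-counting rule are assumptions, not the project's actual filtering criteria):

```python
def is_potentially_ambiguous(question, min_words=25):
    """Heuristic filter: keep longer questions or ones with multiple clauses,
    which are more likely to admit varying interpretations when paraphrased."""
    has_multiple_clauses = question.count(",") + question.count(";") >= 2
    return len(question.split()) >= min_words or has_multiple_clauses

questions = [
    "What causes headaches?",
    "I have been taking medication for blood pressure, but recently, "
    "after changing my diet, I feel dizzy; should I be worried?",
]
subset = [q for q in questions if is_potentially_ambiguous(q)]
print(len(subset))  # 1
```

In practice such a filter would run over the full MedRedQA dump, retaining roughly the 30,000-sample subset described above.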
Several state-of-the-art LLMs were prompted to generate paraphrases of the selected questions.
Special attention was given to prompting strategies that would stress-test the models’ understanding and rephrasing capabilities.
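A paraphrasing prompt in this spirit might look like the following. The template wording is a hypothetical example, not the project's actual prompt, and no specific LLM API is assumed:

```python
def build_paraphrase_prompt(question, n_paraphrases=3):
    """Hypothetical prompt template for stress-testing an LLM's rephrasing.
    The strict 'preserve the exact meaning' constraint is what gets stress-tested."""
    return (
        f"Rewrite the following medical question in {n_paraphrases} different ways. "
        "Preserve the exact meaning; do not add, drop, or generalize any detail.\n\n"
        f"Question: {question}"
    )

prompt = build_paraphrase_prompt(
    "Can I take ibuprofen with my blood pressure medication?"
)
print(prompt)
```

The same question can then be sent with several prompt variants to each model under test, so that meaning drift can be attributed to the model rather than to the instructions.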
The paraphrased outputs were manually and semi-automatically analyzed for semantic shifts, vagueness, underspecification, and context loss.
Specific criteria were developed to label paraphrases as “ambiguous” or “non-ambiguous”.
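As an illustration of the semi-automatic side of this labeling, a deliberately crude proxy is sketched below. Token overlap is only a stand-in here; the project's actual criteria cover semantic shifts, vagueness, underspecification, and context loss, which simple overlap cannot capture:

```python
def jaccard_overlap(a, b):
    """Token-level Jaccard overlap as a crude proxy for semantic preservation."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def label_paraphrase(original, paraphrase, threshold=0.4):
    """Flag low-overlap paraphrases as candidates for manual ambiguity review."""
    if jaccard_overlap(original, paraphrase) < threshold:
        return "ambiguous"
    return "non-ambiguous"

orig = "can i take ibuprofen with my blood pressure medication"
para_close = "can i take ibuprofen together with my blood pressure medication"
para_far = "is mixing drugs safe"
print(label_paraphrase(orig, para_close), label_paraphrase(orig, para_far))
```

Low-overlap candidates would then go to manual review rather than being labeled automatically, matching the manual and semi-automatic workflow described above.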
We are currently in the paraphrasing phase of the project.
This project contributes to a deeper understanding of how ambiguity arises in LLM-generated paraphrases and lays the groundwork for automated ambiguity detection systems.
Future work includes expanding the benchmark dataset, refining detection models, and proposing standardized evaluation protocols to assess ambiguity in LLM outputs more reliably.
A GitHub repository documenting the code, dataset preparation scripts, and baseline models will be made available soon.