Publications:
Triangular Consistency as a Universal Constraint for Learning Optical Flow (ECCV 2026 Submission)
Authors: Yi Xiao, Carlos Rodriguez Coronel, Jing Zhan, Haniyeh Ehsani Oskouie, Alex Wong, Dong Lao
Abstract: We propose triangular consistency as a first-principles constraint for optical flow, which is agnostic to network architecture, supervision type, and dataset, and applies to both image-pair and multi-frame settings. This simple but powerful constraint composes two flows to induce a third and enforces consistency among the three. The composed flows may arise from (i) image pairs, yielding cycle consistency; (ii) multiple video frames, producing longer-range motion through temporal chaining; or (iii) image pairs combined with controlled synthetic transformations, which amounts to structured data augmentation. Triangular consistency introduces negligible computational overhead and requires no additional human annotations. Since it is derived directly from the geometry of optical flow, it imposes a universal constraint that relies on no model-specific assumptions and serves as a plug-and-play component for optical flow training. Experiments show consistent improvement across supervised, unsupervised, and transfer learning settings.
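The core operation is easy to state concretely. Below is a minimal NumPy sketch of the idea, not the paper's implementation: compose F_ab with F_bc by sampling the second flow at the locations displaced by the first, then penalize the endpoint error against a directly estimated F_ac. Function names are hypothetical, and nearest-neighbor sampling stands in for the bilinear warping a real pipeline would use.

```python
import numpy as np

def compose_flows(f_ab, f_bc):
    """Compose dense flows: f_ac(x) ~ f_ab(x) + f_bc(x + f_ab(x)).

    f_ab, f_bc: (H, W, 2) arrays of (dx, dy) displacements.
    Uses nearest-neighbor sampling of f_bc for simplicity.
    """
    H, W, _ = f_ab.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # target coordinates in frame b, rounded and clipped to the grid
    xb = np.clip(np.round(xs + f_ab[..., 0]).astype(int), 0, W - 1)
    yb = np.clip(np.round(ys + f_ab[..., 1]).astype(int), 0, H - 1)
    return f_ab + f_bc[yb, xb]

def triangular_consistency_loss(f_ab, f_bc, f_ac):
    """Mean endpoint error between the composed flow and the direct flow."""
    diff = compose_flows(f_ab, f_bc) - f_ac
    return float(np.mean(np.linalg.norm(diff, axis=-1)))
```

For constant flows of (1, 0) and (2, 0), the composition is (3, 0), so the loss against a direct (3, 0) flow vanishes.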
MMLoP: Multi-Modal Low-Rank Prompting for Efficient Vision-Language Adaptation (ECCV 2026 Submission)
Authors: Sajjad Ghiasvand, Haniyeh Ehsani Oskouie, Mahnoosh Alizadeh, Ramtin Pedarsani
Abstract: Prompt learning has become a dominant paradigm for adapting vision-language models (VLMs) such as CLIP to downstream tasks without modifying pretrained weights. While extending prompts to both vision and text encoders across multiple transformer layers significantly boosts performance, it dramatically increases the number of trainable parameters, with state-of-the-art methods requiring millions of parameters and abandoning the parameter efficiency that makes prompt tuning attractive. In this work, we propose MMLoP (MultiModal Low-Rank Prompting), a framework that achieves deep multi-modal prompting with only 11.5K trainable parameters, comparable to early text-only methods like CoOp. MMLoP parameterizes vision and text prompts at each transformer layer through a low-rank factorization, which serves as an implicit regularizer against overfitting on few-shot training data. To further close the accuracy gap with state-of-the-art methods, we introduce three complementary components: a self-regulating consistency loss that anchors prompted representations to frozen zero-shot CLIP features at both the feature and logit levels, a uniform drift correction that removes the global embedding shift induced by prompt tuning to preserve class-discriminative structure, and a shared up-projection that couples vision and text prompts through a common low-rank factor to enforce cross-modal alignment. Extensive experiments across three benchmarks and 11 diverse datasets demonstrate that MMLoP achieves a highly favorable accuracy-efficiency tradeoff, outperforming the majority of existing methods including those with orders of magnitude more parameters, while achieving a harmonic mean of 79.70% on base-to-novel generalization.
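The parameter arithmetic behind the low-rank factorization can be sketched as follows (an illustrative toy, not the MMLoP code; the function name, shapes, and initialization scale are assumptions). Each layer's prompt is the product of a small per-layer factor and a shared up-projection, so the trainable count grows with the rank rather than the embedding dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_low_rank_prompts(n_layers, n_tokens, dim, rank):
    """Parameterize per-layer prompts P_l = A_l @ B with a shared up-projection B.

    Trainable parameters: one (n_tokens, rank) factor per layer plus a single
    shared (rank, dim) up-projection, instead of n_layers * n_tokens * dim.
    """
    A = [rng.standard_normal((n_tokens, rank)) * 0.02 for _ in range(n_layers)]
    B = rng.standard_normal((rank, dim)) * 0.02   # shared low-rank factor
    prompts = [a @ B for a in A]                  # materialized (n_tokens, dim) prompts
    n_params = sum(a.size for a in A) + B.size
    return prompts, n_params
```

For 12 layers, 4 tokens, dimension 512, and rank 4, this stores 2,240 parameters instead of the 24,576 a full per-layer prompt would need, roughly a 10x reduction.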
SciPredict: Can LLMs Predict the Outcomes of Research Experiments in Natural Sciences? (FM4Science @ ICLR 2026; ICML 2026 Submission)
Authors: Udari Madhushani Sehwag, Elaine Lau, Haniyeh Ehsani Oskouie, Shayan Shabihi, Erich Liang, Andrea Toledo, Guillermo Mangialardi, Sergio Fonrouge, Ed-Yeremai Hernández Cardona, Paula Vergara, Utkarsh Tyagi, Chen Bo Calvin Zhang, Pavi Bhatter, Nicholas Johnson, Furong Huang, Ernesto Gabriel Hernández Montoya, and Bing Liu
Abstract: Accelerating scientific discovery requires identifying which experiments would yield the best outcomes before committing resources to costly physical validation. While existing benchmarks evaluate LLMs on scientific knowledge and reasoning, their ability to predict experimental outcomes—a task where AI could significantly exceed human capabilities—remains largely underexplored. We introduce SciPredict, a benchmark comprising 405 tasks derived from recent empirical studies in 33 specialized sub-fields of physics, biology, and chemistry. SciPredict addresses two critical questions: (a) can LLMs predict the outcomes of scientific experiments with sufficient accuracy? and (b) can such predictions be reliably used in the scientific research process? Evaluations reveal fundamental limitations on both fronts. Model accuracies are 14-26%, while human expert performance is ≈20%. Although some frontier models exceed human performance, model accuracy is still far below what would enable reliable experimental guidance. Even within this limited performance range, models fail to distinguish reliable predictions from unreliable ones, achieving only ≈20% accuracy regardless of their confidence or whether they judge outcomes as predictable without physical experimentation. Human experts, in contrast, demonstrate strong calibration: their accuracy increases from ≈5% to ≈80% as they deem outcomes more predictable without conducting the experiment. SciPredict establishes a rigorous framework demonstrating that superhuman performance in experimental science requires not just better predictions, but better awareness of prediction reliability. For reproducibility, all our data and code are provided at https://anonymous.4open.science/r/SciPredict-AI01.
Few-Shot Adversarial Low-Rank Fine-Tuning of Vision-Language Models (Trustworthy AI @ ICLR 2026; ECCV 2026 Submission)
Authors: Haniyeh Ehsani Oskouie*, Sajjad Ghiasvand*, Mahnoosh Alizadeh, Ramtin Pedarsani
Abstract: Vision-Language Models (VLMs) such as CLIP have shown remarkable performance in cross-modal tasks through large-scale contrastive pre-training. To adapt these large transformer-based models efficiently for downstream tasks, Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA have emerged as scalable alternatives to full fine-tuning, especially in few-shot scenarios. However, like traditional deep neural networks, VLMs are highly vulnerable to adversarial attacks, where imperceptible perturbations can significantly degrade model performance. Adversarial training remains the most effective strategy for improving model robustness in PEFT. In this work, we propose AdvCLIP-LoRA, the first algorithm designed to enhance the adversarial robustness of CLIP models fine-tuned with LoRA in few-shot settings. Our method formulates adversarial fine-tuning as a minimax optimization problem and provides theoretical guarantees for convergence under smoothness and nonconvex-strong-concavity assumptions. Empirical results across eight datasets using ViT-B/16 and ViT-B/32 models show that AdvCLIP-LoRA significantly improves robustness against common adversarial attacks (e.g., FGSM, PGD), without sacrificing much clean accuracy. These findings highlight AdvCLIP-LoRA as a practical and theoretically grounded approach for robust adaptation of VLMs in resource-constrained settings. The code is available at https://github.com/sajjad-ucsb/AdvCLIP-LoRA.
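The minimax structure can be illustrated on a toy linear model (a sketch of the general adversarial-training recipe, not the AdvCLIP-LoRA algorithm itself; the inner maximization here is a single FGSM step rather than the paper's ascent scheme, and all names are assumptions).

```python
import numpy as np

def fgsm_perturb(x, grad, eps):
    """One FGSM step: move each input coordinate eps along the sign of the
    loss gradient. This plays the role of the inner 'max' in the minimax."""
    return x + eps * np.sign(grad)

def adv_train_step(w, x, y, eps, lr):
    """Toy minimax step for a linear model with squared loss 0.5*(w.x - y)^2:
    perturb x adversarially, then descend on w using the perturbed input."""
    grad_x = (w @ x - y) * w            # d loss / d x
    x_adv = fgsm_perturb(x, grad_x, eps)
    grad_w = (w @ x_adv - y) * x_adv    # d loss / d w at the perturbed input
    return w - lr * grad_w
```

Training on the perturbed inputs rather than the clean ones is what trades a little clean accuracy for robustness against norm-bounded attacks.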
MI-to-Mid Distilled Compression (M2M-DC): A Hybrid Information-Guided Block Pruning with Progressive Inner Slicing Approach to Model Compression (WCCI 2026 submission)
Authors: Lionel Levine, Haniyeh Ehsani Oskouie, Sajjad Ghiasvand, and Majid Sarrafzadeh
Abstract: We introduce MI-to-Mid Distilled Compression (M2M-DC), a two-scale, shape-safe compression framework that interleaves information-guided block pruning with progressive inner slicing and staged knowledge distillation (KD). First, M2M-DC ranks residual (or inverted-residual) blocks by a label-aware mutual information (MI) signal and removes the least informative units (structured prune-after-training). It then alternates short KD phases with stage-coherent, residual-safe channel slicing: (i) stage “planes” (co-slicing conv2 out-channels with the downsample path and next-stage inputs), and (ii) an optional mid-channel trim (conv1 out / bn1 / conv2 in). This targets complementary redundancy—whole computational motifs and within-stage width—while preserving residual shape invariants. On CIFAR-100, M2M-DC yields a clean accuracy–compute frontier. For ResNet-18, we obtain 85.46% Top-1 with 3.09M parameters and 0.0139 GMacs (−72% params, −63% GMacs vs. teacher; mean final 85.29% over three seeds). For ResNet-34, we reach 85.02% Top-1 with 5.46M params and 0.0195 GMacs (−74%/−74% vs. teacher; mean final 84.62%). Extending to inverted residuals, MobileNetV2 achieves a mean final 68.54% Top-1 at ∼ 1.71M params (−27%) and ∼ 0.0186 conv-GMacs (−24%), improving over the teacher’s 66.03% by +2.5 points across three seeds. Because M2M-DC exposes only a thin, architecture-aware interface (blocks, stages, and downsample/skip wiring), it generalizes across residual CNNs and extends to inverted-residual families with minor legalization rules. The result is a compact, practical recipe for deployment-ready models that match or surpass teacher accuracy at a fraction of the compute.
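The label-aware MI ranking can be roughly illustrated as follows (a toy histogram estimator on scalar block summaries, not the paper's estimator; function names and binning choices are assumptions).

```python
import numpy as np

def label_mi(feature, labels, n_bins=8):
    """Histogram estimate of mutual information I(feature; label) in nats.
    feature: (n_samples,) scalar summary of a block's activations."""
    bins = np.quantile(feature, np.linspace(0, 1, n_bins + 1))
    f = np.clip(np.digitize(feature, bins[1:-1]), 0, n_bins - 1)
    mi, n = 0.0, len(labels)
    for fv in np.unique(f):
        for lv in np.unique(labels):
            p_joint = np.mean((f == fv) & (labels == lv))
            if p_joint > 0:
                p_f, p_l = np.mean(f == fv), np.mean(labels == lv)
                mi += p_joint * np.log(p_joint / (p_f * p_l))
    return mi

def rank_blocks(block_features, labels):
    """Order block indices from least to most label-informative; the least
    informative blocks are the candidates for structured pruning."""
    scores = [label_mi(bf, labels) for bf in block_features]
    return np.argsort(scores)
```

Blocks whose activations carry little information about the labels sort first and would be removed before the KD-interleaved slicing phases.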
Test-Time Defense Against Adversarial Attacks via Stochastic Resonance of Latent Ensembles (ICML 2026 Submission)
Authors: Dong Lao, Yuxiang Zhang, Haniyeh Ehsani Oskouie, Yangchao Wu, Alex Wong, Stefano Soatto
Abstract: We propose a test-time defense mechanism against adversarial attacks - imperceptible image perturbations that significantly alter a model’s predictions. Unlike existing methods that rely on feature filtering or smoothing, which can lead to information loss, we propose to "combat noise with noise," which leverages Stochastic Resonance to enhance robustness while minimizing information loss. Our approach introduces small translational perturbations to the input image, aligns the transformed feature embeddings, and aggregates them before mapping back to the original reference frame. This can be expressed in a closed-form formula, which can be deployed on any existing architecture without modification, re-training, or fine-tuning for specific attack types. The method is entirely training-free, architecture-agnostic, and attack-agnostic. In the experiments, beyond demonstrating its effectiveness with image classification, we present test-time defense results on dense prediction tasks such as stereo matching and optical flow, highlighting its versatility and practicality in real-world scenarios. In particular, we reduce the prediction error by as much as 71% on stereo matching and 28% on optical flow, demonstrating the effectiveness of our method.
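The translate-align-aggregate loop can be sketched in a few lines (an illustrative toy, with circular pixel shifts standing in for the paper's feature-space alignment; `predict` is any frozen model, and the function name is an assumption).

```python
import numpy as np

def sr_ensemble(predict, image, shifts):
    """Test-time stochastic-resonance ensemble: translate the input by small
    offsets, run the (frozen) model on each copy, undo the translation on the
    output, and average the aligned predictions."""
    outs = []
    for dy, dx in shifts:
        shifted = np.roll(image, (dy, dx), axis=(0, 1))
        pred = predict(shifted)
        # map the prediction back to the original reference frame
        outs.append(np.roll(pred, (-dy, -dx), axis=(0, 1)))
    return np.mean(outs, axis=0)
```

Because the shifts are undone before averaging, a shift-equivariant model is unaffected on clean inputs, while an adversarial perturbation is averaged across misaligned copies and loses its effect.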
Exploring Cross-model Neuronal Correlations in the Context of Predicting Model Performance and Generalizability (ICLR 2026 Workshop on Trustworthy AI submission)
Authors: Haniyeh Ehsani Oskouie, Sajjad Ghiasvand, Lionel Levine, and Majid Sarrafzadeh
Abstract: As Artificial Intelligence (AI) models are increasingly integrated into critical systems, the need for a robust framework to establish the trustworthiness of AI is increasingly paramount. While collaborative efforts have established conceptual foundations for such a framework, there remains a significant gap in developing concrete, technically robust methods for assessing AI model quality and performance. This paper introduces a novel approach for assessing a newly trained model's performance based on another known model by calculating the correlation between neural networks. The proposed method evaluates correlations by determining if, for each neuron in one network, there exists a neuron in the other network that produces similar output. This approach has implications for memory efficiency, allowing for the use of smaller networks when high correlation exists between networks of different sizes. On ImageNet-pretrained ResNets, DenseNets, and EfficientNets, partial layer comparisons recover intuitive architectural affinities, indicating that the procedure scales with reasonable approximations. These results support representational alignment as a lightweight compatibility check that complements standard accuracy and calibration, enabling early external validation of new models.
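The matching criterion can be sketched as follows (an illustrative NumPy toy, not the paper's exact procedure; function names are assumptions): for each neuron in one network, find the best-correlated neuron in the other over a shared set of probe inputs.

```python
import numpy as np

def max_neuron_correlation(acts_a, acts_b):
    """For each neuron (column) of network A, the highest absolute Pearson
    correlation with any neuron of network B, over the same probe inputs.

    acts_a: (n_samples, n_neurons_a), acts_b: (n_samples, n_neurons_b).
    """
    a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    corr = np.abs(a.T @ b) / len(a)      # (n_a, n_b) correlation matrix
    return corr.max(axis=1)              # best match per neuron of A

def cross_model_score(acts_a, acts_b):
    """Scalar similarity: average best-match correlation across A's neurons."""
    return float(max_neuron_correlation(acts_a, acts_b).mean())
```

A score near 1 indicates that every neuron of A has a close functional counterpart in B, which is the condition under which the smaller of the two networks could stand in for the larger.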
Exploring the Impact of Dataset Statistical Effect Size on Model Performance and Data Sample Size Sufficiency (ICAD 2026)
Authors: Arya Hatamian, Haniyeh Ehsani Oskouie*, Lionel Levine*, and Majid Sarrafzadeh
Abstract: Having a sufficient quantity of quality data is a critical enabler of training effective machine learning models. Being able to determine the adequacy of a dataset prior to training and evaluating a model’s performance would be an essential tool for anyone engaged in experimental design or data collection. However, despite the need for it, the ability to prospectively assess data sufficiency remains an elusive capability. We report here on two experiments undertaken to better ascertain whether basic descriptive statistical measures can be indicative of how effective a dataset will be at training a resulting model. Leveraging the effect size of our features, this work first explores whether a correlation exists between effect size and resulting model performance (theorizing that the magnitude of the distinction between classes could correlate with a classifier’s resulting success). We then explore whether the magnitude of the effect size impacts the model’s rate of convergence (theorizing again that a greater effect size may indicate that the model will converge more rapidly, with a smaller sample size needed). Our results indicate that this is not an effective heuristic for determining adequate sample size or projecting model performance, and therefore that additional work is still needed to better prospectively assess the adequacy of data.
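For a two-class feature, the effect-size measure at issue can be computed as Cohen's d (a standard formulation shown as an illustrative sketch, not the paper's code).

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d: standardized difference between two class means, using the
    pooled standard deviation. A descriptive effect-size measure of the kind
    examined as a candidate data-sufficiency heuristic."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return float((np.mean(x) - np.mean(y)) / np.sqrt(pooled_var))
```

For example, classes [2, 4, 6] and [1, 3, 5] have a mean gap of 1 and a pooled standard deviation of 2, giving d = 0.5.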
Leveraging Large Language Models and Topic Modeling for Toxicity Classification (ICNC 2025)
Authors: Haniyeh Ehsani Oskouie*, Christina Chance*, Claire Huang*, Margaret Capetz*, Elizabeth Eyeson*, Majid Sarrafzadeh
Abstract: Content moderation and toxicity classification represent critical tasks with significant social implications. However, studies have shown that major classification models exhibit tendencies to magnify or reduce biases and potentially overlook or disadvantage certain marginalized groups within their classification processes. Researchers suggest that the positionality of annotators influences the gold-standard labels from which models learn, propagating annotators' bias. To further investigate the impact of annotator positionality, we fine-tune BERTweet and HateBERT on the dataset while using topic modeling strategies for content moderation. The results indicate that fine-tuning the models on specific topics yields a notable improvement in F1 score compared with the predictions generated by other prominent classification models such as GPT-4, PerspectiveAPI, and RewireAPI. These findings further reveal that state-of-the-art large language models exhibit significant limitations in accurately detecting and interpreting text toxicity compared with earlier methodologies. Code is available at https://github.com/aheldis/Toxicity-Classification.git.
Attack on Scene Flow using Point Clouds (IEEE MLSP 2024)
Authors: Haniyeh Ehsani Oskouie, Mohammad-Shahram Moin, and Shohreh Kasaei
Abstract: Deep neural networks have made significant advancements in accurately estimating scene flow using point clouds, which is vital for many applications like video analysis, action recognition, and navigation. The robustness of these techniques, however, remains a concern, particularly in the face of adversarial attacks that have been proven to deceive state-of-the-art deep neural networks in many domains. Surprisingly, the robustness of scene flow networks against such attacks has not been thoroughly investigated. To address this gap, we introduce adversarial white-box attacks specifically tailored to scene flow networks. Experimental results show that the generated adversarial examples cause up to 33.7 relative degradation in average end-point error on the KITTI and FlyingThings3D datasets. The study also reveals the significant impact that attacks targeting point clouds in only one dimension or color channel have on average end-point error. Analyzing the success and failure of these attacks on the scene flow networks and their 2D optical flow network variants shows a higher vulnerability for the optical flow networks. Code is available at https://github.com/aheldis/Attack-on-Scene-Flow-using-Point-Clouds.git.
Interpretation of Neural Networks is Susceptible to Universal Adversarial Perturbations (ICASSP 2023)
Authors: Haniyeh Ehsani Oskouie, and Farzan Farnia
Abstract: Interpreting neural network classifiers using gradient-based saliency maps has been extensively studied in the deep learning literature. While the existing algorithms manage to achieve satisfactory performance in application to standard image recognition datasets, recent works demonstrate the vulnerability of widely-used gradient-based interpretation schemes to norm-bounded perturbations adversarially designed for every individual input sample. However, such adversarial perturbations are commonly designed using the knowledge of an input sample, and hence perform sub-optimally in application to an unknown or constantly changing data point. In this paper, we show the existence of a Universal Perturbation for Interpretation (UPI) for standard image datasets, which can alter a gradient-based feature map of neural networks over a significant fraction of test samples. To design such a UPI, we propose a gradient-based optimization method as well as a principal component analysis (PCA)-based approach to compute a UPI which can effectively alter a neural network's gradient-based interpretation on different samples. We support the proposed UPI approaches by presenting several numerical results of their successful applications to standard image datasets.
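The PCA-based construction can be sketched in a few lines (an illustrative toy, not the paper's algorithm; the L2 budget and function name are assumptions): stack per-sample saliency gradients, take the top principal direction, and scale it to the norm bound.

```python
import numpy as np

def pca_upi(saliency_grads, eps):
    """PCA-style universal perturbation for interpretation: the top principal
    direction of per-sample saliency gradients, scaled to an eps-bounded
    perturbation that is added to every input."""
    G = saliency_grads - saliency_grads.mean(axis=0)   # (n_samples, n_pixels)
    _, _, vt = np.linalg.svd(G, full_matrices=False)
    v = vt[0]                                          # top principal direction
    return eps * v / (np.linalg.norm(v) + 1e-12)       # ||upi||_2 = eps
```

Because the direction is shared across samples, the same perturbation can shift the gradient-based feature maps of many inputs at once, which is what makes it "universal" rather than per-sample.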
Collaboration on a paper: Back to the Future: Toward a Hybrid Architecture for Ad Hoc Teamwork (AAAI 2023)
Authors: Hasra Dodampegama, Mohan Sridharan
Abstract: State of the art methods for ad hoc teamwork, i.e., for collaboration without prior coordination, often use a long history of prior observations to model the behavior of other agents (or agent types) and to determine the ad hoc agent's behavior. In many practical domains, it is difficult to obtain large training datasets, and necessary to quickly revise the existing models to account for changes in team composition or domain attributes. Our architecture builds on the principles of step-wise refinement and ecological rationality to enable an ad hoc agent to perform non-monotonic logical reasoning with prior commonsense domain knowledge and models learned rapidly from limited examples to predict the behavior of other agents. In the simulated multiagent collaboration domain Fort Attack, we experimentally demonstrate that our architecture enables an ad hoc agent to adapt to changes in the behavior of other agents, and provides enhanced transparency and better performance than a state of the art data-driven baseline.
I helped explore different heuristic methods, implemented in Python.
Collaboration on a paper: The Psychological Effects of the Home Environment during Self-Quarantine: a Web-based Cross-Sectional Survey in Iran (IJAUP 2023)
Authors: Jamal-E-Din Mahdi Nejad, Hamidreza Azemati, Seyede Fereshteh Ehsani Oskouei, and Zinat Aminifar
Abstract: During the COVID-19 outbreak in Iran, self-quarantine was a measure to slow the spread of the infection. We conducted this cross-sectional study to explore the psychological effects of the home environment while people had to stay at home for a long time. 536 individuals took part in the survey. Data was collected via an online questionnaire with three sections: (1) demographic characteristics and general information; (2) home environment features; and (3) negative psychological experiences (NPE), considered as (a) feelings of sadness and depression; (b) feelings of stress and anxiety; and (c) experiencing domestic violence during quarantine. For data analysis, we first presented descriptive information about the participants; then we used a logistic regression model, a standard machine-learning classification algorithm, to investigate the association between home environment features and NPE during self-quarantine. The results indicate that the home environment affects NPE differently among men and women. Generally, individuals who were more satisfied with their house's performance during quarantine, who considered the light quality of their house appropriate, and who had fewer noise disturbance issues had a better mood during this period. Conversely, the lack of opportunity for indoor exercise and the feeling of being in a crowded house increased the level of NPE.
I helped with implementing the methods in Python (using Scikit-learn and Pandas).
Experience:
Gen AI Contributor at Scale AI + Outlier
May 2025 - March 2026
Description: Dataset generation, data quality improvement, and performance evaluation of LLMs, including collaboration on phase 4 of AIRA on the Outlier platform and on AI Scientist Research.
Machine Learning Expert (AI Trainer) at Handshake AI + OpenAI
July 2025 - September 2025
Description: Developing and answering machine learning prompts, and evaluating responses from LLMs.
Research Intern at Bright Flourishing Health
June 2025 - August 2025
Description: Performing pose estimation through video processing and frequency-domain denoising, with the aim of exercise posture correction.
Research Assistant and Instructor at the University of California, Los Angeles
September 2023 - Present
Supervisor: Prof. Majid Sarrafzadeh
Junior Research Assistant at the Chinese University of Hong Kong & Imperial College London
July 2023 - September 2023
Supervisor: Prof. Farzan Farnia and Prof. Seyed Mohsen Moosavi-Dezfooli
Description: In this research, we aimed to discover a universal adversarial perturbation for the interpretation (UPI) of neural networks while ensuring that the perturbations do not alter the classification decisions of the models.
Undergraduate Research Assistant at the University of Birmingham
March 2022 - April 2023
Supervisor: Prof. Mohan Sridharan
Description: Our goal in this research was to optimize an ad hoc teamwork (AHT) problem in the FortAttack domain. An important observation in the AHT domain is that deep neural networks have not demonstrated a significant advantage over simpler machine learning approaches. For this reason, we employed multinomial logistic regression with the STEW loss function, which allowed us to effectively model and improve the performance of the AHT system in the FortAttack domain.
Intern at Iran Telecommunication Research Center (ITRC)
Computer vision researcher at AI labs, October 2022 - February 2023
Supervisor: Prof. Mohammad-Shahram Moin
Description: Worked on Liveness Detection, Biometrics, and Face Detection using Optical Flow.
Bachelor Thesis
October 2021 - February 2023
Supervisor: Prof. Shohreh Kasaei
Description: In this project, I studied the effects of attention layers and deformable convolutions on optical flow estimation. I explored different loss functions and investigated the robustness of scene flow estimation networks against adversarial attacks.
Summer Intern at the Chinese University of Hong Kong
July 2022 - October 2022
Supervisor: Prof. Farzan Farnia
Description: In this work, we focused on finding a universal adversarial perturbation for the interpretation (UPI) of neural networks. To design such a UPI, we propose a gradient-based optimization method as well as a principal component analysis (PCA)-based approach to compute a UPI which can effectively alter a neural network's gradient-based interpretation on different samples.
Undergraduate Research Assistant at Sharif University of Technology
March 2021 - December 2021
Supervisor: Hamid R. Rabiee
Description: In this project, we focused on cancer detection by employing representation learning and semantic segmentation techniques on CennaLab's dataset. We explored different models, including ResNet-50 for supervised learning and SwAV for unsupervised learning, to improve the accuracy of the detection system. For this purpose, we leveraged the strengths of SwAV by integrating it as the encoder component of the U-Net architecture, aiming to enhance the efficiency and performance of the segmentation task. By combining representation learning and semantic segmentation approaches, we aimed to develop an effective and efficient cancer detection system that can contribute to advancements in medical imaging and diagnosis.