Papers

Award Winners

Accepted Papers (Available on OpenReview)

Don't Retrieve, Generate: Prompting LLMs for Synthetic Training Data in Dense Retrieval Aarush Sinha (Poster)

Query Timing Produces Opposite Positional Biases Between LLMs and Humans Jasin Cekinmez, Addison J. Wu, Thomas L. Griffiths (Poster)

Spatial Reasoning is Not a Free Lunch: A Controlled Study on LLaVA Nahid Alam, Leema Krishna Murali, Siddhant Bharadwaj, Patrick Liu, Timothy Chung, Drishti Sharma, Akshata, Kranthi Kiran GV, Wesley Tam, Bala Krishna S Vegesna (Poster)

The $\Psi$ Paradox in Extreme Superposition: When ETF Alignment Does Not Predict Language Model Generalization Hyunjun Kim

When Stability Fails: Hidden Failure Modes of LLMs in Data-Constrained Scientific Decision-Making Nazia Riasat (Poster)

Is Evaluation Awareness Just Format Sensitivity? Limitations of Probe-Based Evidence under Controlled Prompt Structure Viliana Devbunova (Poster)

The Limits of Long-Context Reasoning in Automated Bug Fixing Ravi Shanker Raju, Mengmeng Ji, Shubhangi Upasani, Bo Li, Urmish Thakker (Poster)

Evaluating Ill-Defined Tasks in Large Language Models Yi Zhou, Basel Shbita (Poster)

Probing and Steering Chain-of-Thought Unfaithfulness in Language Models Giovanni Maria Occhipinti, Alessandro Abate, Nandi Schoots (Poster)

Style over Substance: LLM-as-a-Judge Fails to Evaluate Multi-Party Social Dialogue Kunal Samanta, Faisal Tareque Shohan, Amine Trabelsi, Richard Khoury

Retrieval or Representation? Reassessing Benchmark Gaps in Multilingual and Visually Rich RAG Martin Asenov, Kenza Benkirane, Daniel Goldwater, Aneiss Ghodsi (Poster)

The Continuous Space Gap: Why VLMs Fail in Continuous Geometric Reasoning Yikun Zong, Cheston Tan (Poster)

Not All Time Is Gregorian: Evaluating LLMs on Cultural Calendar Systems Deepon Halder, Adish Pandya, Raj Dabre

Lost in Translation: Why SOTA LLMs Struggle with French NLU Frontiers David Beauchemin, Yan Tremblay, Mohamed Amine Youssef, Richard Khoury

Beyond Continuity: Challenges of Context Switching in Multi-Turn Dialogue with LLMs Aditya Sinha, Harald Steck, Vito Claudio Ostuni, Matteo Rinaldi (Poster)

EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation Shih-Yang Liu, Maksim Khadkevich, Nai Chit FUNG, Charbel Sakr, Chao-Han Huck Yang, Chien-Yi Wang, Saurav Muralidharan, Hongxu Yin, Kwang-Ting Cheng, Jan Kautz, Yu-Chiang Frank Wang, Pavlo Molchanov, Min-Hung Chen (Poster)

Knowing Is Not Seeing. Limits of Physical Problem Solving in VLMs Karim Elmaaroufi, Kevin Chon, Justin Svegliato, Lakshya A Agrawal, Matei Zaharia, Sanjit A. Seshia

Improving Proxy Transfer via Intermediate Proxy Tuning Kevin Kuo, Ayush Sehgal, Robert Pare, Virginia Smith

When can you TRUST Large Language Models? Radu Paradovschi, Darvin Yi, Andrew Rabinovich, Zhao Chen (Poster)

One Step Forward, Two Steps Back: Regression Errors and Cost Inefficiencies in LLM Iterative Refinement for Code Generation Lucas Teixeira Borges, Ricardo Rios (Poster)

Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance? Apratim Bhattacharyya, Bicheng Xu, Sanjay Haresh, Reza Pourreza, Litian Liu, Sunny Panchal, Pulkit Madan, Leonid Sigal, Roland Memisevic (Poster)

Non-Monotonicity and Catastrophic Risk of Prompt Interventions in Adversarial LLM Control Koki Inoue, Naoya Takashima, Hayato Fujihara, Shuya Higuchi, Kota Shimomura, Ryuta Shimogauchi, Takayoshi Yamashita (Poster)

The Missing Red Line: How Commercial Pressure Erodes AI Safety Boundaries Nora Petrova, John Burden (Poster)

Beyond Suffixes: Token Position in GCG Adversarial Attacks on Large Language Models Hicham Eddoubi, Umar Faruk Abdullahi, Fadi Hassan (Poster)

EsoLang-Bench: Evaluating Genuine Reasoning in Large Language Models via Esoteric Programming Languages Aman Sharma, Paras Chopra

Random Is Hard to Beat: Active Selection in Online DPO with Modern LLMs Giyeong Oh, Junghyun Lee, Jaehyun Park, Youngjae Yu, Wonho Bae, Junhyug Noh (Poster)

A Pilot Study on Doubt Robustness of LLMs in Clinical Prediction Explanation Juhwan Choi, Sangchul Hahn, Eunho Yang

I Can't Believe It's Not Robust: Catastrophic Collapse of Safety Classifiers under Embedding Drift Subramanyam Sahoo, Vinija Jain, Divya Chaudhary, Aman Chadha

Limits of Difficulty Scaling: Hard Samples Yield Diminishing Returns in GRPO-Tuned SLMs Suraj Yadav, Siddharth Yadav, Parth Goyal (Poster)

AI-rithmetic Alex Bie, Travis Dick, Alex Kulesza, Prabhakar Raghavan, Vinod Raman, Sergei Vassilvitskii (Poster)

Challenges in Inference-Time Scaling with Uncertainty-Aware Tree Search Jacopo Minniti, Neil Band, Tim G. J. Rudner (Poster)

Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting Ishaan Watts, Catherine Li, Sachin Goyal, Jacob Mitchell Springer, Aditi Raghunathan (Poster)

The Cost of Consistency: Why Cross-Plane Contrastive Learning Fails to Bridge the Gap Between MedSAM-3 and nnU-Net Madhu Shree Aravindan, Aaditi V Bajpai, Ramamoorthy Sriramulu (Poster)

Why Large Language Models Fail for Hausa Educational Content: Cascading Errors from Translation to Speech to Comprehension Honour-Jesus Bezaleel, Pearse Jim, Moses Daudu (Poster)

Barriers to Pareto Steerability in Preference-Conditioned LLM Alignment Fatemeh Nourzad, Daouda Sow, Yingbin Liang, Ming Shi, Ming Zhang, Yunxuan Li, Eylem Ekici, Ness Shroff (Poster)

Attention Sinks as Internal Signals for Hallucination Detection in Large Language Models Jakub Binkowski, Kamil Adamczewski, Tomasz Jan Kajdanowicz (Poster)

Learning State-Tracking from Code: REPL Traces and Probabilistic Automata Julien Siems, Riccardo Grazzi, Kirill Kalinin, Hitesh Ballani, Babak Rahmani

The Selective Safety Trap: How LLMs Scaling and Alignment Fail to Generalize Across Minority Demographics Iago Alves Brito, Walcy Rios, Julia Soares Dollis, Diogo Fernandes Costa Silva, Arlindo Rodrigues Galvão Filho (Poster)

When Rubrics Backfire: Systematic Preference Drift in LLM Judges Ruomeng Ding, Yifei Pang, He Sun, Yizhong Wang, Steven Wu, Zhun Deng (Poster)

Synthetic Error Injection Fails to Elicit Self-Correction In Language Models David Xing Wu, Shreyas Kapur, Anant Sahai, Stuart Russell

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Maggie Ziyu Huan, Yuetai Li, Tianyu Zheng, Xiaoyu Xu, Seungone Kim, Minxin Du, Radha Poovendran, Graham Neubig, Xiang Yue

Voice Evaluation of Reasoning Ability: Diagnosing the Modality-Induced Performance Gap Yueqian Lin, Zhengmian Hu, Qinsi Wang, Yudong Liu, Hengfan Zhang, Jayakumar Subramanian, Nikos Vlassis, Hai Helen Li, Yiran Chen (Poster)

Bigger Is Not Better Under Differential Privacy: Optimization Failure at Eleven-Billion Scale in Vision–Language Model Fine-Tuning Tzuen Su, Li-Hong Guo, Yangmi Su, Cheng-Yen Li (Poster)

Evaluation-Conditioned Trojan Attack Zihan Zhu, Hanlin Zhang, Giovanni D'Antonio, Anton Tsitsulin, Sham M. Kakade, Vahab Mirrokni

FLUFFINJECTOR: Diagnosing Logical Consistency Failures in Chain-of-Thought Reward Models Varshith Vijjapu, Krishiv Ray, Archana Vaidheeswaran

I Can’t Believe It’s Not Safer: Preference–Safety Disassociation in Clinical LLM Evaluation Fay Elhassan, David Sasu, Lars Henning Klein, Alexandra V. Kulinkina, Mary-Anne Hartley (Poster)

I Can't Believe It Can't Count: Vision-Language Models Fail at Basic Enumeration Beyond the Subitizing Range Amirhossein Afsharrad, Seyed Shahabeddin Mousavi, Sanjay Lall

The Anatomy of Uncertainty in LLMs Aditya Taparia, Ransalu Senanayake, Kowshik Thopalli, Vivek Narayanaswamy (Poster)

More Than a Quick Glance: Overcoming the Greedy Bias in KV-Cache Compression Aryan Sood, Tanvi Sharma, Vansh Agrawal

Language-Dependent Miscalibration in Multilingual LLM Evaluators Ej Zhou, Lucas Resck, Zheng Hui, Anna Korhonen (Poster)

Fairness Failure Modes of Multimodal LLMs Canyu Chen, Anglin Cai, Joan Nwatu, Yale Li, Han Liu, Jessica Hullman, Rada Mihalcea, Kathleen McKeown, Manling Li

I Can't Believe LLMs Still Can't Write Drama: Multi-Dimensional Failures in Script Continuation Shijian Ma, Yunqi Huang, Lin Yan (Poster)

The Low-Frequency Trap: Why Scaling Doesn't Solve Simple Temporal Counting Sarvesh Baskar, Muhammad R. Islam, Zikui Cai, Ankit Nakhawa, Anirudh Satheesh, Tom Goldstein, Furong Huang (Poster)

QuanBench Plus: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation Ali Slim, Haydar Hamieh, Jawad Kotaich, Yehya Ghosn, Mahdi Chehimi, Hasan Abed Al Kader Hammoud, Ammar Mohanna, Bernard Ghanem (Poster)

Can LLMs Perceive Time? An Empirical Investigation Aniketh Garikaparthi (Poster)

When Lie Detectors Learn Model Identity: Confounds in Black-Box Sandbagging Detection Lin Yulong, Pablo Bernabeu-Perez, Benjamin Arnav, Lennie Wells, Mary Phuong
