Accepted Papers

Because authors may wish to resubmit their work to other conferences, this workshop does not require camera-ready papers to be submitted to Underline.io. If you would like to make your paper available to conference participants or the general public, please consider a preprint service such as arXiv. If you contact the organizers, we can also add a link to your arXiv submission in the papers section of this page.

Poster allocation

[AAAI26_W8_1] ViG-LLM: Enhancing Visual Grounding Capabilities in Closed-Box LLMs for Document Information Extraction without OCR Dependencies

Sudhanshu Bhoi

[AAAI26_W8_2] CausalFusion: Integrating LLMs and Graph Falsification for Causal Discovery

Alessandro Casadei, Sreyoshi Bhaduri, Pavan Nithin Mullapudi, Rohit Malshe

[AAAI26_W8_3] Small Language Models for Efficient Agentic Tool Calling: Outperforming Large Models with Targeted Fine-tuning

Owais Kazi, Shreyas Subramanian, Polaris Singh Jhandi, Neel Sendas

[AAAI26_W8_4] CAD Inspection Assistant: Tool-Augmented Agentic CAD Inspection Solution

Fangjun Wang, Xidan Zhang, Jianing Wei, Nan Zhang, Yunqing Liu, Zhiming Tan

[AAAI26_W8_5] ECHO: EvidenCe-prior Hallucination Observation

Ziqiang Shi, Liu Liu, Zihao Guo, Fei Li, Rujie Liu, Shanshan Yu, Satoshi Munakata, Koichi Shirahata

[AAAI26_W8_6] BAID: A Benchmark for Bias Assessment of AI Detectors

Priyam Basu, Yunfeng Zhang, Vipul Raheja

[AAAI26_W8_7] MeetBench-XL: Calibrated Multi-Dimensional Evaluation and Learned Dual-Policy Agents for Real-Time Meetings

Yuelin Hu, Jun Xu, Bingcong Lu, Zhengxue Cheng, Hongwei Hu, Ronghua Wu, Li Song

[AAAI26_W8_8] Overcoming the ‘Impracticality’ of RAG: Proposing a Real-World Benchmark and Multi-Dimensional Diagnostic Framework

Kenichirou Narita, Siqi Peng, Taku Fukui, Moyuru Yamada, Satoshi Munakata, Satoru Takahashi

[AAAI26_W8_9] Reason-Plan-ReAct: A Reasoner-Planner Supervising a ReAct Executor for Complex Enterprise Tasks

Gianni Molinari, Fabio Ciravegna

[AAAI26_W8_10] Agentic Observability: Automated Alert Triage for Adobe E-Commerce

Aprameya Bharadwaj, Kyle Tu

[AAAI26_W8_11] Beyond Curated Benchmarking: Automated Evaluation of LLM Agents for Safe and Reliable IT Infrastructure Management

Gayathri Saranathan, Aalap Tripathy, Tarun Kumar, Scott Hinchley, Martin Foltin, Christopher L Holmes, David Brookshire, Donald M Bahls, Cong Xu, Robert W. Wisniewski, Larry Kaplan, Suparna Bhattacharya

[AAAI26_W8_12] Multi-Agent AI Trainer: Adaptive Skill Evaluation via Persona-Driven Examiners and Multi-Criteria Judging

Daniil Sukhorukov, Kirill Dzhunkovsky, Aleksandr Tsymbalov, Roman Kharkovskoy, Mikhail Mozikov, Nasonov Ivan, Nikita Glazkov, Vlad Kuznetsov, Maxim Dubovitsky, Ilya Makarov

[AAAI26_W8_13] Agentic Code Generation for Heuristic Rules in Equipment Monitoring

Fabio Lorenzi, Abigail Langbridge, Fearghal O'Donncha, James T Rayfield, Bradley Eck, Sal Rosato

[AAAI26_W8_14] POLARIS: Typed Planning and Governed Execution for Agentic AI in Back-Office Automation

Zahra Moslemi, Keerthi Koneru, Sheethal Kumar, Yen-Ting Lee, Ramesh Radhakrishnan

[AAAI26_W8_15] Multi-Agent Coordination for Dynamic Supply Chain Resilience: A Benchmark and Evaluation

Bayron Jossue Serrano Mena

[AAAI26_W8_16] Beyond Accuracy: A Multi-Dimensional Framework for Evaluating Enterprise Agentic AI Systems

Sushant Mehta

[AAAI26_W8_17] Polaris: Multi Agentic System for Conversational Enterprise Analytics

Varuni H K, Soham Sarkar, Jay Kumar, Goutham Krishnan, Tanvi Johari, Santosh Hegde, Avinash Bharadwaj

[AAAI26_W8_18] Verification-Guided Context Optimization for Tool Calling via Hierarchical LLMs-as-editors

Henger Li, Shuangjie You, Flavio Di Palo, Yiyue Qian, Ayush Jain

[AAAI26_W8_19] The Forecast Critic: Leveraging Large Language Models for Poor Forecast Identification

Luke Bhan, Hanyu Zhang, Andrew Gordon Wilson, Michael W. Mahoney, Chuck Arvin

[AAAI26_W8_20] MFCL Vision: Benchmarking Tool Use in Multimodal Large Language Models for Visual Reasoning Tasks

Huanzhi Mao, Jad Bendarkawi, Evan Maxwell Turner, Ritesh Sunil Chavan

[AAAI26_W8_21] VLM-guided Object-level Segmentation from Dynamic Scene

Feiran Yang

[AAAI26_W8_22] Auditing Generative AI Benchmarks with a Multi-Agent Compliance System

Ananya Joshi, Michael Rudow

[AAAI26_W8_23] Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics

Akshara Prabhakar, Roshan Ram, Zixiang Chen, Silvio Savarese, Frank Wang, Caiming Xiong, Huan Wang, Weiran Yao

[AAAI26_W8_24] Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering

Nathan Mao, Varun Kaushik, Shreya Shivkumar, Parham Sharafoleslami, Kevin Zhu, Sunishchal Dev

[AAAI26_W8_25] Realistic Synthetic Household Data Generation at Scale

Siddharth Singh, Ifrah Idrees, Abraham Dauhajre

[AAAI26_W8_26] Scalable Strategies for Agentic-AI to Handle Long-Tail Enterprise Use Cases

Badri Nath

[AAAI26_W8_27] Grounding Enterprise Data for Agentic AI: A Semantic Approach to Vertical Data Lineage using Small Language Models

Shivansh Tuteja, Jatin Bedi

[AAAI26_W8_28] FusionMind: A Differentiable and Efficient Multi-Modal Retrieval-Augmented Generation Framework

Liu Junshen, Ren Beiming, Zheng Rui, Liu Yin

[AAAI26_W8_29] LENS: Learning Architecture Navigator for LLM Agentic Systems

Guancheng Wan, Jiayi Yang, Mengting Li

[AAAI26_W8_30] RAAS: Relative Architecture Adaptive Search for Agentic Supernet Optimization

Jiayi Yang, Guancheng Wan, Mengting Li

[AAAI26_W8_31] Diagnose, Localize, Align: A Full-Stack Framework for Reliable LLM Multi-Agent Systems under Instruction Conflicts

Guancheng Wan, Leixin Sun, Mengting Li

[AAAI26_W8_32] Benchmarking Agents in Insurance Underwriting Environments

Amanda Dsouza, Ramya Ramakrishnan, Charles Andrew Dickens, Bhavishya Pohani, Christopher M Glaze

[AAAI26_W8_33] Enabling Reliable Enterprise Agentic AI: A Case Study on Domain-Specific Embeddings and Benchmarking in Offshore Energy

Sampath Rajapaksha, Nirmalie Wiratunga, Ikechukwu Nkisi-Orji, Tim Clarke, Fraser Kerr
