This workshop does not require camera-ready papers to be submitted to Underline.io, since authors may wish to resubmit their work to other conferences. If you would like to make your paper available to conference participants or the general public, please consider posting it to a preprint server such as arXiv. If you contact the organizers, we can also add a link to your arXiv submission in the papers section of this page.
[AAAI26_W8_1] ViG-LLM: Enhancing Visual Grounding Capabilities in Closed-Box LLMs for Document Information Extraction without OCR Dependencies
Sudhanshu Bhoi
[AAAI26_W8_2] CausalFusion: Integrating LLMs and Graph Falsification for Causal Discovery
Alessandro Casadei, Sreyoshi Bhaduri, Pavan Nithin Mullapudi, Rohit Malshe
[AAAI26_W8_3] Small Language Models for Efficient Agentic Tool Calling: Outperforming Large Models with Targeted Fine-tuning
Owais Kazi, Shreyas Subramanian, Polaris Singh Jhandi, Neel Sendas
[AAAI26_W8_4] CAD Inspection Assistant: Tool-Augmented Agentic CAD Inspection Solution
Fangjun Wang, Xidan Zhang, Jianing Wei, Nan Zhang, Yunqing Liu, Zhiming Tan
[AAAI26_W8_5] ECHO: EvidenCe-prior Hallucination Observation
Ziqiang Shi, Liu Liu, Zihao Guo, Fei Li, Rujie Liu, Shanshan Yu, Satoshi Munakata, Koichi Shirahata
[AAAI26_W8_6] BAID: A Benchmark for Bias Assessment of AI Detectors
Priyam Basu, Yunfeng Zhang, Vipul Raheja
[AAAI26_W8_7] MeetBench-XL: Calibrated Multi-Dimensional Evaluation and Learned Dual-Policy Agents for Real-Time Meetings
Yuelin Hu, Jun Xu, Bingcong Lu, Zhengxue Cheng, Hongwei Hu, Ronghua Wu, Li Song
[AAAI26_W8_8] Overcoming the ‘Impracticality’ of RAG: Proposing a Real-World Benchmark and Multi-Dimensional Diagnostic Framework
Kenichirou Narita, Siqi Peng, Taku Fukui, Moyuru Yamada, Satoshi Munakata, Satoru Takahashi
[AAAI26_W8_9] Reason-Plan-ReAct: A Reasoner-Planner Supervising a ReAct Executor for Complex Enterprise Tasks
Gianni Molinari, Fabio Ciravegna
[AAAI26_W8_10] Agentic Observability: Automated Alert Triage for Adobe E-Commerce
Aprameya Bharadwaj, Kyle Tu
[AAAI26_W8_11] Beyond Curated Benchmarking: Automated Evaluation of LLM Agents for Safe and Reliable IT Infrastructure Management
Gayathri Saranathan, Aalap Tripathy, Tarun Kumar, Scott Hinchley, Martin Foltin, Christopher L Holmes, David Brookshire, Donald M Bahls, Cong Xu, Robert W. Wisniewski, Larry Kaplan, Suparna Bhattacharya
[AAAI26_W8_12] Multi-Agent AI Trainer: Adaptive Skill Evaluation via Persona-Driven Examiners and Multi-Criteria Judging
Daniil Sukhorukov, Kirill Dzhunkovsky, Aleksandr Tsymbalov, Roman Kharkovskoy, Mikhail Mozikov, Nasonov Ivan, Nikita Glazkov, Vlad Kuznetsov, Maxim Dubovitsky, Ilya Makarov
[AAAI26_W8_13] Agentic Code Generation for Heuristic Rules in Equipment Monitoring
Fabio Lorenzi, Abigail Langbridge, Fearghal O'Donncha, James T Rayfield, Bradley Eck, Sal Rosato
[AAAI26_W8_14] POLARIS: Typed Planning and Governed Execution for Agentic AI in Back-Office Automation
Zahra Moslemi, Keerthi Koneru, Sheethal Kumar, Yen-Ting Lee, Ramesh Radhakrishnan
[AAAI26_W8_15] Multi-Agent Coordination for Dynamic Supply Chain Resilience: A Benchmark and Evaluation
Bayron Jossue Serrano Mena
[AAAI26_W8_16] Beyond Accuracy: A Multi-Dimensional Framework for Evaluating Enterprise Agentic AI Systems
Sushant Mehta
[AAAI26_W8_17] Polaris: Multi Agentic System for Conversational Enterprise Analytics
Varuni H K, Soham Sarkar, Jay Kumar, Goutham Krishnan, Tanvi Johari, Santosh Hegde, Avinash Bharadwaj
[AAAI26_W8_18] Verification-Guided Context Optimization for Tool Calling via Hierarchical LLMs-as-editors
Henger Li, Shuangjie You, Flavio Di Palo, Yiyue Qian, Ayush Jain
[AAAI26_W8_19] The Forecast Critic: Leveraging Large Language Models for Poor Forecast Identification
Luke Bhan, Hanyu Zhang, Andrew Gordon Wilson, Michael W. Mahoney, Chuck Arvin
[AAAI26_W8_20] MFCL Vision: Benchmarking Tool Use in Multimodal Large Language Models for Visual Reasoning Tasks
Huanzhi Mao, Jad Bendarkawi, Evan Maxwell Turner, Ritesh Sunil Chavan
[AAAI26_W8_21] VLM-guided Object-level Segmentation from Dynamic Scene
Feiran Yang
[AAAI26_W8_22] Auditing Generative AI Benchmarks with a Multi-Agent Compliance System
Ananya Joshi, Michael Rudow
[AAAI26_W8_23] Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics
Akshara Prabhakar, Roshan Ram, Zixiang Chen, Silvio Savarese, Frank Wang, Caiming Xiong, Huan Wang, Weiran Yao
[AAAI26_W8_24] Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering
Nathan Mao, Varun Kaushik, Shreya Shivkumar, Parham Sharafoleslami, Kevin Zhu, Sunishchal Dev
[AAAI26_W8_25] Realistic Synthetic Household Data Generation at Scale
Siddharth Singh, Ifrah Idrees, Abraham Dauhajre
[AAAI26_W8_26] Scalable Strategies for Agentic-AI to Handle Long-Tail Enterprise Use Cases
Badri Nath
[AAAI26_W8_27] Grounding Enterprise Data for Agentic AI: A Semantic Approach to Vertical Data Lineage using Small Language Models
Shivansh Tuteja, Jatin Bedi
[AAAI26_W8_28] FusionMind: A Differentiable and Efficient Multi-Modal Retrieval-Augmented Generation Framework
Liu Junshen, Ren Beiming, Zheng Rui, Liu Yin
[AAAI26_W8_29] LENS: Learning Architecture Navigator for LLM Agentic Systems
Guancheng Wan, Jiayi Yang, Mengting Li
[AAAI26_W8_30] RAAS: Relative Architecture Adaptive Search for Agentic Supernet Optimization
Jiayi Yang, Guancheng Wan, Mengting Li
[AAAI26_W8_31] Diagnose, Localize, Align: A Full-Stack Framework for Reliable LLM Multi-Agent Systems under Instruction Conflicts
Guancheng Wan, Leixin Sun, Mengting Li
[AAAI26_W8_32] Benchmarking Agents in Insurance Underwriting Environments
Amanda Dsouza, Ramya Ramakrishnan, Charles Andrew Dickens, Bhavishya Pohani, Christopher M Glaze
[AAAI26_W8_33] Enabling Reliable Enterprise Agentic AI: A Case Study on Domain-Specific Embeddings and Benchmarking in Offshore Energy
Sampath Rajapaksha, Nirmalie Wiratunga, Ikechukwu Nkisi-Orji, Tim Clarke, Fraser Kerr