[AAAI26_W8_1] ViG-LLM: Enhancing Visual Grounding Capabilities in Closed-Box LLMs for Document Information Extraction without OCR Dependencies
Sudhanshu Bhoi
[AAAI26_W8_2] CausalFusion: Integrating LLMs and Graph Falsification for Causal Discovery
Alessandro Casadei, Sreyoshi Bhaduri, Pavan Nithin Mullapudi, Rohit Malshe
[AAAI26_W8_3] Small Language Models for Efficient Agentic Tool Calling: Outperforming Large Models with Targeted Fine-tuning
Owais Kazi, Shreyas Subramanian, Polaris Singh Jhandi, Neel Sendas
[AAAI26_W8_4] CAD Inspection Assistant: Tool-Augmented Agentic CAD Inspection Solution
Fangjun Wang, Xidan Zhang, Jianing Wei, Nan Zhang, Yunqing Liu, Zhiming Tan
[AAAI26_W8_5] ECHO: EvidenCe-prior Hallucination Observation
Ziqiang Shi, Liu Liu, Zihao Guo, Fei Li, Rujie Liu, Shanshan Yu, Satoshi Munakata, Koichi Shirahata
[AAAI26_W8_6] BAID: A Benchmark for Bias Assessment of AI Detectors
Priyam Basu
[AAAI26_W8_7] MeetBench-XL: Calibrated Multi-Dimensional Evaluation and Learned Dual-Policy Agents for Real-Time Meetings
Yuelin Hu, Jun Xu, Bingcong Lu, Zhengxue Cheng, Hongwei Hu, Ronghua Wu, Li Song
[AAAI26_W8_8] Overcoming the ‘Impracticality’ of RAG: Proposing a Real-World Benchmark and Multi-Dimensional Diagnostic Framework
Kenichirou Narita, Siqi Peng, Taku Fukui, Moyuru Yamada, Satoshi Munakata, Satoru Takahashi
[AAAI26_W8_9] Reason-Plan-ReAct: A Reasoner-Planner Supervising a ReAct Executor for Complex Enterprise Tasks
Gianni Molinari, Fabio Ciravegna
[AAAI26_W8_10] Agentic Observability: Automated Alert Triage for Adobe E-Commerce
Aprameya Bharadwaj, Kyle Tu
[AAAI26_W8_11] Beyond Curated Benchmarking: Automated Evaluation of LLM Agents for Safe and Reliable IT Infrastructure Management
Gayathri Saranathan, Aalap Tripathy, Tarun Kumar, Scott Hinchley, Martin Foltin, Christopher L Holmes, David Brookshire, Donald M Bahls, Cong Xu, Robert W. Wisniewski, Larry Kaplan, Suparna Bhattacharya
[AAAI26_W8_12] Multi-Agent AI Trainer: Adaptive Skill Evaluation via Persona-Driven Examiners and Multi-Criteria Judging
Daniil Sukhorukov, Kirill Dzhunkovsky, Aleksandr Tsymbalov, Roman Kharkovskoy, Mikhail Mozikov, Nasonov Ivan, Nikita Glazkov, Vlad Kuznetsov, Maxim Dubovitsky, Ilya Makarov
[AAAI26_W8_13] Agentic Code Generation for Heuristic Rules in Equipment Monitoring
Fabio Lorenzi, Abigail Langbridge, Fearghal O'Donncha, James T Rayfield, Bradley Eck, Sal Rosato
[AAAI26_W8_14] POLARIS: Typed Planning and Governed Execution for Agentic AI in Back-Office Automation
Zahra Moslemi, Keerthi Koneru, Sheethal Kumar, Yen-Ting Lee, Ramesh Radhakrishnan
[AAAI26_W8_15] Multi-Agent Coordination for Dynamic Supply Chain Resilience: A Benchmark and Evaluation
Bayron Jossue Serrano Mena
[AAAI26_W8_16] Beyond Accuracy: A Multi-Dimensional Framework for Evaluating Enterprise Agentic AI Systems
Sushant Mehta
[AAAI26_W8_17] Polaris: Multi Agentic System for Conversational Enterprise Analytics
Varuni H K, Soham Sarkar, Jay Kumar, Goutham Krishnan, Tanvi Johari, Santosh Hegde, Avinash Bharadwaj
[AAAI26_W8_18] Verification-Guided Context Optimization for Tool Calling via Hierarchical LLMs-as-editors
Henger Li, Shuangjie You, Flavio Di Palo, Yiyue Qian, Ayush Jain
[AAAI26_W8_19] The Forecast Critic: Leveraging Large Language Models for Poor Forecast Identification
Luke Bhan, Hanyu Zhang, Andrew Gordon Wilson, Michael W. Mahoney, Chuck Arvin
[AAAI26_W8_20] MFCL Vision: Benchmarking Tool Use in Multimodal Large Language Models for Visual Reasoning Tasks
Huanzhi Mao, Jad Bendarkawi, Evan Maxwell Turner, Ritesh Sunil Chavan
[AAAI26_W8_21] VLM-guided Object-level Segmentation from Dynamic Scene
Feiran Yang
[AAAI26_W8_22] Auditing Generative AI Benchmarks with a Multi-Agent Compliance System
Ananya Joshi, Michael Rudow
[AAAI26_W8_23] Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics
Akshara Prabhakar, Roshan Ram, Zixiang Chen, Silvio Savarese, Frank Wang, Caiming Xiong, Huan Wang, Weiran Yao
[AAAI26_W8_24] Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering
Nathan Mao, Varun Kaushik, Shreya Shivkumar, Parham Sharafoleslami, Kevin Zhu, Sunishchal Dev
[AAAI26_W8_25] Realistic Synthetic Household Data Generation at Scale
Siddharth Singh, Ifrah Idrees, Abraham Dauhajre
[AAAI26_W8_26] Scalable Strategies for Agentic-AI to Handle Long-Tail Enterprise Use Cases
Badri Nath
[AAAI26_W8_27] Grounding Enterprise Data for Agentic AI: A Semantic Approach to Vertical Data Lineage using Small Language Models
Shivansh Tuteja, Jatin Bedi
[AAAI26_W8_28] FusionMind: A Differentiable and Efficient Multi-Modal Retrieval-Augmented Generation Framework
Liu Junshen, Ren Beiming, Zheng Rui, Liu Yin
[AAAI26_W8_29] LENS: Learning Architecture Navigator for LLM Agentic Systems
Guancheng Wan, Jiayi Yang, Mengting Li
[AAAI26_W8_30] RAAS: Relative Architecture Adaptive Search for Agentic Supernet Optimization
Jiayi Yang, Guancheng Wan, Mengting Li
[AAAI26_W8_31] Diagnose, Localize, Align: A Full-Stack Framework for Reliable LLM Multi-Agent Systems under Instruction Conflicts
Guancheng Wan, Leixin Sun, Mengting Li
[AAAI26_W8_32] Benchmarking Agents in Insurance Underwriting Environments
Amanda Dsouza, Ramya Ramakrishnan, Charles Andrew Dickens, Bhavishya Pohani, Christopher M Glaze
[AAAI26_W8_33] Enabling Reliable Enterprise Agentic AI: A Case Study on Domain-Specific Embeddings and Benchmarking in Offshore Energy
Sampath Rajapaksha, Nirmalie Wiratunga, Ikechukwu Nkisi-Orji, Tim Clarke, Fraser Kerr
[AAAI26_W8_34] IDBench: Evaluating and Mitigating Intent Drift in Agentic AI Workflows
Jianming Lai