Human-Centric and Multimodal Evaluation for Explainable AI: Moving Beyond Benchmarks

Tutorial at the 34th International Joint Conference on Artificial Intelligence (IJCAI 2025)

Abstract

This tutorial addresses the limitations of static, machine-to-machine evaluation metrics by introducing a human-centric and cognition-driven paradigm. We propose the 3E Framework (Environment–Evaluation–Executor) to systematically assess AI systems not just by performance, but by their alignment with human reasoning, perception, and adaptability.

In the Individual Alignment part, we focus on Dynamic Visual Ability (DVA) and model human-like visual intelligence through Visual Turing Tests and eye-tracking experiments. By benchmarking AI systems against human cognitive patterns in tasks like global visual tracking and multimodal reasoning, we uncover alignment challenges and promote personalized modeling strategies.

In the Group Alignment part, we extend the evaluation to multi-agent and multi-user settings. Through benchmarks like FIOVA, we evaluate the fairness, diversity adaptation, and interpretability of AI systems in long video understanding and social computing contexts. This shift from individual to group-level evaluation helps ensure that AI aligns not just with a single user, but with the diversity of human cognition.

We further showcase practical applications in AI for Education, where multimodal interaction data and reflective learning outcomes are jointly analyzed to evaluate AI teaching agents and adaptive learning platforms.

This tutorial provides actionable insights for researchers and practitioners aiming to build trustworthy, explainable, and human-aligned AI systems, bridging theoretical innovation with real-world deployment.

Course Description

This tutorial introduces a human-aligned and diversity-adaptive evaluation paradigm for trustworthy AI, moving beyond static benchmark metrics such as accuracy or reward signals. We focus on bridging individual-level cognitive modeling and group-level diversity adaptation, offering a generalizable methodology across vision, education, and social computing tasks.

In Part 1: Individual Alignment, we focus on visual intelligence as a foundation for evaluating how well AI systems match human cognition. Using Dynamic Visual Ability (DVA), Visual Turing Tests, and eye-tracking experiments, we benchmark perception, memory, and reasoning capabilities across increasingly complex tasks: from short-term tracking (STT) to long-term reasoning in global instance tracking (GIT) and multi-modal global instance tracking (MGIT). This progression models human-like visual intelligence in structured and measurable ways.

In Part 2: Group Alignment, we shift to evaluating AI across diverse populations and complex social scenarios. While multi-annotator benchmarks like FIOVA provide a starting point for assessing alignment with diverse human perspectives in video understanding, this section expands further into social computing and education contexts, including multi-agent simulations, LLM-powered generative agents, and personality modeling. We examine how AI systems adapt to group variability, fairness requirements, and evolving collective behavior, emphasizing diversity, inclusivity, and social reasoning as central evaluation goals.

In the Conclusions, we summarize a full-spectrum evaluation strategy, from individual cognitive alignment to group-level diversity modeling, and outline future directions in capability-based, multimodal, and human-in-the-loop evaluation standardization. This serves as a foundation for building AI systems that are not only performant, but also explainable, socially aware, and ethically deployable.

The tutorial equips researchers and practitioners in XAI, computer vision, education, and social AI with actionable tools grounded in the 3E Framework (Environment–Evaluation–Executor) to design next-generation evaluation strategies for reliable, adaptable, and human-aligned AI.

Organizers and Presenters

📧kanghao.cheong@ntu.edu.sg

Prof. Kang Hao Cheong

Associate Professor, School of Physical and Mathematical Sciences, Nanyang Technological University, with a joint appointment in AI at the College of Computing and Data Science (CCDS), Nanyang Technological University. Dr. Cheong is listed among the World’s Top 2% Scientists in the study by Stanford University/Elsevier (both career-long and single-year, ranked in the top 0.5% in the Artificial Intelligence & Image Processing category). His research interests include AI in medicine and healthcare, complexity science, evolutionary computation, and network science. He has published in journals such as PNAS, PRL, Nature Communications, Advanced Materials, IEEE TEVC, IEEE TCYB, IEEE TSMC, IEEE TFS, and TNNLS. He currently serves on the editorial boards of Frontiers in Human Neuroscience, Journal of Computational Science, Games, and Scientific Reports. He has served as a reviewer for more than 20 journals, including Nature Communications, Nature Machine Intelligence, IEEE TPAMI, IEEE TAC, IEEE TCYB, and the Physical Review (PR) journals.

📧shiyu.hu@ntu.edu.sg

Dr. Shiyu Hu

Research Fellow at the School of Physical and Mathematical Sciences, Nanyang Technological University. Her research focuses on computer vision, large language models, and multi-modal learning. She has published over 20 papers in top-tier journals and conferences, such as TPAMI, IJCV, and NeurIPS, and received the Best Paper Honorable Mention at the CVPR VDU Workshop. She has developed widely used open-source platforms, including VideoCube, SOTVerse, and BioDrone, which have gained global recognition from 130+ countries and regions. Additionally, she has delivered tutorials at AI-related conferences, such as ICIP, ICPR, and ACCV, and authored an English monograph on artificial intelligence. She also serves as a reviewer and program committee member for CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, and TIP.

📧jie.zhao@ntu.edu.sg

Dr. Jie Zhao

Research Fellow, School of Physical and Mathematical Sciences, Nanyang Technological University. His research interests include network science, computational intelligence, and evolutionary computation. He has published more than 20 papers in top-tier journals such as IEEE TEVC, IEEE TSMC, and IEEE TFS.

📧yongbao.wu@ntu.edu.sg

Dr. Yongbao Wu

Research Fellow, School of Physical and Mathematical Sciences, Nanyang Technological University. His current research interests include stability theory for stochastic differential equations, networked control systems, and network system attacks. He has published over 30 research papers in top international journals and conferences, such as Automatica, IEEE TSMC, IEEE TCNS, and IEEE TFS.

Schedule

14:00-15:30, 18th August, 2025.

Room 519A, Palais des congrès, Montreal.

Anticipated Target Audience

This tutorial is designed for researchers, graduate students, and industry professionals working in artificial intelligence, particularly those interested in evaluation methodologies, computer vision, multimodal systems, and AI for Education. Participants should have a foundational understanding of machine learning principles and be familiar with commonly used evaluation metrics such as accuracy, reward scores, and precision-recall. Prior exposure to topics such as human-in-the-loop evaluation, Visual Turing Tests, or multi-agent systems is helpful but not required. The tutorial aims to provide both conceptual depth and practical guidance, making it suitable for attendees from both academic and applied research communities.