The 5th Workshop on Ethical Artificial Intelligence: Methods and Applications
(held in conjunction with ACM SIGKDD 2026)
August 10, 2026, Jeju, Korea
Introduction
As computers increasingly make decisions about who gets a loan, a job, or even bail, the expansion of AI algorithms has provoked public concern about ethical issues, and the need to understand what constitutes AI algorithms and how they make decisions becomes ever more pressing. For example, a growing number of high-profile news outlets have reported that widely used algorithms have unfairly discriminated against some groups of people (e.g., by gender and race) in parole decisions and other major life events. Focusing more attention on ethical bias in learning algorithms is key to unlocking the potential of automated decision systems while ensuring fairness and accountability so that everyone can advance equally in society.
Ethical AI has become increasingly important, and it has been attracting attention from academia and industry due to its increased popularity in real-world applications with fairness concerns. It also places fundamental importance on ethical considerations in determining legitimate and illegitimate uses of AI. Organizations that apply ethical AI have clearly stated, well-defined review processes to ensure adherence to legal guidelines. Therefore, the wave of research at the intersection of ethical AI in data mining and machine learning has also influenced other fields of science, including computer vision, natural language processing, reinforcement learning, and social science.
Call for Papers
Important Dates
The following are the proposed important dates for the workshop. All deadlines are at 11:59 pm Pacific Time.
Paper Submission: May 15th, 2026 (extended from April 30th, 2026)
Paper Notification: June 4th, 2026
Workshop Date: August 10th, 2026, Afternoon
Topics of Interest
We encourage submissions in various degrees of progress, such as new results, visions, techniques, innovative application papers, and progress reports under the topics that include, but are not limited to, the following broad categories:
Algorithmic fairness and bias in classifying and clustering big data
Human-in-the-loop for ethical-aware machine learning
Ethical recommender systems and diversity in recommendations
Learning an ethical-aware representation on heterogeneous data domains
Causality-based fairness in high-dimensional data
Integration of observation for causality-based bias control
Preserving fairness in graph embedding
Novel visualization techniques to facilitate the query and analysis of data bias
Robustness and generalization of LLMs
Bias mitigation and the fairness of LLMs
Explainability, interpretability, privacy, and security of LLMs
First-hand experience creating or working with company practices for ethical AI
Ethical considerations in high-performance computing (HPC)
Philosophical theories and their implications for AI ethics
And with a particular focus, but not limited to, these application domains:
Application of ethical AI methods in large-scale data mining
Computer vision (fairness in face recognition, object relation, debiasing in image processing, and video)
Natural language processing (fair text generation, semantic parsing)
Reinforcement learning (fairness-aware multi-agent learning, compositional imitation learning)
Social science (racial profiling, institutional racism)
Submission Guidelines
Submissions are limited to a total of 5 pages, including all content and references. There will be no page limit for supplemental materials. All submissions must be in PDF format and use ACM Conference Proceeding templates (two-column format). One recommended setting for a LaTeX file of an anonymous manuscript is: \documentclass[sigconf, anonymous, review]{acmart}. Template guidelines are here: https://www.acm.org/publications/proceedings-template.
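As an illustration of the recommended setting above, a minimal manuscript skeleton might look like the following. This is only a sketch, not an official workshop template: the title, author, institution, section name, and the bibliography file name `references` are placeholder assumptions, and the ACM template guidelines linked above remain the authoritative reference.

```latex
% Minimal anonymous-submission skeleton (illustrative sketch only).
\documentclass[sigconf, anonymous, review]{acmart}

\begin{document}

\title{Placeholder Paper Title}

% With the `anonymous` option, acmart suppresses author details in the PDF.
\author{Placeholder Author}
\affiliation{%
  \institution{Placeholder Institution}
  \country{Placeholder Country}}

% In acmart, the abstract is placed before \maketitle.
\begin{abstract}
A short summary of the submission.
\end{abstract}

\maketitle

\section{Introduction}
Main text goes here.

% Assumes a references.bib file (hypothetical name) and at least one \cite in the text.
\bibliographystyle{ACM-Reference-Format}
\bibliography{references}

\end{document}
```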
Following the KDD conference submission policy, reviews are double-blind, and author names and affiliations should NOT be listed. Submitted papers will be assessed based on their novelty, technical quality, potential impact, and clarity of writing. For papers that rely heavily on empirical evaluations, the experimental methods and results should be clear, well-executed, and repeatable. Authors are strongly encouraged to make data and code publicly available whenever possible.
Submit your paper through the EAI workshop CMT submission site: https://cmt3.research.microsoft.com/EAI2026
The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.
Paper Acceptance
Accepted workshop papers will be categorized as either poster or oral presentations. Both types will be posted on the workshop website, but will NOT be included in the official KDD proceedings.
Upon notification, we ask that authors of accepted works deanonymize their papers, make any final changes, and then submit a camera-ready version to the CMT submission site. The workshop website will then be updated with links to accepted papers. Note that accepted works will not be formally published. This means that:
Authors can retain full copyright of their works.
Work contained in accepted papers is not precluded from being published in other research venues.
Submitted papers are allowed to have significant overlap with previously published or currently submitted work (in this case, please indicate overlapping works).
The workshop chairs and committees will designate one Best Paper Award and one Runner-Up Paper Award for accepted oral papers. Additionally, the workshop organizers encourage all authors of accepted papers to extend their work and submit to the special issue in the Journal of Frontiers in Big Data: Ethical Artificial Intelligence: Methods and Applications.
Any questions may be directed to the email address: chen_zhao@baylor.edu
Attendance
For each accepted paper, at least one author must attend the conference and present the paper.
Keynote Speakers
Junhua Ding, University of North Texas, USA
Junhua Ding is the Reinburg Endowed Professor and Founding Chair of the Anuradha & Vikas Sinha Department of Data Science at the University of North Texas. Prior to returning to academia in 2007, he spent nearly eight years in industry as a software engineer and project manager at leading biomedical companies. He received his Ph.D. in Computer Science from Florida International University, his M.S. in Computer Science from Nanjing University, and his B.S. in Computer Science from China University of Geosciences.
His research spans data-centric AI, software engineering, biomedical computing, and quantum computing. His current work pursues three directions: integrating formal methods with large language models to improve the reliability of AI-enabled software systems; developing non-invasive blood pressure and glucose monitoring technologies using AI-guided physiological modeling; and applying formal verification to quantum software systems.
Dr. Ding has authored more than 140 peer-reviewed publications in leading venues including EMNLP, EACL, ICCV, WWW, AAAI, ICDM, and top journals in his fields. His research is supported by grants from NSF, DoD, and industry partners. He serves on the editorial boards of Information and Software Technology and Computer Standards & Interfaces, and as Co-General Chair of the 2026 ACM/IEEE Joint Conference on Digital Libraries (JCDL).
Jian Kang, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), UAE
Jian Kang is an Assistant Professor in the Department of Statistics and Data Science and the Department of Computer Science at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI). His research focuses on personalized AI for social good. He has published over 30 papers in top conferences and has received EMNLP 2025 SAC Highlights, Rising Stars in Data Science, and the Mavis Future Faculty Fellowship. He is an Associate Editor of ACM Computing Surveys and serves on the organizing committees of multiple conferences (CIKM 2026, IEEE Big Data 2026, LoG 2025, CIKM 2025, KDD 2024).
Xinjian Luo, Shanghai Jiao Tong University, China
Dr. Xinjian Luo is a tenure-track Associate Professor at the School of Computer Science, Shanghai Jiao Tong University, and a recipient of the National Young Talent Program. He received his Ph.D. from the National University of Singapore and was a postdoctoral researcher at the Mohamed bin Zayed University of Artificial Intelligence, focusing on large-scale language models.
His research centers on privacy and security risks in distributed machine learning systems, with an emphasis on understanding the internal mechanisms of AI systems through privacy analysis. His recent work spans prompt engineering, multi-agent systems, multimodal large models, and the evaluation and scaling of foundation models.
Dr. Luo has published extensively in top-tier venues across security, databases, and AI, including IEEE S&P, USENIX Security, ACM CCS, NDSS, VLDB, ICDE, and AAAI. He is also actively involved in the research community, serving as Proceedings Chair of ICDM 2026 and as a program committee member for leading conferences such as KDD, WWW, and AAAI.
Workshop Organizers
Accepted Papers
Posters:
From Techne Equalisation to Phronesis Premium: Generative AI and Seniority Bias in Ageing Europe
Dora Moscato
Contrary to the prevailing view that older workers are disadvantaged by technological change, this paper argues that generative AI may advantage them at the expense of younger ones. The mechanism operates through two channels. First, generative AI equalises procedural competence (techne), enabling lower-performing workers to reach the quality of top performers—an executional prosthesis that recovers skills depreciated with age. Second, as execution becomes cheap, the scarce factor shifts to contextual judgment and responsible decision-making (phronesis), capabilities that correlate with reflective experience rather than technical currency. An event study on Eurostat EU-LFS data (2008–2025) shows that the youth employment share in high-AI-adoption sectors diverges sharply downward after ChatGPT’s diffusion, following fifteen years of no statistically significant differential trend, while the senior share expands. The seniority bias carries significant ethical implications: the devaluation of entry-level work erodes the pipeline through which future judgment is formed, while the advantage itself is structurally transitory, shaped by the design choices of AI producers, the organisational strategies of adopting firms, and the institutional responses of regulators. The paper examines the EU AI Act’s human oversight requirement as institutional demand for phronesis and argues that its effectiveness depends on whether oversight remains substantive or degrades into formal compliance.
Hybrid LSTM Framework For Vessel-Type-Aware AIS Anomaly Detection
Raj Singh, Moirangthem Suchitra Devi, Sandeep Kumar, Gwyneth A Chullai
Automatic Identification System (AIS) data plays an important role in large-scale maritime traffic monitoring and anomaly detection. Deep learning models, particularly Long Short-Term Memory (LSTM) networks, are commonly used to model vessel trajectories due to their ability to capture temporal dependencies. However, many existing approaches rely on unified global models trained across different vessel categories, often assuming similar motion patterns. In practice, cargo ships, tankers, and fishing vessels show distinct behaviors, which can affect detection performance. In this work, we examine how vessel-type differences influence AIS-based trajectory anomaly detection using a controlled experimental setup. We compare three modeling strategies under the same architectural settings: (i) a global LSTM model with full parameter sharing, (ii) vessel-type-specific models with full specialization, and (iii) a hybrid shared-backbone model that learns common temporal patterns while keeping vessel-type specific output layers. Experiments on real-world AIS data with controlled anomaly injection show that incorporating vessel-type information improves detection performance. The hybrid model reduces false alarm rates (by about 2–3% compared to the global model) and also improves precision and recall. It further reduces detection latency across all vessel types which allows anomalies to be detected earlier. While fully vessel-type-specific models achieve the best overall performance, the hybrid model offers a practical balance between generalization and specialization, without increasing model complexity. Overall, these results suggest that accounting for vessel-type differences is important for building more reliable AIS anomaly detection systems, and that hybrid modeling provides an effective and scalable solution.
A Three-Layer Threat Assessment Pipeline for Human-Robot Interaction Safety
Sanaullah Sanaullah, Han Byung-Kil, Alexandra Dmitrienko, Hirotada Honda, Thorsten Jungeblut, Dong Il Park
Ensuring safety in human-robot interaction (HRI) environments requires the ability to detect not only physical threats but also subtle indicators of psychological distress — a challenge that existing single-modality systems fail to address comprehensively. This paper presents MindGuard, a three-layer threat assessment pipeline designed for real-time safety monitoring in HRI contexts, uniquely combining mental health risk detection with physical threat recognition within a unified framework. The first layer employs keyword pattern matching and a fine-tuned BERT classifier to identify linguistic indicators of self-harm, suicidal ideation, and violent intent from natural language input. The second layer leverages YOLO-based object detection and a Vision-Language Model (VLM) for continuous scene analysis, identifying physical threat objects and hazardous environmental conditions through live camera feeds. The third layer fuses all signals — textual, semantic, and visual — through a large language model reasoning agent that produces a structured risk decision across five escalation levels: SAFE, MONITOR, RESPOND, ALERT, and EMERGENCY. The system operates in real time on consumer GPU hardware using FP16 inference, achieving low-latency detection suitable for deployment on mobile robotic platforms. Experimental evaluation demonstrates that the multi-modal fusion approach reduces false negatives compared to text-only or vision-only baselines, with the cross-modal signal agreement improving overall classification confidence. MindGuard addresses a critical gap in HRI safety systems by treating mental health crises and physical dangers as equally urgent threat categories, offering a deployable, extensible architecture for socially aware autonomous robots.
Oral Presentations:
Mitigating Demographic Bias Decay in Machine Learning Models
Skanda Sunil, Chen Zhao
Real-world machine learning systems operate under temporal distribution shifts, where fairness degrades over time, causing bias decay. Current approaches consist of either static fairness-aware learning or online learning methods, which adapt to evolving data but fail to incorporate fairness considerations. As a result, these methods fail to maintain proper fairness under distributional change. To bridge this gap, we propose a novel framework, Temporal Threshold Optimization (TTO). The key idea of TTO is to optimize predictive performance and fairness across time. We evaluated TTO on real-world datasets consisting of financial complaints, police incidents, and news-based market data, each exhibiting variability and degradation in fairness over time. TTO consistently achieved competitive accuracy while maintaining low and stable disparity levels across these shifts. All in all, these findings highlight the limitations of static fairness assumptions and the importance of TTO for reliable machine learning systems that reduce demographic bias.
Conformal Multiverse Analysis: Auditing Uncertainty-Aware Prediction Pipelines
Julia Broden, Jan Simson, Christoph Kern
We introduce Conformal Multiverse Analysis (CMA), an auditing framework which integrates Conformal Prediction with Multiverse Analysis. CMA enables a comprehensive assessment of how uncertainty of prediction models propagates across different modeling choices and affects group-conditional coverage of different groups and minorities. Pairing Multiverse Analysis with Conformal Prediction provides a novel procedure for identifying disparities and critical design decisions, particularly in contexts where uncertainty-aware decision policies are aimed to be deployed. We illustrate CMA by comparing 900 modeling pipelines for an algorithmic profiling task and show that (1) design decisions can affect uncertainty outcomes for subgroups and (2) uncertainty reduction and group-specific coverage can be misaligned. Our work reinforces the need and value of integrating uncertainty quantification into fairness-aware model design and evaluation.
SignCV: A Sign-Consensus Vector Probe for Model-Dependent Multi-Attribute Bias Geometry
Kyungmin Kwon, SeungHun Han, HaYoung Oh
Direction-based bias mitigation methods commonly assume that bias-associated parameter directions for different demographic attributes can be merged into a single shared edit. We show this assumption holds to sharply different degrees across LLMs matched in scale (7B) and release window. Under an identical extraction pipeline on BBQ, Mistral-7B-Instruct-v0.3 and Llama-2-7B-hf exhibit nearly identical within-attribute update stability (within-axis survivor rates of ≈64.0% vs. ≈67.3%), yet their cross-attribute alignment differs by nearly an order of magnitude: only 7.05% of coordinates survive race–gender–SES sign intersection on Mistral, versus 53.48% on Llama—a 7.6× gap. We surface this contrast with SignCV, a sign-consensus vector probe that extracts per-attribute LoRA update directions, retains coordinates with stable within-axis sign agreement, and measures how many survive cross-axis sign intersection; the cross-axis survivor rate quantifies, before any edit is applied, whether a unified multi-attribute projection is structurally plausible for a given checkpoint. Downstream projection edits at 𝑘 = 1.0 yield modest, model-dependent outcomes rather than uniform improvement: on Mistral, Race bias decreases (0.0307→0.0245) while SES and Gender move adversely; on Llama, SES and Gender decrease slightly while Race drifts marginally upward. Evaluation PPL decreases across all three single axes for both models, but we do not interpret this as a fairness gain. Ablations show Full SignCV is not uniformly best on aggregate metrics; its value is diagnostic. We therefore position SignCV as a pre-edit geometry check that reveals when—and on which models—a shared multi-attribute projection direction is structurally plausible.
Curation Cards: Structuring Decisions Made During Human-LLM Collaborations for Data Curation
Shreya Chappidi, Andra Krauze, Jat Singh
Early-stage decision-making during the algorithm development lifecycle, including problem formulation and data labelling, critically shapes downstream processes including algorithmic performance and user adoption. Human-LLM collaborations during data mining bring multiple new dynamics to this early-stage decision-making as users can more explicitly shape data annotation and labeling practices. First, users may not understand how design choices made during data curation could alter or shape their intended problem formulations. Second, the non-deterministic nature of LLMs means that small (and often implicit) choices during data curation such as prompt style or phrasing can meaningfully influence the distribution of resulting data labels. As a result, there can be limited visibility into how user choices shape LLM-curated data, which may result in misaligned datasets that do not effectively support the original problem intent, create unreliable or biased data labels, and add complexity during audits of human-LLM data workflows. To support early-stage decision-making on how to design LLM-supported data curation workflows, we propose curation cards: a structured documentation approach that captures purpose/goals of data curation, data input and model specifications, labeling schema and concept definitions, system prompts, reference sources, prompt text and non-semantic details, pipeline design, and error analysis. These elements reflect key early-stage design choices influencing data curation outcomes, where small variations in aspects such as prompt design or schema definition can cascade into systematic shifts that meaningfully influence data. In all, curation cards seek to support AI practitioners by (1) explicitly considering critical factors that will shape dataset creation and curation, (2) clarifying user choices and decisions made during LLM-supported curation, and (3) improving reviewability and auditability of curation processes.
When Equity Looks Like Concentration: A Placebo Protocol for Fairness Claims in Recommender Systems
Vairaaj Bindal
Subgroup-level fairness claims in ML often get confused with descriptive concentration. An aggregate effect can be large in a subgroup because the subgroup is large, even when the subgroup-defining feature carries no targeting information at all. We propose a stratified label-shuffle placebo that tells the two apart, and we apply it to a real recommender for smallholder fertilizer and crop-switch advice trained on 8,744 plots from the LSMS-ISA Ethiopia panel. The recommender maximizes Expected Realized Benefit (predicted yield gain weighted by predicted adoption probability) and beats an accuracy-only baseline by 50 kg/ha on 800 enumeration-area-disjoint test plots, with the largest absolute lift on the no education stratum (+53 kg/ha, 𝑛 = 543/800). On its face that looks like equity targeting. The placebo says otherwise: per-subgroup means fall inside the null under random education-label assignment (𝑝 ∈ [0.14, 0.87]). The aggregate clears Rosenbaum Γ★ ≥ 5, EA cluster permutation 𝑝 < 5 × 10⁻⁴, and a SAFI-anchored partial identification bound at +26 kg/ha worst case. The per-subgroup gradient is just descriptive concentration. We argue the placebo should be a default in any subgroup-stratified fairness report.
A Task-Agnostic Metric for Adversarial Vulnerability in Vision-Language Models
Maisha Binte Rashid, Pablo Rivas, Chen Zhao
Vision-language models (VLMs) are highly vulnerable to adversarial perturbations, yet there is no unified metric for measuring their robustness across different tasks and architectures. In this paper, we propose a task-agnostic Vulnerability Score (VS) that quantifies robustness degradation under both structured noise and adversarial perturbations. The metric combines noise-induced and FGSM-based performance degradation into a single interpretable score. We evaluate VS on CLIP, BLIP, and LLaVA using five benchmark datasets spanning classification, captioning, visual question answering, and image-text retrieval tasks. Results show that VS consistently reflects adversarial susceptibility across models and tasks. Further validation with PGD and Carlini & Wagner (C&W) attacks demonstrates strong inverse correlations between VS and adversarial performance, confirming the reliability of the proposed metric. Analysis with Jacobian-based sensitivity measures shows that VS captures broader aspects of practical robustness beyond local gradient sensitivity.
Hyperbolic Graph Learning for Hierarchical Cognitive Diagnosis in Intelligent Education Systems
Han Xiang, Haodong Qian, Shengpeng Wang, Lingyun Liu, Dongran Yu, Xingcheng Fu
Cognitive diagnosis estimates student and group knowledge mastery from response logs, yet educational graphs contain implicit and heterogeneous hierarchies over groups, students, exercises, and concepts. Existing Euclidean or single-space models distort deep hierarchy or conflate student-behavior and knowledge-semantic structures. We propose Hyperbolic Graph Learning for Hierarchical Cognitive Diagnosis (HGCD), which builds a heterogeneous educational graph, learns two adaptive-curvature hyperbolic spaces for student and knowledge hierarchies, and fuses them through a geometry-consistent common tangent space. Experiments on four real datasets show consistent gains for individual- and group-level diagnosis.
Workshop Reviewers
Ram Prasad Nethi (Amazon Web Services)
Venkata Ratna Kumar Bonagiri (Macys)
Ankur Gupta (LinkedIn)
Shatrughna Upadhyay (Intuit Inc)
Ankur Bhatnagar (Macys)
Shahazad Qurashi (Jazan University)
Dong Li (Baylor University)
Jinyang Li (Amazon)
Zhao Zhu (Meta Platforms)
Xinyu Wu (Baylor University)
Denglin Jiang (Bloomberg)
Jiaxin Liu (Meta Platforms)
Boyang Li (University of Notre Dame)
Zhuosheng Liu (UC Davis)
Xiaohui Chen (Baylor University)
Previous Workshops