To Rehumanize a Dehumanizing Process
From Human to Data to Algorithms
1. Human-Algorithmic Bias: Source, Evolution, and Impact (with Xiyang Hu, Yan Huang, Beibei Li). Management Science, 2026, 72(1): 96-118. [Paper link] [Amazon AWS AI Research Grant 2023; Finalist CIST 2021 Best Student Paper Award]
In this study, leveraging a unique repeat decision-making setting in a high-stakes micro-lending context, we aim to uncover the underlying source, evolution dynamics, and associated impacts of bias. We first develop and estimate a structural econometric model of the decision dynamics to understand the source and evolution of potential bias among human evaluators in microloan granting. We find that both preference-based bias and belief-based bias are present in human evaluators' decisions, and both favor female applicants. Through counterfactual simulations, we quantify the effects of the two types of bias on both fairness and profits. The results show that eliminating either bias improves both the fairness of financial resource allocation and platform profits. Furthermore, to examine how human biases evolve when inherited by machine learning (ML) algorithms, we train a set of state-of-the-art ML algorithms for default risk prediction on both real-world datasets with human biases encoded within and counterfactual datasets with those biases partially or fully removed. By comparing the decision outcomes across counterfactual settings, we find that even fairness-unaware ML algorithms can reduce the bias present in human loan-granting decisions. Interestingly, while removing both types of human bias from the training data can further improve ML fairness, the fairness-enhancing effects vary significantly between new and repeat applicants. Based on our findings, we discuss how to reduce decision bias most effectively in a human-machine learning pipeline.
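To make the bias-inheritance comparison concrete, below is a minimal, fully synthetic sketch, not our actual estimation pipeline: a fairness-unaware classifier is trained once on labels encoding a human preference bias toward female applicants and once on counterfactually debiased labels, and the resulting gender gaps in predicted approvals are compared. All variables, features, and magnitudes are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 5000
female = rng.integers(0, 2, n)               # protected attribute (not a model feature)
credit = rng.normal(0.0, 1.0, n)             # latent creditworthiness
proxy = female + rng.normal(0.0, 1.0, n)     # observed feature correlated with gender
X = np.column_stack([credit + rng.normal(0.0, 0.5, n), proxy])

# Human approval labels encode a preference-based bias favoring female applicants;
# the counterfactual labels have that bias removed.
y_biased = (credit + 0.8 * female > rng.normal(0.0, 0.5, n)).astype(int)
y_debiased = (credit > rng.normal(0.0, 0.5, n)).astype(int)

def approval_gap(y_train):
    """Female-minus-male approval rate of a fairness-unaware classifier."""
    clf = GradientBoostingClassifier(random_state=0).fit(X, y_train)
    approved = clf.predict(X)
    return approved[female == 1].mean() - approved[female == 0].mean()

print(f"gap when trained on biased labels:   {approval_gap(y_biased):+.3f}")
print(f"gap when trained on debiased labels: {approval_gap(y_debiased):+.3f}")
```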
Prompting Human–AI Collaborative Value in Real-World Applications
Opening the Interactive Box: Human–Algorithm Interaction Mechanisms
2. 1+1>2? Information, Humans, and Machines (with Yingjie Zhang). Information Systems Research, 2025, 36(1): 394-418. [Paper link]
Drawing upon studies in dual-process theories of reasoning, which propose the conditions necessary to arouse humans' active information processing and systematic thinking, we tailor the experimental treatments to vary the level of information complexity, the presence of collaboration, and the availability of machine explanations. We observe that, with large volumes of information alone or with machine explanations alone, human evaluators cannot add extra value to the final collaborative outcomes. However, when extensive information is coupled with machine explanations, human involvement significantly reduces the financial default rate compared with machine-only decisions. We disentangle the underlying mechanisms with three-step empirical analyses. We reveal that the coexistence of large-scale information and machine explanations can invoke humans' active rethinking, which, in turn, shrinks gender gaps and increases prediction accuracy. In particular, we demonstrate that humans can spontaneously associate newly emerging features with others that have been overlooked but have the potential to correct the machine's mistakes. We also examine cases where humans tend to either follow or overrule AI, along with the corresponding outcomes.
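The interaction logic of the design can be sketched with a toy regression: if human value-added materializes only when rich information is paired with machine explanations, the effect should load on the interaction term. The data and variable names (default, rich_info, explained) below are synthetic placeholders, not our experimental measures.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 4000
df = pd.DataFrame({
    "rich_info": rng.integers(0, 2, n),   # extensive vs. limited information
    "explained": rng.integers(0, 2, n),   # machine explanations shown vs. hidden
})
# Synthetic outcome: defaults drop only when both conditions are present.
logit_p = -1.0 - 0.6 * df.rich_info * df.explained
df["default"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_p)))

model = smf.logit("default ~ rich_info * explained", data=df).fit(disp=0)
print(model.summary().tables[1])   # the interaction term should carry the effect
```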
3. The Power of Disagreement: A Field Experiment to Investigate Human-Algorithm Collaboration in Loan Evaluations (with Hongchang Wang, Yingjie Zhang). Management Science, 2026, 72(1): 96-118. [Paper link]
What is collaborative value in human-algorithm collaboration, and how can it be achieved? We study these questions by conducting a field experiment in which human evaluators and algorithms worked together to evaluate loan applications. We apply a two-by-two design to represent four collaboration scenarios, i.e., limited/rich information and with/without disclosure of algorithm rationale. In the experiment, human evaluators are asked to first make an initial independent decision and then make the final decision after receiving an algorithmic recommendation. We find that the final decisions are better (as measured by both right approval and right denial) than human-only decisions or algorithm-only decisions, indicating the presence of collaborative value. We measure this value with a novel concept, "decision augmentation," and find that disclosing algorithm rationale decreases collaborative value under the limited-information scenario but increases collaborative value under the rich-information scenario. To further understand the path to collaborative value, we propose a framework for mechanism examination that centers on collaborative disagreement, which occurs when human evaluators reject algorithmic recommendations. We then examine several vital, rationality-based factors within this framework and come to the following conclusions: (1) collaborative disagreement exhibits sizeable predictive power for collaborative value; (2) the differences between human evaluators and algorithms in decision-making contribute to disagreement but not to collaborative value; (3) the algorithm's self-contradiction level increases disagreement and helps human evaluators disagree with the algorithms at the right time.
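One plausible way to operationalize "decision augmentation" as described above, though not necessarily the paper's exact formula, is the accuracy lift of the final collaborative decision over the better of the two standalone decision-makers:

```python
import numpy as np

def decision_augmentation(y_true, human_init, algo_rec, final):
    """Accuracy lift of the collaborative decision over the better
    of the human-only and algorithm-only decisions."""
    acc = lambda pred: np.mean(np.asarray(pred) == np.asarray(y_true))
    return acc(final) - max(acc(human_init), acc(algo_rec))

# Toy example: 1 = approve, 0 = deny; y_true is the correct call.
y_true     = [1, 0, 1, 1, 0, 1, 0, 0]
human_init = [1, 1, 1, 0, 0, 1, 1, 0]   # 5/8 correct
algo_rec   = [1, 0, 0, 1, 1, 1, 0, 1]   # 5/8 correct
final      = [1, 0, 1, 1, 0, 1, 1, 0]   # 7/8 correct
print(decision_augmentation(y_true, human_init, algo_rec, final))  # +0.25
```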
Reclaiming Human Agency: Cognitive Safeguards in AI-Assisted Decisions
4. Empathic Algorithm Collaboration: Decision Augmentation for Socially Consequential Decision-Making (with Thomas Ware, T. S. Raghu, Benjamin Shao). Under review. [Paper link]
The integration of artificial intelligence (AI) into decision-making systems presents new challenges for maintaining human intentionality, empathy, and accountability, particularly in socially consequential contexts where outcomes materially affect individuals' lives. This study introduces a re-humanizing approach to AI-assisted decision-making by framing algorithmic collaboration through the lens of structural embeddedness, where AI is not simply a tool for optimization but a socially situated actor in decision ecosystems. We propose Empathic Algorithm Collaboration (EAC) as a mechanism to restore individualized, ethically grounded consideration in human-AI decision environments. Drawing on theories of social empathy and person-situation interactionism, we design EAC around two key components, Interpersonal Perspective-Taking (IPT) and Contextual Understanding (CU), which guide human evaluators to engage more reflectively with AI recommendations. Through a field experiment in a micro-lending context, we examine the impact of EAC on deliberative intensity and decision quality. Results show that EAC not only fosters more conscientious deliberation but also improves decision accuracy. Furthermore, differences in evaluator responses across borrower conditions suggest that person-situation interactionism plays a critical role in shaping the cognitive and ethical engagement of decision-makers. These findings highlight the potential of EAC to support human-centered, context-sensitive decision augmentation and contribute to a broader understanding of how AI can be aligned with social values in structurally embedded domains of socially consequential decision-making.
5. Responsible Engagement: Human–AI Work System Design for Counteracting Agency Reversal (with Thomas Ware, T. S. Raghu, Benjamin Shao). Under review. [Paper link]
A central challenge in AI-supported decision-making is designing workflows in which humans and algorithms complement rather than displace one another. Leveraging a microlending experimental setting, we introduce responsible engagement (RE), an interaction design intervention that assigns accountability to the human decision-maker to preserve agency without compromising the efficiency advantages of AI. Drawing on Work System Theory, we examine how RE restructures relational dynamics within AI-assisted workflows by embedding responsibility directly into the decision process. We find that when responsibility is enacted through work system design, evaluators experience greater agency and accountability, engage in more deliberative information processing, detect algorithmic errors at higher rates, and maintain stronger alignment with organizational standards, collectively producing more complementary human–AI performance. These effects are especially pronounced in judgment tasks, which involve context-sensitive evaluation under uncertainty and in which the risk of uncritical reliance on AI is particularly consequential. By demonstrating how interaction design can re-anchor human agency within AI-supported workflows, this research contributes to the literature on human–AI collaboration and offers practical guidance for designing work systems that preserve human intentionality while maintaining AI's instrumental value.
Institutional Design: Incentive Alignment for Human–AI Value Maximization
6. Incentive Contract Design to Maximize Human–AI Collaborative Value (with Dingwei Gu, Yingjie Zhang). In progress. [Paper link]
This ongoing study develops analytical models of incentive contract design that maximizes human–AI collaborative value, accounting for the dynamic gaming between employer and employees that arises from the incremental value AI generates for employees, employees' risk attitudes, the employer's ability to learn hidden information, and the resulting payoffs for both sides.
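Because the model is still in progress, the following is only a textbook-style baseline, a standard Holmström–Milgrom linear-contract setup with CARA utility, of the kind of trade-off the study formalizes; all notation here is ours, not the paper's.

```latex
% Output, linear wage contract, and effort cost (all notation illustrative):
\[
  x = e + \alpha + \varepsilon, \qquad
  \varepsilon \sim \mathcal{N}(0,\sigma^{2}), \qquad
  w = s + b\,x, \qquad
  c(e) = \tfrac{k}{2}\,e^{2}.
\]
% With CARA risk aversion r, the employee's certainty equivalent is
\[
  \mathrm{CE}(e) = s + b\,(e+\alpha) - \tfrac{k}{2}\,e^{2} - \tfrac{r}{2}\,b^{2}\sigma^{2}
  \;\Longrightarrow\;
  e^{*} = \frac{b}{k},
\]
% and maximizing total surplus over b yields the familiar risk-incentive trade-off
\[
  b^{*} = \frac{1}{1 + r\,k\,\sigma^{2}}.
\]
```

In this additive baseline, the AI's incremental value alpha raises output but leaves the optimal bonus rate b* unchanged; richer interactions, for example AI value that alters output risk or the employer's ability to learn hidden effort, are presumably what make the contract design problem non-trivial.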
7. When AI Stays with Managers: Field Evidence on Employee Effort and Outcomes in Hierarchical Decision-Making (with Yinglin Ruan, Yingjie Zhang, Peijian Song). Manuscript in preparation. [Paper link]
Hierarchical decision-making settings, where frontline employees gather and evaluate information while managers make final decisions, are pervasive in organizations, yet remain underexplored in the human–AI interaction literature, which has largely focused on individual-level decision support. In such settings, we introduce gatekeeping AI, where algorithmic recommendations are provided exclusively to managers. While this design may improve monitoring and reduce information asymmetry, it also introduces tensions regarding employee effort, incentives, and decision efficiency that require empirical examination. Grounded in agency theory and accountability theory, we investigate these issues using a mixed-methods approach that combines a field intervention in a procurement context (2,051 cases) with qualitative interviews. We find that gatekeeping AI increases employee effort and improves cost savings. Although greater effort extends employees' task completion time, gatekeeping AI simultaneously accelerates managerial decision-making, resulting in an overall reduction in procurement time. Our findings show that these effects arise because gatekeeping AI reshapes information and monitoring, triggering anticipated justification, moral hazard mitigation, and competition pressure. This study highlights role-contingent AI access as a critical design dimension and advances understanding of how AI creates value in hierarchical organizations.
An Evolutionary Perspective on Human–AI Interaction
From Static Interaction to Dynamic Learning
8. Augmented Algorithms, Adaptive Humans? Evidence from a Natural Experiment (with Xianghua Lu, Yiyu Huang, Hai Wang). Under major revision. [Paper link]
AI capability continuously evolves through interaction with accumulated data and domain experts, as ongoing learning from human decision-makers across diverse industries enhances the algorithms that support their decisions. Would individuals be more inclined to follow AI advice if they understood that AI systems acquire wisdom from humans? Testing our hypotheses in the on-demand food delivery domain, we find the following: (1) Highly experienced riders show increased compliance with more human-like AI augmentation. (2) Their short-term performance becomes more balanced, with improved hourly delivery productivity but decreased on-time delivery ratios. (3) Mechanism analysis reveals their proactive shift from prioritizing personal preferences to the balanced approach recommended by AI. (4) Over the long term, highly experienced riders recover their on-time delivery ratios through self-regulated learning. (5) Less experienced riders who consistently adhere to AI suggestions also benefit from AI capability augmentation in their delivery performance. Our findings delineate a dynamic cycle of mutual learning and reinforcement, demonstrating reciprocal wisdom between AI and humans and underscoring the critical role of highly experienced humans in achieving superior collaborative task outcomes as human–AI systems evolve.
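The compliance result lends itself to a difference-in-differences reading: riders' adherence to AI advice before versus after the algorithm upgrade, contrasted across experience levels. The sketch below uses synthetic data and illustrative variable names (post_upgrade, high_exp, followed_ai) rather than our field measures.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 6000
df = pd.DataFrame({
    "post_upgrade": rng.integers(0, 2, n),   # after the more human-like augmentation
    "high_exp": rng.integers(0, 2, n),       # highly experienced rider
})
# Synthetic outcome: only experienced riders raise compliance after the upgrade.
p = 0.55 + 0.12 * df.post_upgrade * df.high_exp
df["followed_ai"] = rng.binomial(1, p)

did = smf.ols("followed_ai ~ post_upgrade * high_exp", data=df).fit()
print(did.params["post_upgrade:high_exp"])   # the difference-in-differences estimate
```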
9. Algorithmic Advancements and Workforce Dynamics in the Gig Economy: Evidence from the On-Demand Delivery Market (with Yiyu Huang, Chunhua Wu, Xianghua Lu). Under review. [Paper link]
This paper examines how algorithmic advancements shape worker productivity and labor market dynamics in the gig economy. While prior research has emphasized technology's effects in traditional labor markets, the implications of continuous algorithmic advancement in the gig economy remain insufficiently understood. Using four years of comprehensive rider-level data from a major on-demand food delivery platform, we develop and estimate a structural model that links heterogeneous productivity responses, dynamic participation choices, and market-level demand–supply conditions. We find that improvements in dispatch and routing algorithms compress skill gaps by disproportionately raising the productivity of low-skilled riders, thereby reducing the marginal value of specialized domain skills. This skill-equalizing effect triggers asymmetric participation responses in a flexible labor environment: low-skilled riders increase engagement, whereas high-skilled riders reduce participation and are more likely to exit over time. Despite the resulting compositional decline in traditional skills, overall delivery performance improves and outcome dispersion widens, driven by emerging heterogeneity in workers’ ability to leverage algorithmic support. These findings reveal a new mechanism through which iterative algorithmic systems reshape labor markets: they equalize traditional skill premiums while generating new productivity stratification plausibly rooted in algorithm literacy. We discuss implications for platform design and labor governance in technology-mediated work.
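The two opposing forces, compression of the traditional skill gap and new dispersion from heterogeneous algorithm literacy, can be illustrated with a stylized simulation (emphatically not our structural model; every functional form and parameter below is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10000
skill = rng.normal(1.0, 0.3, n)      # traditional domain skill
literacy = rng.normal(0.0, 0.5, n)   # "algorithm literacy": ability to leverage the tool

def productivity(algo_quality):
    # The algorithm substitutes for skill (larger boost at low skill) and
    # complements literacy; the boost cannot be negative.
    boost = algo_quality * (1.5 - skill) + algo_quality * literacy
    return skill + np.clip(boost, 0.0, None)

for q in (0.0, 0.5, 1.0):
    y = productivity(q)
    gap = y[skill > 1.3].mean() - y[skill < 0.7].mean()
    print(f"algorithm quality {q:.1f}: high-low skill gap {gap:+.2f}, "
          f"mean {y.mean():.2f}, dispersion (sd) {y.std():.2f}")
```

As algorithm quality rises, the printed skill gap shrinks while mean performance and dispersion grow, mirroring the equalization-plus-restratification pattern described above.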
From Traditional AI to Generative/Agentic AI
10. Visioning Human-Agentic AI Teaming: Continuity, Tension, and Future Research (with Bowen Lou, T. S. Raghu, Yingjie Zhang). Under second-round review. [Paper link]
Artificial intelligence is undergoing a structural transformation marked by the rise of agentic systems capable of open-ended action trajectories, generative representations and outputs, and evolving objectives. These properties introduce structural uncertainty into human–AI teaming (HAT), including uncertainty about behavior trajectories, epistemic grounding, and the stability of governing logics over time. Under such conditions, alignment cannot be secured through agreement on bounded outputs; it must be continuously sustained as plans unfold and priorities shift. We advance Team Situation Awareness (Team SA) theory, grounded in shared perception, comprehension, and projection, as an integrative anchor for this transition. While Team SA remains analytically foundational, its stabilizing logic presumes that shared awareness, once achieved, will support coordinated action through iterative updating. Agentic AI challenges this presumption. Our argument unfolds in two stages: first, we extend Team SA to reconceptualize both human and AI awareness under open-ended agency, including the sensemaking of projection congruence across heterogeneous systems. Second, we interrogate whether the dynamic processes traditionally assumed to stabilize teaming in relational interaction, cognitive learning, and coordination and control continue to function under adaptive autonomy. By distinguishing continuity from tension, we clarify where foundational insights hold and where structural uncertainty introduces strain, and articulate a forward-looking research agenda for HAT. We find that under open-ended agency, Team SA processes can produce the opposite of their intended effects: relational legitimacy may rest on epistemic fragility, iterative updating may amplify rather than correct divergence, and shared awareness may coexist with substantive loss of oversight. The central challenge of HAT is not whether humans and AI can agree in the moment, but whether they can remain aligned as futures are continuously generated, revised, enacted, and governed over time.
From Individual-Level to Population-Level Impact
11. From Signal to Noise: How Widespread LLM Usage Transforms Evaluator Effort and Credit Screening Outcomes (with Paramveer Dhillon, Yi Gao, Yingjie Zhang). Under major revision. [Paper link] [INFORMS 2025 eBusiness Section Best Paper Award]
Large language models (LLMs) have transformed how applicants present themselves in screening processes and have created a fundamental tension: while AI-assisted writing enables better communication of applicant quality, widespread usage may erode the informational content that evaluators rely upon for decision-making. We examine this trade-off through a randomized field experiment where 59 professional evaluators assessed 1,000 micro-loan applications, exogenously varying LLM usage rates from 0% to 75% across treatment groups. Our results reveal a non-monotonic relationship between crowd-level LLM usage and screening performance. Moderate usage rates (15-30%) improve approval outcomes for qualified borrowers without affecting default rates, while widespread usage (60-75%) generates "signal dilution," a systematic degradation in diagnostic value as stylistic homogenization reduces variance in quality indicators. Drawing on Effort-Accuracy Tradeoff Theory and Signal Detection Theory, we show that high usage rates diminish evaluators' perceived discriminatory power, prompting reduced cognitive effort and increased approval conservatism. These behavioral adaptations prove counterproductive, increasing Type I errors while failing to reduce Type II errors, ultimately worsening portfolio performance and constraining credit access. We complement our empirical findings with an analytical model that extends the analysis beyond experimental constraints, deriving optimal usage thresholds and revealing that evaluator uncertainty about LLM prevalence can paradoxically worsen screening outcomes. Our analysis establishes signal dilution and evaluator effort adjustment as key mechanisms through which AI democratization undermines decision quality in information-intensive markets, with implications for recruitment, admissions, and other high-stakes screening environments.
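The signal-dilution mechanism can be illustrated in signal-detection terms: as more applications are LLM-polished, text signals homogenize and a (pooled-s.d.) discriminability index d' falls. The parameters below are illustrative assumptions, not estimates from our experiment.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000
good = rng.integers(0, 2, n)   # 1 = creditworthy applicant

def d_prime(llm_rate):
    """Pooled-s.d. discriminability of the writing signal at a given LLM usage rate."""
    uses_llm = rng.random(n) < llm_rate
    # The raw writing signal separates good from bad applicants...
    signal = rng.normal(loc=good.astype(float), scale=1.0)
    # ...but LLM polishing pulls texts toward one homogenized, polished style.
    signal[uses_llm] = 0.3 * signal[uses_llm] + 0.7 * 0.9
    return (signal[good == 1].mean() - signal[good == 0].mean()) / signal.std()

for rate in (0.0, 0.15, 0.30, 0.60, 0.75):
    print(f"LLM usage {rate:4.0%}: d-prime ~= {d_prime(rate):.2f}")
```

The sketch captures only the dilution of the applicant-side signal; the evaluator-side effort adjustment documented in the paper is a separate behavioral channel.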