Working Papers
[P1] Enhancing the Wisdom of Crowds When Humans and AI Collaborate.
Angshuman Pal, Asa B. Palley, Ville A. Satopää.
Abstract: Accurate forecasts are critical for financial and business decision making. Such forecasts may be generated by human experts or by artificial intelligence (AI) technologies. A decision maker can benefit from the distinct advantages that each source may offer by providing AI assistance to the experts, allowing them to augment the information contained in the AI forecast by incorporating their own knowledge about the variable of interest. When multiple experts are available, accuracy can be further improved by utilizing the wisdom of crowds, forming a consensus by averaging the individual, AI-assisted expert forecasts. However, the potential accuracy of a crowd of AI-assisted forecasters may be limited by two forms of systematic bias. First, because the AI assistance is valuable to each expert at an individual level, the opinion of the AI can end up being over-represented in the crowd's consensus. Second, the experts may fail on average to appropriately utilize the AI assistance when forming their forecasts, either under- or over-emphasizing the information it provides to them. Each behavior causes a directional and potentially opposing systematic bias in the consensus forecast, reducing the accuracy of the crowd. Using a stylized Bayesian model of information aggregation, we develop a procedure that can recover the most accurate consensus forecast given all the information collectively observed by the AI technology and every expert in the crowd. This procedure removes the crowd's collective bias by pivoting the average AI-assisted forecast either toward or away from their average initial, unassisted forecast. Furthermore, the optimal pivoting procedure can be decomposed into two well-studied phenomena in the literature on human-AI collaboration—algorithm aversion and the borg effect. 
We provide a prescriptive, data-driven method that the decision maker can use to estimate how much to pivot a given crowd's average forecast based on its historical forecasting performance. We carry out three experiments with human participants to test the performance of the proposed aggregation method in different forecasting environments and find that it delivers superior accuracy relative to unassisted humans, the AI technology on its own, and the individual AI-assisted human forecasters.
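The pivoting idea described above can be illustrated with a small sketch. The abstract does not specify the exact functional form, so the linear pivot below, and the name of the weight `beta`, are assumptions for illustration only: the consensus starts from the average AI-assisted forecast and is shifted toward (negative `beta`) or away from (positive `beta`) the crowd's average unassisted forecast, with `beta` presumed to be estimated from historical performance data.

```python
from statistics import mean

def pivoted_consensus(assisted_forecasts, unassisted_forecasts, beta):
    """Hypothetical linear pivot of the crowd consensus.

    assisted_forecasts   -- list of individual AI-assisted forecasts
    unassisted_forecasts -- list of the same experts' initial forecasts
    beta                 -- pivot weight (illustrative; in the paper this
                            would be estimated from historical accuracy)
    """
    a = mean(assisted_forecasts)   # average AI-assisted forecast
    u = mean(unassisted_forecasts) # average initial, unassisted forecast
    # Pivot the assisted average along the line through the two averages:
    # beta > 0 moves away from the unassisted average, beta < 0 toward it,
    # beta = 0 leaves the simple assisted average unchanged.
    return a + beta * (a - u)

# Example: assisted average is 2.0, unassisted average is 1.0.
# A positive beta pushes the consensus further from the unassisted crowd.
consensus = pivoted_consensus([1.0, 2.0, 3.0], [0.0, 1.0, 2.0], beta=0.5)
```

Here `beta = 0` recovers the plain wisdom-of-crowds average of the AI-assisted forecasts, so the pivot can be read as a bias correction applied on top of simple averaging.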