Abstract:
Predicting human decisions under risk and uncertainty remains a fundamental challenge across disciplines. Existing models often struggle even in highly stylized tasks like choice between lotteries. Here we introduce BEAST gradient boosting (BEAST-GB), a hybrid model integrating behavioural theory (BEAST) with machine learning. We first present CPC18, a competition for predicting risky choice, in which BEAST-GB won. Then, using two large datasets, we demonstrate that BEAST-GB predicts more accurately than neural networks trained on extensive data and dozens of existing behavioural models. BEAST-GB also generalizes robustly across unseen experimental contexts, surpassing direct empirical generalization, and helps to refine and improve the behavioural theory itself. Our analyses highlight the potential of anchoring predictions on behavioural theory even in data-rich settings and even when the theory alone falters. Our results underscore how integrating machine learning with theoretical frameworks, especially those—like BEAST—designed for prediction, can improve our ability to predict and understand human behaviour.
Bio:
Ori Plonsky is an Assistant Professor in the Faculty of Data and Decision Sciences at the Technion-Israel Institute of Technology. His research spans behavioral decision-making, human learning, and the integration of data science with behavioral science, focusing on human choice prediction and computational modeling of behavior. Ori holds a PhD in Behavioral Sciences and has an engineering background, enhancing his interdisciplinary approach. His work has been published in journals such as Psychological Review, Nature Human Behaviour, and PNAS. In 2022, he received the Hillel Einhorn New Investigator Award from the Society for Judgment and Decision Making.
Summary:
Goal: predict people’s response to incentive structures
Want to incentivize some desired behavior in a population (e.g. reduce car use, ensure people take their medicine)
Given some budget for incentivizing a goal, how to do this best?
Many approaches:
Loss aversion: give people money at the start of the month, take it away when they don't do the desired thing
Lotteries
Accounting for diminishing sensitivity: people are more sensitive to a $1 difference between $10 and $11 than between $99 and $100
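A quick numeric illustration of diminishing sensitivity, using a square-root value function purely as an assumed example (not a model from the talk):

```python
# Diminishing sensitivity under a concave value function.
# sqrt() is an illustrative assumption, not the talk's actual model.
import math

near_ten = math.sqrt(11) - math.sqrt(10)       # ~0.154: a $1 gain feels sizable near $10
near_hundred = math.sqrt(100) - math.sqrt(99)  # ~0.050: the same $1 feels small near $99
print(near_ten > near_hundred)  # True: the subjective impact of $1 shrinks as amounts grow
```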
Can experiment with many different types of treatments: effective but expensive
Can ask experts but they disagree and are often wrong
Want to create an ML model that predicts human responses to stimuli
Approach: model competitions using human training data (e.g. via Kaggle)
Given a new choice problem involving decisions under risk and uncertainty
Goal: predict human responses to the choice problem
Baseline models provided
Researchers submit solutions/models
Problem: choice under risk and uncertainty
Choices: A or B
A: 3 with certainty
B: 4 with probability 0.8, 0 with probability 0.2
Can expand to lotteries, more outcomes, unknown probabilities, etc.
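A minimal sketch of how such a problem could be encoded; the class and field names are hypothetical, not the CPC18 data schema:

```python
from dataclasses import dataclass

@dataclass
class Lottery:
    """One option in a binary choice problem: outcomes with probabilities."""
    outcomes: list[float]
    probs: list[float]

    def expected_value(self) -> float:
        return sum(x * p for x, p in zip(self.outcomes, self.probs))

A = Lottery(outcomes=[3.0], probs=[1.0])            # 3 with certainty
B = Lottery(outcomes=[4.0, 0.0], probs=[0.8, 0.2])  # 4 w.p. 0.8, else 0

# B has the higher expected value (3.2 vs 3.0), yet many people choose the safe A.
print(A.expected_value(), B.expected_value())
```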
Experiments:
270 choice tasks (210 train, 60 test)
930 participants, ~700k choices
Accuracy metrics:
mean squared error
Completeness: proportion of the predictable error that the model eliminates
Completeness = (error_base - error_model) / (error_base - error_perfect_model)
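A sketch of both metrics, assuming predictions and observations are aggregate choice rates in [0, 1]; the perfect-model error term would typically be estimated from the irreducible noise across repeated experiments:

```python
import numpy as np

def mse(pred: np.ndarray, obs: np.ndarray) -> float:
    # Mean squared error between predicted and observed choice rates.
    return float(np.mean((pred - obs) ** 2))

def completeness(error_model: float, error_base: float, error_perfect: float) -> float:
    # Fraction of the predictable error (baseline error minus irreducible
    # noise) that the model eliminates; 1.0 means the model is as good as
    # the noise ceiling allows.
    return (error_base - error_model) / (error_base - error_perfect)
```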
46 teams registered, 20 submitted models
Best: BEAST-GB (BEAST gradient boosting)
BEAST theory: people use strategies that are useful in many situations and apply them to similar new situations
Unknown:
What similarity function do people use?
What strategies do they have?
Model:
Input:
Task payoffs and probabilities
The risky choice task is also given to the BEAST theory model, and its predicted choice rate is included as a feature
BEAST thus serves as a foundation model
Train an extreme gradient boosting (XGBoost) model on these features
Output: prediction of human choice
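A rough, runnable sketch of this pipeline under stated assumptions: beast_predict below is a crude stand-in for the real BEAST model, the feature set and toy data are invented for illustration, and the actual BEAST-GB feature set is considerably richer:

```python
import numpy as np
import xgboost as xgb

def beast_predict(problem: dict) -> float:
    # Placeholder for the BEAST theory model's predicted rate of choosing B.
    # Crude stand-in: logistic in the expected-value difference (NOT real BEAST).
    ev_a = problem["a_outcome"]
    ev_b = problem["b_high"] * problem["b_prob"] + problem["b_low"] * (1 - problem["b_prob"])
    return float(1 / (1 + np.exp(-(ev_b - ev_a))))

def featurize(problem: dict) -> np.ndarray:
    # Illustrative features: raw payoffs/probabilities plus the BEAST prediction.
    return np.array([
        problem["a_outcome"],
        problem["b_high"], problem["b_prob"], problem["b_low"],
        beast_predict(problem),  # theory-based anchor feature
    ])

# Toy training data: each problem paired with an observed B-choice rate.
train_problems = [
    ({"a_outcome": 3, "b_high": 4, "b_prob": 0.8, "b_low": 0}, 0.42),
    ({"a_outcome": 2, "b_high": 6, "b_prob": 0.5, "b_low": 0}, 0.61),
]

X = np.stack([featurize(p) for p, _ in train_problems])
y = np.array([rate for _, rate in train_problems])

model = xgb.XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X, y)  # gradient boosting on theory + task features
print(model.predict(X))  # predicted human choice rates
```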
All submissions combined theory and ML
Prior work has shown that an unconstrained neural network works better than pure BEAST
BEAST-GB beats both BEAST and neural nets: 96.2% completeness. It stays accurate even when trained on only a few records
Ablation studies show that the psychological, BEAST, and payoff features are all key inputs for the ML model (see the ablation sketch below)
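A sketch of the general ablation pattern (drop a feature group, retrain, compare held-out error); the grouping follows the featurize() column layout from the sketch above and illustrates the idea, not the paper's exact protocol:

```python
import numpy as np
import xgboost as xgb

def ablation_errors(feature_groups, X, y, X_val, y_val):
    # Retrain without each feature group and record the validation MSE;
    # a large jump in error marks that group as important.
    results = {}
    for name, cols in feature_groups.items():
        keep = [c for c in range(X.shape[1]) if c not in cols]
        m = xgb.XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.1)
        m.fit(X[:, keep], y)
        results[name] = float(np.mean((m.predict(X_val[:, keep]) - y_val) ** 2))
    return results

# Hypothetical grouping; column indices follow the featurize() sketch above.
groups = {"payoffs": [0, 1, 2, 3], "beast": [4]}
```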
BEAST helps the ML generalize across experiments
BEAST-GB is more accurate than other approaches
Also slightly more accurate than generalizing directly from other experiments with the same payoffs but different user interfaces
That baseline captures the inherent variability of the experimental process
BEAST-GB is more accurate because it captures the mean of the experimental noise process, whereas any single real experiment is a draw from a biased distribution of study designs, populations, and interfaces
Next: larger experimental studies with more verbal cues to people
LLMs are showing some capability for predicting behavior, but the results are still inconsistent