Machine Learning and AI for Economists

Where to start

Mullainathan and Spiess (Journal of Economic Perspectives, 2017) is a good introduction to ML for economists. The Online Appendix has many important technical details for implementing ML algorithms in practice
- See also Mullainathan's lecture Economics in the Age of Algorithms (AEA P&P and 2025 ASSA recording)
The Impact of Machine Learning on Economics by Athey (NBER, 2018)
Machine Learning Methods That Economists Should Know About by Athey and Imbens (ArXiv, 2019; ARE, 2022)
Deep Learning for Economists by Dell: NBER paper, JEL article, and website EconDL
Generative AI for Economic Research by Anton Korinek: JEL article and website GenAIforEcon
Literature review by Bahoo et al. (JoES, 2025). See also overview by Strittmatter (IZA, 2025)
In defense of Machine Learning in Economics by Gietner (Substack, 2025)

Online courses and slides

Ng (Coursera) is a great MOOC. Ng is clear and does not go too fast. Applications in Matlab
Markham (R-bloggers) is the MOOC accompanying the textbook An Introduction to Statistical Learning. Entry-level course. Applications in R
- Bellemare wrote a few posts (here and here) on what he learned from this textbook.
Caffo, Leek and Peng (Coursera) explain how to implement ML algorithms in R. Knowledge of R is a pre-requisite
Short course on ML and causal inference at Stanford by Athey, Spiess, and Hager (YouTube)
IMF course on ML for economists by Michal Andrle
Advances in Causality and Foundations of Machine Learning by Maximilian Kasy (additional slides here)
ML in Development Economics and Applied Micro by Michael Koelle

If you want to learn more

Key resources
- Check out the work by Susan Athey and Victor Chernozhukov
  - Chernozhukov and coauthors have a book on causal inference and ML
- Vikesh Koul has a comprehensive list of resources at the intersection of Data Science and Economics/Social Science/Policy
Textbooks
- Friedman, Hastie and Tibshirani (Springer, 2008) provides a comprehensive (detailed and advanced) overview of the most common ML techniques
- Kuhn and Johnson (Springer, 2013) is a good complement to the textbook by Friedman et al. They cover different topics, or explain the same techniques from different angles.
Main articles
- Askitas (IZA, 2024) is a ML primer for social scientists
- Brand et al. (ARS, 2023) summarize recent developments in ML for causal inference
- Storm et al. (ERAE, 2020) review ML tools from an applied economist’s perspective. See also handbook by Baylis et al. (HAE, 2021)
- Molina and Garip (ARS, 2019) give a broader overview of ML in the social sciences
- Athey (Science, 2017) on using big data for policy problems
- Athey and G. Imbens (Journal of Economic Perspectives, 2017) discuss recent advancements in ML methods for causal inference
- Subrahmanian and Kumar (Science, 2017) discuss future directions and challenges for ML
- Belloni et al. (Journal of Economic Perspectives, 2014) on using ML for model selection
  - Angrist and Frandsen (JoLE, 2022) further investigate the idea of using ML for selecting control variables in OLS, and in the first stage of IV. They find that ML works better at selecting controls than instruments.
- Varian (Journal of Economic Perspectives, 2014) discusses the benefits of ML for economists

Tools and best practices

Predict treatment heterogeneity
- Causal Forest developed by Wager and Athey (JASA, 2018) and Generalized Random Forests (Athey et al., AoS, 2019)
  - Application to summer jobs by Davis and Heller (AER P&P, 2017)
    - See Bertrand et al. (WP, 2017), O'Neill and Weeks (WP, 2018), Naguib (WP, 2019), Guo et al. (JMR, 2020), Knittel and Stolper (AEA P&P, 2021), Athey et al. (RePec, 2021), Denteh and Liebert (IZA, 2022), Lechner and Mareckova (arXiv, 2022), Cockx et al. (Labour Economics, 2023), Goller et al. (Labour Economics, 2025) for applications and extensions
    - See Knaus et al. (JHR, 2022) for another application using a LASSO-type estimator and Knaus et al. (EctJ, 2021) for comparisons of different ML heterogeneity methods using Monte Carlo simulations
  - Accessible description by Athey and Imbens (PNAS, 2016)
  - Limitations discussed by Cattaneo et al. (ArXiv, 2025)
  - Application to energy use by Knittel and Stolper (NBER, 2019)
  - Huntington-Klein (JCI, 2020) uses causal forest to improve finite-sample performance of IV by modeling first-stage heterogeneity
  - Gavrilova et al. (CESifo, 2023) extend the causal forest to difference-in-difference settings and event studies
  - Haushofer et al. (AER, 2025) compare targeting based on predicted deprivation versus targeting based on treatment effect
- Generic Machine Learning Inference by Chernozhukov et al. (ArXiv, 2018)
  - For an application, see Buhl-Wiggers et al. (JEconometrics, 2022)
  - Extension to quasi-experimental settings by Deryugina et al. (NBER, 2019).
- Heterogeneity with instruments by Syrgkanis et al. (ArXiv, 2019). See also Biewen and Kugler (IZA, 2020)
- Blog post by Özler on different techniques to estimate heterogeneous treatment effects including causal forest and rank average treatment effect
- Beware: Leng and Dimmery (ISR, 2024) observe substantial discrepancies between machine learning-based treatment effect estimates and difference-in-means estimates directly from the randomized experiment
Intersection of ML and Econometrics
- Using ML to improve pre-analysis plans by Ludwig et al. (AEA P&P, 2019)
- Choosing among regularized estimators in empirical economics by Abadie and Kasy (ReStat, 2019)
- Ovaisi et al. (PWC, 2020) combine ML with Heckman's two-stage method in learning-to-rank systems
- Wang et al. (PNAS, 2020) developed a method for correcting inference in a second-stage when using outcomes predicted by ML in a first-stage. With an open-source R package
- Mostly Harmless Machine Learning by Chen et al. (ArXiv, 2020) is a guide to use ML in instrumental variable designs.
  - See also Lennon et al. (ArXiv, 2025): linear ML methods work better than non-linear ones in the first stage of 2SLS
- Chang (EconometricsJournal, 2020): ML with difference-in-difference. See also Hatamyar et al. (RePEc, 2023)
- Kasy and Sautmann (Econometrica, 2021) proposed an algorithm for adaptive treatment assignment in experiments for policy choice. See also their article on VoxDev
  - Hadad et al. (PNAS, 2021): discuss how to construct confidence intervals in adaptive experiments
  - See also this quick intro on adaptive experiments by Greene
  - Tabord-Meehan (REStud, 2023) proposes an adaptive randomisation procedure for two-stage randomised controlled trials
- Athey and Wager (Econometrica, 2021) developed an algorithm for choosing whom to treat using observational data
- Narita and Yaka (ArXiv, 2021) develop an estimator exploiting natural experiments following algorithmic decision rules
- Athey et al. (JASA, 2021) link matrix completion methods with the unconfoundedness (matching) and synthetic control approaches. See also blog post by Cunningham
- RCT vs ML (by Prest et al., WP 2021): ML replicates the true treatment effects, but DiD replicates the experimental benchmark as well, suggesting little benefit from ML approaches over standard program evaluation methods
- Knaus (EctJ, 2022) applies double ML to programme evaluations. See also Knaus (RSSA, 2021)
- Hoffman (RePEc, 203) reviews the latest methods in double robust, flexible covariate adjustment for causal inference and argues that these methods do not necessarily outperform OLS regression or matching
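Several entries above rely on double/debiased ML. As a rough illustration of the idea (not the implementation used in any of the cited papers), the sketch below estimates the coefficient theta in a partially linear model y = theta*d + g(x) + e by cross-fitting: nuisance regressions of y on x and of d on x are fit on the other folds, and theta comes from regressing the out-of-fold residuals on each other. The data-generating process, the function names fit_line and cross_fit_plr, and the use of a one-regressor OLS fit as the "ML" learner are all simplifying assumptions made for the example.

```python
import random


def fit_line(xs, ys):
    """OLS fit of ys on a single regressor xs; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def cross_fit_plr(y, d, x, n_folds=5):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(x) + e."""
    n = len(y)
    y_res = [0.0] * n
    d_res = [0.0] * n
    for k in range(n_folds):
        test = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        # Nuisance fits use only observations outside the current fold.
        ay, by = fit_line([x[i] for i in train], [y[i] for i in train])
        ad, bd = fit_line([x[i] for i in train], [d[i] for i in train])
        for i in test:
            y_res[i] = y[i] - (ay + by * x[i])
            d_res[i] = d[i] - (ad + bd * x[i])
    # Residual-on-residual regression (no intercept).
    num = sum(dr * yr for dr, yr in zip(d_res, y_res))
    den = sum(dr * dr for dr in d_res)
    return num / den


# Simulated data (assumed for illustration): true theta is 0.5 and x confounds d and y.
rng = random.Random(0)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
d = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * di + 2.0 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]

theta_hat = cross_fit_plr(y, d, x)
naive = fit_line(d, y)[1]  # slope of y on d, ignoring x
print(f"cross-fitted estimate: {theta_hat:.3f}, naive slope: {naive:.3f}")
```

In applied work the linear nuisance fits would be replaced by flexible learners (lasso, random forest, boosting), which is what the dedicated packages listed under Software handle.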
Applying economic thinking to ML
- Baiardi and Naghi (RePec, 2021): revisit influential empirical studies to show the advantages of causal machine learning methods
- Raghu et al. (ArXiv, 2019) start thinking about how to combine ML and human experts, rather than just comparing performances, while Hofman et al. (Nature, 2021) provide a big-picture interdisciplinary view of ML and argue that it should be seen as a complement to causal inference.
  - Gennatas et al. (PNAS, 2019): example of expert-augmented machine learning, an automated way to extract problem-specific human expert knowledge and integrate it with ML
  - Ribers and Ullrich (ArXiv, 2020) combine ML predictions with physician diagnostic skill to improve the efficiency in antibiotic prescribing
  - Tubadji et al. (Economic Inquiry, 2021) look at the propensity of consumers to adopt AI in banking services
- Sansone (OBES, 2018) uses ML to predict high school dropout. He uses economic theory to guide the calibration of the ML algorithms. Stata conference presentation. See also Eegdeman et al. (Education Economics 2022; FrontEduc, 2022)
  - Coyle et al. (Science, 2020) discuss the importance of clarifying the objective function when applying ML to policy
- Björkegren et al. (ArXiv, 2020) discuss a manipulation-proof ML method, with an application to a field experiment in Kenya
Other tools and best practices
- Cerqua et al. (OBES, 2025) provide guidelines for practitioners to implement ML with panel data
- Coulombe et al. (JAE, 2022) discuss how to use ML for macroeconomic forecasting
- Farbmacher et al. (EctJ, 2022) combine causal mediation analysis with double machine learning
- Narayanan and Kalyanam (RePec, 2021) estimate treatment effects by exploiting discontinuities generated by ML applications
- Peterson et al. (Science, 2021) use ML to test decision-making theories. See summary by Bhatia and He (Science, 2021)
- Borup et al. (SSRN, 2020) on targeting predictors in random forest
- Kallus et al. (ArXiv, 2020) use ML to estimate quantile treatment effects
- Goller et al. (LabourEconomics, 2020) try to use LASSO and Random Forest to improve the first stage in propensity score matching
- Steurer and Hill (RePec, 2019): evaluate different performance metrics used in the housing market
- Lechner et al. (WP, 2019) have developed a Random Forest estimator of the ordered choice model
- Anastasopoulos (RePec, 2019) uses an adaptive lasso algorithm to select covariates in regression discontinuity designs
- How to do cross-validation with time series data? Schnaubelt (RePec, 2019)
- Be aware of bad controls when selecting variables using ML! (Hünermund et al., ArXiv, 2021)
  - Similarly, Wüthrich and Zhu (ReStat, 2023) show that post–double Lasso and debiased Lasso can exhibit substantial omitted variable biases due to Lasso's not selecting relevant controls
- Vafa et al. (ArXiv, 2024) use foundation models to decompose the gender wage gap and better account for career histories
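On the question of cross-validation with time series raised in the list above: one common scheme is an expanding-window (forward-chaining) split, in which every training set strictly precedes its test set so that no future information leaks into the fit. The sketch below is a generic illustration of that scheme, not necessarily the procedure recommended in the cited paper; the function name expanding_window_splits and its parameters are hypothetical.

```python
def expanding_window_splits(n_obs, n_splits, min_train):
    """Yield (train_idx, test_idx) pairs in which training data always precede test data."""
    test_size = (n_obs - min_train) // n_splits
    if test_size < 1:
        raise ValueError("not enough observations for the requested number of splits")
    for k in range(n_splits):
        train_end = min_train + k * test_size  # training window grows with each split
        test_end = train_end + test_size
        yield list(range(train_end)), list(range(train_end, test_end))


# Ten time-ordered observations, three splits, at least four training points.
splits = list(expanding_window_splits(n_obs=10, n_splits=3, min_train=4))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Unlike standard k-fold cross-validation, observations are never shuffled, so the ordering of the data must already be chronological.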
Bonus material
- A visual introduction to ML and decision trees by R2D3
- The blog Freakonometrics has several posts on ML and econometrics
- Colin Cameron, Melissa Dell, and Paul Goldsmith-Pinkham have ML slides and additional references on their webpages
- Beware of the sparsity assumption (Giannone, Lenza, Primiceri on VoxEU, 2018 and Econometrica, 2021)
- Is ML just a fad? (The Economist, 2016)
- Are Machine Learning and Big Data changing Econometrics? by Angrist (MRU, 2019)
- Angrist vs. Imbens on the future of ML in economics (MRU, 2022)

Fairness, ethics, and privacy

Kleinberg et al. (AER P&P, 2018) advocate for not excluding variables such as race from the set of ML inputs in the name of fairness
- Similar argument used for gender by Sean Higgins when predicting creditworthiness (MIT, 2019)
- See also Rambachan et al. (NBER, 2020) on regulating algorithms, as well as Rambachan et al. (AEA P&P, 2020)
- Rodolfa et al. (ArXiv, 2020): Trade-offs between accuracy and fairness assumed to be inherent in ML may be small in practice, making reducing disparities more practical
- But Vlasceanu and Amodio (PNAS, 2022) emphasize that algorithms may reflect gender inequalities existing within a society
- See also Lockhart (RePec, 2022) for an overview of how gender and sex interact with ML algorithms
A black box can and should be used when it produces the best results. (Holm on Science, 2019)
- Babic et al. (Science, 2021) discuss the drawbacks of requiring black-box algorithms to be explainable
ML could actually be used to detect discrimination (Kleinberg et al., PNAS 2020)
Ozler on the ethics of ML (Development Impact, 2019)
Kleinberg et al. (QJE, 2018) on ML being less biased than human judges when making bail decisions. Good summary by J. Doleac (Medium). Related case study in Latin America (IDB, 2019). See also Ludwig and Mullainathan (NBER, 2021; JEP, 2021)
- When providing judges with risk scores, there was an increase in racial disparities due to judges overriding the recommended action for moderate-risk black defendants (Albright, WP 2019). See also Arnold et al. (NBER, 2020, AEA P&P, 2021)
- Stevenson and Doleac (AEJ:Policy, 2024): even if the ML predictions are perfectly fair, the way humans take them into account might be biased. Example from judges in Virginia
- Angelova et al. (NBER, 2023): 90% of the judges underperform the algorithm's bail recommendation when they make a discretionary override (but 10% of the judges outperform the algorithm in terms of both accuracy and fairness)
Li et al. (NBER, 2020) look at hiring as a contextual bandit problem and build a ML algorithm that improves both quality of candidates and demographic diversity, thus increasing equity and efficiency.
Bjerre-Nielsen et al. (PNAS, 2020): models in education using only administrative data perform considerably better and, importantly, do not improve when adding high-resolution, privacy-invasive behavioral data
Algorithmic discrimination and measurement error by Basu et al. (NBER, 2021)
Bird et al. (JPAM, 2025) on racial algorithmic bias in higher education: bias exists, but more data may mitigate the issue
Sariola et al. (ArXiv, 2025) use data from audit studies to both train and evaluate automated hiring algorithms and improve fairness

Software

LOST has some nice ML examples and explanations
Stata
- User's corner on the official Stata website lists all the ML algorithms available in Stata
- The Stata Lasso Page provides lasso codes for prediction, model selection and causal inference. It also includes elastic net, ridge regression, and double-lasso. See also the related IZA working paper and Stata Journal article
  - Stata 16 has a built-in lasso command with several extensions and calibration options
  - Stata 19 has a built-in ensemble decision trees command with h2oml
- The articles on the Stata Journal on Support Vector Machines, Boosting, and Random Forest provide a nice and succinct introduction to these techniques. See also the Stata Blog for Support Vector Machines in Stata/Python integration.
- Pystacked combines predictions from multiple ML into a final prediction to improve performance
- Ddml (Ahrens et al., ArXiv 2023) implements Double/Debiased ML (Chernozhukov et al., Econometrics Journal 2018)
- Rcall integrates R ML algorithms in Stata by Haghish
  - Similar Python-Stata integration with c_ml_stata and r_ml_stata by Cerulli (slides, Stata article)
- Quick guide for Random Forest by CSAE
- Gtools is a Stata package for big data
- Crossfold performs k-fold cross-validation
- Precision-recall curves vs. ROC when dealing with unbalanced data
- MLRtime estimates causal forest
- Pylearn estimates random forests, gradient boosting/adaptive boosting, decision trees, and feed-forward neural nets
- Dsheckman estimates a double-lasso Heckman selection model
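To see why the list above contrasts precision-recall curves with ROC curves for unbalanced data, a small arithmetic example helps. The confusion-matrix counts below are hypothetical (10,000 observations, 1% positives), and the helper name rates is made up for the illustration. The false positive rate, which drives the ROC curve, looks excellent, while precision shows that most flagged cases are false alarms.

```python
def rates(tp, fp, fn, tn):
    """Return recall, false positive rate, and precision from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),     # share of true positives that are detected
        "fpr": fp / (fp + tn),        # share of negatives wrongly flagged (ROC x-axis)
        "precision": tp / (tp + fp),  # share of flagged cases that are truly positive
    }


# Hypothetical classifier on 100 positives and 9,900 negatives.
m = rates(tp=80, fp=495, fn=20, tn=9405)
print(f"recall={m['recall']:.2f}, fpr={m['fpr']:.2f}, precision={m['precision']:.3f}")
```

With these counts the classifier detects 80% of positives at a 5% false positive rate, yet only about 14% of the cases it flags are true positives.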
ML packages in R (CRAN)
- DoubleML by Bach et al. (ArXiv, 2021) implements the double/debiased machine learning framework in R
OpenRefine by Google is an open-source software for data cleaning

Data

Text as data by Gentzkow et al. (JEL, 2019). See also Ash and Hansen (CEPR, 2023) and Hassan et al. (JEP, 2025)
- Thorsrud (JBES, 2020) constructs a daily business cycle index based on quarterly GDP growth and textual info from a business newspaper
Jayachandran et al. (NBER, 2021) use ML to identify the best survey closed-ended questions to predict an agency score measured through qualitative interviews
Abowd et al. (NBER Summer Institute, 2017) on data linkage, e.g. across different Census waves. See also Price et al. (NBER, 2019) and James Feigenbaum's research
- Combes et al. (IZA, 2021) discuss how to use ML to extract data from historical documents
Schierholz and Schonlau (JSSaM, 2020) compare different ML algorithms for automated occupational coding

Lectures and podcasts

Athey (ASSA, 2019) on the impact of ML on economics and econometrics
- Watch also her lecture at the European Central Bank (ECB, 2019)
Duflo (NBER Summer Institute, 2018) summarizes current ML techniques that can be used by economists, with special focus on RCTs. Slides and code on GitHub. Final paper with Chernozhukov et al. (Econometrica, 2025)
- See also Quistorff and Johnson (ArXiv, 2020) for ML applications to restrict randomization in the design of experiments
Athey and Imbens (AEA Continuing Education, 2018) on machine learning and econometrics
Shiferaw (APPAM, 2017) discusses ML for Policy Analysis with Susan Athey
Athey and Imbens (NBER Summer Institute, 2015) is a mini-course on supervised ML, unsupervised ML, and ML for causal inference
The 2021 Summer Institute in Machine Learning in Economics hosted by the Center for Applied AI at Chicago Booth (YouTube)
Additional webinars (mainly on ML in macro) recorded by AMLEDS

Cool applications

Education
- Athey et al. (ArXiv, 2023) use ML to inform targeting in a large-scale field experiment nudging students to renew their financial-aid applications
- Sallin and Balestra (RePec, 2022) use ML to understand which dimensions of peer characteristics are the most predictive of academic success and estimate high-dimensional peer effects functions
- "What do economic education scholars study?" is an example of text analysis and unsupervised ML by Fernandez et al. (JEE, 2021)
- Mozer et al. (EdWP, 2021) discuss how ML text analysis can be used to evaluate treatment effects of education interventions targeting young children's writing skills
- Kizilcec et al. (PNAS, 2020) use ML to personalize behavioral science interventions in online education (with limited improvements)
- Beattie et al. (EER, 2018): no improvements from using ML to predict college success and failure. See also Bird et al. (EdWP, 2021)
  - Similar results in Orlov et al. (AEA P&P, 2021) when using LASSO to identify econ undergrad students at risk of underperforming
  - But Akmanchi et al. (EdWP, 2023) note that ML achieved similar accuracy to college advisors in predicting high-achieving lower-income students’ college enrollment quality among students with whom advisors had few interactions
  - Paul et al. (JHR, 2023) also find similar performance by logit and ML when predicting various adult socio-economic and health outcomes from childhood data
- Wu and Weiland (EdWP, 2024) use ML to improve early warning systems in order to decrease chronic absenteeism in early childhood
Labor, Gender, and Crime
- Mueller-Smith et al. (NBER, 2023) combine ML with regression discontinuity (age of majority rule) to identify mechanism-specific treatment effects that underpin the overall impact of adult prosecution
- Strittmatter (LabourEconomics, 2023) uses Causal ML to estimate conditional average treatment effects and analyze a labour welfare programme
- Pisanelli (EconomicsLetters, 2022): AI can reduce gender inequality in the probability of being interviewed for high-skill jobs compared to human recruiters
- Heller et al. (NBER, 2022): shootings are predictable enough to be preventable. Of the 500 people with the highest out-of-sample predicted risk, 13 percent are shot within 18 months
  - See also Bhatt et al. (NBER, 2023) arguing that ML and human experts may work well together in predicting gun violence
- Eberhardt et al. (IZA, 2022) use ML to find gender differences in reference letters in the job market for entry-level econ faculty positions
- Burn et al. (JLabE, 2022) use text analysis of job ads in combination with measures of age discrimination from a correspondence study
- Koffi (RePec, 2021) uses ML to show that female-authored papers are more likely to be omitted from references of related papers than male-authored papers
- Cengiz et al. (JoLE, 2022) use ML to identify minimum-wage workers before using event studies to estimate the impact of minimum wage increases
- Burn et al. (NBER, 2021): develop a ML algorithm to detect whether a job ad is ageist and to find employers more likely to be engaging in age discrimination
- Stachl et al. (PNAS, 2020) use ML to predict individuals’ Big Five personality dimensions
- Salganik et al. (PNAS, 2020): hundreds of researchers attempted to predict six life outcomes using ML with a rich dataset. No one made very accurate predictions.
- Bonaccolto-Töpfer and Briel (Labour Econ, 2022) use ML to compute the adjusted gender pay gap. See also Strittmatter and Wunsch (IZA, 2021)
- Borup and Schütte (JBES, 2021) use ML and Google Trends to predict US employment growth
  - Borup et al. (SSRN, 2021) generate a sequence of now- and backcasts of weekly unemployment insurance initial claims based on Google Trends search-volume data for terms related to unemployment
- Sajjadiani et al. (Applied Psychology, 2019) use ML to predict teacher performance and retention in Minneapolis using pre-hire work history
- Chalfin et al. (AER P&P, 2016) apply ML to police hiring and teacher tenure decisions
Poverty and Inequality
- Mahler et al. (WB, 2025): a simple model relying on GDP per capita, under-5 mortality rate, life expectancy, and rural population share gives almost the same accuracy in predicting income and consumption distributions as a complex ML model using 1,000 indicators jointly
- Corral et al. (WB, 2023; JDE, 2025) warn against using ML to create poverty maps when the input and validation data quality is low (see also blog post)
- Sansone and Zhu (OBES, 2023): ML can accurately identify individuals at risk of being long-term income support recipients at no extra cost using administrative data already available to caseworkers
  - van der Berg et al. (IZA, 2023) compare ML predictions with self-reported and caseworker assessments of the probability of finding a job within 6 months for newly unemployed individuals
- Aiken et al. (JDE, 2023) use ML with mobile phone data to accurately predict ultra-poor households in Afghanistan
- Bloise et al. (JEI, 2021) use ML to predict parental income in two-stage estimations of intergenerational income mobility
  - de Vries (RePEc, 2025) uses ML with rich Dutch data on family characteristics to show that conventional analyses using parental income only considerably underestimate intergenerational dependence
- Lentz (WorldDevelopment, 2019) predict food security status in Malawi by incorporating granular market data, remotely-sensed rainfall and geographic data, and demographic characteristics
- Dong et al. (PNAS, 2019) predict neighborhoods’ socioeconomic attributes using restaurant data
- Brunori et al. (SJoE, 2023) estimate inequality of opportunity from regression trees
- Jean et al. (Science, 2016) combine satellite data with ML to predict poverty
  - For additional applications of ML and satellite data, see also Rolf et al. (NBER, 2020), Yeh et al. (NatureCommunications, 2020), Burke et al. (Science, 2021), Aiken et al. (NBER, 2021), Huang et al. (NBER, 2021), Chi et al. (PNAS, 2022), Lehnert et al. (IZA, 2022), Khachiyan et al. (AER:I, 2022), Sherman et al. (NBER, 2023)
  - Mueller et al. (PNAS, 2021) use ML to measure war destruction from satellite images
  - Ratledge et al. (Nature, 2022) use satellite data and ML to both measure welfare and estimate the causal effect of electrification
Politics and Policy
- Battaglini et al. (NBER, 2022) use ML in Italy to improve detection of tax evasion
- Ash et al. (AEJ:Policy, 2025): use ML to predict corruption in Brazilian municipalities. See also de Blasio et al. (TechFore, 2022) in Italy.
- Gentzkow et al. (Econometrica, 2019) use ML to measure trends in the partisanship of U.S. congressional speeches
- Yeomans et al. (BDM, 2019) compare computer recommender systems to human recommenders in predicting which jokes people will find funny, and whether people are willing to rely on computer recommender systems
- Kleinberg et al. (NBER, 2019) discuss the potential advantages of ML in increasing equity
- Bertrand and Kamenica (NBER, 2018) use ML to measure cultural distance between groups in the US
- Bonica (AJPS, 2018) infers roll‐call scores from campaign contributions using ML
- Hauser (Wharton, 2018) on combining ML with behavioral economics to reduce cheating
- Celiku and Kraay (World Bank, 2017); Musumba et al. (Sustainability, 2021); Bazzi et al. (ReStat, 2022) use ML to predict conflict
- See also MIT Professor Kim's "Machine Learning and Data Science in Politics" syllabus
Health
- Lee and Lee (EmpiricalEconomics, 2025) use ML to identify high-risk groups for elderly suicide
- Buyalskaya et al. (PNAS, 2023) use ML to predict habit formation in gym attendance and hand washing
- Shekhar et al. (NBER, 2023) use unsupervised ML to detect fraud and overbilling among US hospitals
- Daysal et al. (SSRN, 2022) discuss when it is welfare-enhancing to use ML for cancer screening
- Baird et al. (JDE, 2022) use LASSO to estimate the relationship between psychological trauma among Syrian refugee children and digitally coded features of their drawings
- Carrieri et al. (HealthEconomics, 2021) use ML to predict communities at a high risk of vaccine hesitancy
- Weis and Jacobson (Nature Biotechnology, 2021) build an algorithm to identify future high-impact biotech publications. See also summary on MIT News
- Benetos et al. (IZA, 2021) apply ML to different audio features embedded within chart-topping songs to create an index correlated with survey-based life satisfaction
- Grogger et al. (NBER, 2020) use ML to predict domestic abuse cases
- Johnson et al. (SSRN, 2020) use ML to target inspections to increase occupational safety and decrease injuries in the workplace
- Mullainathan and Obermeyer (NBER, 2020; QJE, 2022) use ML to reduce over- and under-testing for heart attacks
- Deryugina et al. (AER, 2019) use ML to estimate the life-years lost due to pollution exposure, plus treatment effect heterogeneity
- Obermeyer et al. (Science, 2019) find evidence of racial bias in one popular health algorithm due to biased historical health training data. Summary and general discussion by Benjamin (Science, 2019). Also nice NYT article by Mullainathan.
- Hastings et al. (NBER, 2019; PNAS, 2020) use ML to predict the risk of future opioid dependence, abuse, or poisoning
- Ribers and Ullrich (ArXiv, 2019) show to what extent ML predictions may improve antibiotic prescribing by predicting diagnostic test outcomes for urinary tract infections, while also highlighting that physician expert knowledge is still necessary
- Kleinberg et al. (AER P&P, 2015) apply ML to predict mortality risk in surgery
- What to learn from past mistakes in Google Flu (Lazer et al., Science 2014)
Macro, Business and Finance
- Baier and Regmi (OER, 2024) use ML to capture heterogeneity in free trade agreements
- Titl et al. (OBES, 2023) use ML to identify politically connected firms
- Clithero et al. (AEJ:Micro, 2023) use ML to estimate willingness-to-pay
- Babii et al. (RePEc, 2023) review literature on ML for economic forecasting
- Cafarella et al. (NBER, 2023) use ML to correct for quality change when measuring inflation
- Hochberg et al. (NBER, 2023) use text analysis to show that patents authored by female inventors are under-cited
- Lommers et al. (ArXiv, 2021) discuss ML applications in finance. See also Hoang and Wiegratz (RePec, 2023)
  - Na and Kim (EconomicsLetters, 2021) predict stock prices with ML using informed traders' activities
- Kaniel et al. (NBER, 2022): a ML model that includes interaction effects between investor sentiment, fund flows, and fund momentum has substantial power to predict the best- and worst-performing mutual funds
- Naudé et al. (IZA, 2021) discuss a ML competition in Africa
- Borgschulte et al. (NBER, 2021) use a neural network to assess signs of aging in pictures of CEOs in order to then estimate how exposure to a distress shock during the Great Recession affected CEOs’ apparent age
- Fuster et al. (JFinance, 2020): when applied to the US mortgage market, ML slightly increases credit provision overall, but also increases rate disparities.
  - Meursault et al. (JPAM, 2025) propose setting different lending thresholds in ML for low- and moderate-income neighborhoods
- Farrell et al. (AEA P&P, 2020) apply ML in administrative banking data to estimate gross family income
- Dumitrescu et al. (RePec, 2020): new, simple and interpretable credit scoring method which uses information from decision trees to improve the performance of logistic regression
- Björkegren and Grissen (WBER, 2019) use mobile phone data to predict credit repayment
- McKenzie and Sansone (JDE, 2019) use ML to predict successful entrepreneurs in a business plan competition. See also Coad and Srhoj (SBE, 2020), Bryan et al. (NBER, 2021)
- Andini et al. (VoxEU, 2018) on ML-based targeting in policies aiming at increasing household consumption and access to credit by firms
- "Algorithms Need Managers, Too" by Luca et al. (HBR, 2016) on the advantages and limitations of using algorithms in business. The gains from ML come from "identifying patterns too subtle to be detected by human observation, and using those patterns to generate accurate insights and inform better decision making"
Other
- Christensen et al. (NBER, 2022) use ML to predict building retrofit impact
- Fudenberg et al. (AER, 2019) use machine learning to uncover regularities in the initial play of matrix games

Please email me if you think I am missing some interesting (published) papers.
