Policy Targeting > Machine Learning Methods for Analyzing Public Policies
In the rapidly evolving world of economic policies, policymakers often face complex decisions with wide-reaching societal impacts. To support these decisions, it is becoming increasingly common to employ machine learning, a branch of artificial intelligence that provides powerful and innovative tools to analyze vast amounts of data, uncover hidden trends, and make highly accurate predictions. Prediction is the process that supplies missing information: machine learning methods are applied to the available information (i.e., the data) to generate the information we do not have, namely forecasts of unknown values that are of interest to us.
The integration of machine learning tools in the policy making process is not just a possibility; it is a transformation that is already occurring. By providing economic policy experts with tools to analyze complex scenarios and anticipate the consequences of their decisions, machine learning plays a key role in crafting more effective and evidence-based policy strategies.
Before introducing the main machine learning methods, it is important to highlight that they are mainly divided into two categories: regression and classification. Regression is used to predict continuous values. For example, it can help forecast the number of inhabitants or unemployed persons in a region based on historical and current values of informative variables. Classification, on the other hand, is used to determine which category an item belongs to. An application might be determining whether a municipality is at risk of bankruptcy, a crucial distinction for guiding economic policy interventions. At this point, what are the main machine learning methods? We will focus on three groups of methods.
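As a minimal sketch of this distinction (using scikit-learn and synthetic data; the variable names and the bankruptcy rule are purely illustrative assumptions, not real policy data), the two tasks map onto two different families of estimators:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Three hypothetical informative variables for 200 regions
X = rng.normal(size=(200, 3))

# Regression: predict a continuous outcome (e.g., an unemployment figure)
y_cont = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
reg = LinearRegression().fit(X, y_cont)

# Classification: predict a category (e.g., at risk of bankruptcy or not)
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y_class)

print(reg.predict(X[:1]))  # a continuous value
print(clf.predict(X[:1]))  # a class label, 0 or 1
```

The same feature matrix can feed both kinds of models; what changes is the nature of the target variable.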
The first category of machine learning methods we present is used for regression purposes and descends from the standard linear model:

Y = β0 + β1 X1 + β2 X2 + … + βk Xk + ε,
where Y is the dependent variable (response) and X1, …, Xk are the k variables (predictors or features) used to predict Y. This is the most commonly used model in statistics and econometrics, and its coefficients (the betas) are commonly estimated via the ordinary least squares (OLS) estimator. However, when the number of variables (k) is large relative to the number of observations (n), the OLS estimator does not perform well and other methods deliver more accurate estimates. In particular, it can be shown that by constraining or shrinking the estimated coefficients, we can often substantially reduce the variance at the cost of a negligible increase in bias, which leads to substantial improvements in the accuracy of the estimates.
One of these shrinkage methods is called “Ridge”. Ridge regression modifies the OLS objective function by adding a penalty proportional to the sum of the squared coefficients. This regularization technique makes the estimates more reliable than those obtained via OLS when predictors are numerous or highly correlated.
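A minimal sketch of this shrinkage effect, on synthetic data where the number of predictors is close to the number of observations (the penalty strength `alpha=10.0` is an arbitrary illustrative choice):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)

# Many predictors relative to observations: OLS becomes unstable here
n, k = 50, 40
X = rng.normal(size=(n, k))
beta = np.zeros(k)
beta[:5] = 2.0                           # only a few coefficients truly matter
y = X @ beta + rng.normal(size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)      # alpha controls the penalty strength

# The penalty pulls the estimated coefficients toward zero,
# trading a little bias for a large reduction in variance
print(np.abs(ols.coef_).sum(), np.abs(ridge.coef_).sum())
```

In practice the penalty strength is usually chosen by cross-validation (e.g., scikit-learn's `RidgeCV`) rather than fixed by hand.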
Another estimator similar to Ridge is called “LASSO” (Least Absolute Shrinkage and Selection Operator). LASSO takes regularization a step further by not only penalizing the size of the coefficients but also shrinking some of them exactly to zero. This method effectively performs feature selection, retaining only the most significant variables in the final model. LASSO helps enhance model interpretability and reduce overfitting by eliminating unnecessary predictors in complex economic models.
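The selection behavior can be seen in a small synthetic sketch: only a few predictors truly drive the outcome, and LASSO zeroes out most of the rest (again, `alpha=0.1` is an illustrative choice, normally tuned by cross-validation):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, k = 100, 20
X = rng.normal(size=(n, k))
beta = np.zeros(k)
beta[:3] = [3.0, -2.0, 1.5]            # only 3 of the 20 predictors are relevant
y = X @ beta + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.1).fit(X, y)

# LASSO sets many coefficients exactly to zero: built-in feature selection
selected = np.flatnonzero(lasso.coef_)
print(selected)
```

The indices printed are the predictors the model keeps; the three truly relevant ones survive the penalty because their effects are large relative to the noise.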
Another related approach is named “Principal Component Regression” (PCR). PCR combines the ideas of principal component analysis (PCA) and multiple regression. PCR starts by transforming the original variables into a smaller set of uncorrelated components, the principal components. These components then serve as the predictors in a regression model estimated via OLS. PCR works particularly well when the predictors are highly correlated with one another.
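A minimal sketch of the two-step recipe (compress, then regress) on synthetic data where ten observed predictors are noisy copies of two latent factors; the number of components and noise levels are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
n = 200

# Two latent factors observed through ten highly correlated columns
latent = rng.normal(size=(n, 2))
X = np.hstack([latent + 0.05 * rng.normal(size=(n, 2)) for _ in range(5)])
y = latent @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=n)

# PCR: reduce to a few uncorrelated components, then run OLS on them
pcr = make_pipeline(PCA(n_components=2), LinearRegression()).fit(X, y)
print(pcr.score(X, y))  # in-sample R^2
```

Because the first two principal components recover the latent factors, the regression on them explains almost all the variation despite the strong collinearity among the original columns.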
Decision trees are a straightforward yet powerful tool in machine learning, often used for both classification tasks (where the goal is to categorize data into different groups) and regression tasks (where the goal is to predict a numerical value). These models are popular because they mimic human decision-making by breaking down data into smaller, more manageable subgroups based on specific features - much like how a person makes decisions by considering one factor at a time.
Imagine you are trying to decide where to go on vacation. A decision tree model would start by considering a broad question: “Do you want to relax or go on an adventure?” Based on your answer, it branches out to more specific questions, gradually narrowing down the options until a final decision is reached. In a similar way, a decision tree in machine learning splits data by asking yes/no questions about its features (like income levels, age, or spending habits) until it can make accurate predictions or classifications.
One of the major advantages of decision trees is their high interpretability. You can easily understand and visualize how decisions are made, which is crucial in fields like economics or healthcare where understanding the rationale behind a model’s prediction is as important as the prediction itself. This makes them very useful for tasks that require transparency and easy explanations of the decision process.
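This interpretability can be made concrete with a small sketch: a shallow tree fitted to hypothetical municipal data can be printed as the sequence of yes/no questions it asks (the feature names and the "risk" rule below are invented for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(4)

# Hypothetical municipal indicators: debt ratio and revenue growth
X = rng.uniform(size=(300, 2))
# Hypothetical rule: high debt combined with low revenue growth -> at risk
y = ((X[:, 0] > 0.7) & (X[:, 1] < 0.3)).astype(int)

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# The fitted tree prints as a readable sequence of threshold questions
print(export_text(tree, feature_names=["debt_ratio", "revenue_growth"]))
```

The printed rules mirror how an analyst would explain the classification: first a question about debt, then one about revenue, ending in a risk label at each leaf.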
However, decision trees suffer from a relevant limitation: they tend not to deliver accurate predictions, particularly in complex scenarios where relationships between data points can be highly nonlinear. To overcome this issue, decision trees are often used as building blocks for more advanced ensemble methods like Bagging, Random Forests, or Gradient Boosting. These techniques combine the outputs of many different decision trees to improve accuracy and robustness. Random Forests, for example, create a ‘forest’ of decision trees where each tree is trained on a random subset of the data and features, and the final prediction is made by averaging the predictions of all the trees. This method reduces the risk of overfitting and generally leads to better performance across a wide range of tasks.
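The gain from averaging many trees can be sketched on a synthetic nonlinear problem, comparing held-out accuracy of a single unpruned tree (which overfits the noise) against a Random Forest (the data-generating function and hyperparameters are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(5)

# A smooth nonlinear relationship observed with noise
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A single unpruned tree memorizes the training noise
tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
# Averaging 200 trees, each fit on a bootstrap sample, smooths that noise out
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print(tree.score(X_te, y_te), forest.score(X_te, y_te))  # test-set R^2
```

On held-out data the forest's averaged prediction typically scores noticeably better than the single tree, illustrating the variance reduction the paragraph describes.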
In summary, while decision trees alone might not always deliver the most accurate predictions, their simplicity, interpretability, and the role they play in more complex models make them invaluable tools in the toolbox of machine learning, particularly in applications where understanding the model’s reasoning is crucial.
Neural networks are a fascinating and powerful class of machine learning models that mimic the architecture of the human brain. Just as neurons in the brain connect and transmit signals to perform complex tasks, neural networks use interconnected nodes (or “neurons”) to process data and make predictions. This design makes them exceptionally good at handling tasks involving complex and unstructured data such as images, text, and time series data.
One of the greatest strengths of neural networks is their ability to identify intricate patterns in large datasets that might be invisible to more traditional statistical methods. For example, in image recognition, neural networks can learn to identify objects in images with a level of accuracy that rivals human performance. In natural language processing, they can understand and generate human-like text based on the context of previous conversations or documents.
However, to achieve these impressive feats, neural networks require substantial amounts of data. This is because they learn directly from the data, adjusting the strengths of connections between neurons in ways that reduce errors in predictions. With enough data, neural networks can often outperform nearly all other methods, delivering highly accurate and nuanced predictions.
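A minimal sketch of this learning-from-data process, using scikit-learn's multi-layer perceptron on a synthetic nonlinear function (the network size, iteration budget, and target function are illustrative assumptions, not a recipe):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(6)

# A fairly large synthetic dataset with a nonlinear signal
X = rng.uniform(-2, 2, size=(2000, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1])

# Two hidden layers of 32 neurons; the fit adjusts the connection weights
# iteratively to reduce prediction error on the training examples
net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
net.fit(X, y)
print(net.score(X, y))  # in-sample R^2
```

With enough examples the network recovers the nonlinear pattern well; with only a handful of observations the same architecture would fit poorly, which is the data-hunger the paragraph refers to.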
Despite these advantages, neural networks come with a significant drawback: they are often seen as “black boxes”. This term refers to the difficulty of understanding exactly how they work internally. When a neural network makes a decision, it is challenging to trace back and pinpoint which inputs most influenced the output, and how they did so. This is largely due to their deep and complex structures, where information is processed in multiple layers, each transforming the input in various ways before passing it on to the next.
The black-box nature of neural networks poses challenges in fields where transparency and accountability are essential. For instance, in finance, healthcare, and law, knowing why a model made a specific decision can be as important as the decision itself. Despite these challenges, the use of neural networks continues to grow because their benefits often outweigh the drawbacks, especially in applications where complex data patterns need to be deciphered and high accuracy is critical. However, the ongoing development of methods to enhance the interpretability of neural networks will be crucial to their broader adoption in sensitive and impactful areas.
Through the responsible and informed use of these tools, policymakers can significantly improve the precision of their analyses and the effectiveness of their initiatives, leading to tangible benefits for the economy and society as a whole.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning: with Applications in R. 2nd ed. Springer, 2021.
Athey, Susan, and Guido W. Imbens. "Machine learning methods that economists should know about." Annual Review of Economics 11 (2019): 685-725.
Athey, Susan. "Beyond prediction: Using big data for policy problems." Science 355.6324 (2017): 483-485.
Mullainathan, Sendhil, and Jann Spiess. "Machine learning: an applied econometric approach." Journal of Economic Perspectives 31.2 (2017): 87-106.
Zhao, Qingyuan, and Trevor Hastie. "Causal interpretations of black-box models." Journal of Business & Economic Statistics 39.1 (2021): 272-281.