2016 CIdE/SIdE Summer School of Econometrics
This year the Summer School of Econometrics will be two weeks long.
NB: Prof. Lawrence Carin from Duke could not make it, and Prof. Anima Anandkumar kindly agreed to teach the Machine Learning module.
First week topic: Big Data Econometrics and Machine Learning
Speakers:
Dates: July 4 through July 8, 2016
Syllabus: Big Data Econometrics and Machine Learning (Harding and Anandkumar)
Part I: Big Data Econometrics
1. Introduction to Big Data
a. Sources of Big Data and Definitions; What is the Value of Big Data?
b. Challenges – p-hacking and statistical inference
c. Trade-offs between model complexity and model accuracy
d. Example: RCTs and the Smart Grid
2. Estimation of high-dimensional models
a. Selecting models in high dimensions
b. Lasso, Ridge, Elastic Net
c. Heterogeneity in High Dimensional Models – Penalized quantile regression
d. Example: penalized forecasting
3. Discrete choice modeling
a. Using machine learning for classification
b. Bayesian Parametrics and Non-parametrics
c. Dirichlet Processes
d. Example: Using scanner data for micro and macro research
4. Supervised learning and Unsupervised learning
a. Building decision trees
b. Random forests
c. Basics of neural networks and deep learning
d. Example: satellite imagery; should policy makers be replaced by algorithms?
5. Causal Inference in Big Data
a. Prediction and Estimation: Lasso for Causal Inference
b. Random forests for causal inference
c. Using machine learning to construct counterfactuals
d. Example: Using synthetic controls in time series analysis
Hands-on tutorials will be in R and will be discussed each day.
Books
Paarsch, H.J. and Golyaev, K., 2016. A Gentle Introduction to Effective Computing in Quantitative Research: What Every Research Assistant Should Know. MIT Press.
James, G., Witten, D., Hastie, T. and Tibshirani, R., 2013. An Introduction to Statistical Learning (Vol. 112). New York: Springer.
Abu-Mostafa, Y.S., Magdon-Ismail, M. and Lin, H.T., 2012. Learning from data. Berlin, Germany: AMLBook.
Müller, P., Quintana, F.A., Jara, A. and Hanson, T., 2015. Bayesian nonparametric data analysis. Springer.
Papers
Bajari, P., Nekipelov, D., Ryan, S.P. and Yang, M., 2015. Machine learning methods for demand estimation. The American Economic Review, 105(5), pp.481-485.
Belloni, A., Chernozhukov, V. and Hansen, C., 2014. High-dimensional methods and inference on structural and treatment effects. The Journal of Economic Perspectives, 28(2), pp.29-50.
Fan, J., Han, F. and Liu, H., 2014. Challenges of big data analysis. National Science Review, 1(2), pp.293-314.
Fan, J. and Liao, Y., 2014. Endogeneity in high dimensions. Annals of Statistics, 42(3), p.872.
Harding, M. and Lamarche, C., 2016. Sparsity-Based Estimation of a Panel Quantile Count Data Model with Applications to Big Data.
Ng, S., 2015. Opportunities and Challenges: Lessons from Analyzing Terabytes of Scanner Data.
Schmidhuber, J., 2015. Deep learning in neural networks: An overview. Neural Networks, 61, pp.85-117.
Varian, H.R., 2014. Big data: New tricks for econometrics. The Journal of Economic Perspectives, 28(2), pp.3-27.
Wager, S. and Athey, S., 2015. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. arXiv preprint arXiv:1510.04342.
Part II: Machine Learning (Anima Anandkumar)
Overview of learning in high dimensions.
High dimensional regime: no. of variables >> no. of samples.
Efficient learning algorithms: tradeoffs among computational complexity, sample complexity, and learning accuracy.
Basic dimensionality reduction methods, e.g. PCA, PLS.
Clustering.
Optimization
Demonstrate how learning problems require optimization.
Contrast convex vs. non-convex programs.
KKT conditions and duality.
Analysis of Gradient descent and Newton’s method for convex programs.
Application to Lasso.
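The optimization topics above (gradient descent applied to the Lasso) can be illustrated with a minimal sketch. The syllabus does not say which algorithm the lecture uses; the version below assumes proximal gradient descent (ISTA), a standard gradient-based method for the Lasso, written in plain Python on made-up toy data. The names `soft_threshold` and `lasso_ista` are illustrative and do not come from the course materials.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_ista(X, y, lam, step, n_iter=500):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by proximal gradient descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        # Residuals r = X b - y
        r = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient of the smooth least-squares part: X' r / n
        grad = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(p)]
        # Gradient step on the smooth part, then soft-thresholding for the L1 penalty
        b = [soft_threshold(b[j] - step * grad[j], step * lam) for j in range(p)]
    return b


# Toy data (invented for illustration): y depends on the first column only.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.0, 2.0, 0.0]
b_hat = lasso_ista(X, y, lam=0.1, step=1.0)
print(b_hat)
```

On this toy design the first coefficient converges to 1.8 rather than the least-squares value of 2.0, and the second stays at exactly zero, which shows the shrinkage and sparsity induced by the L1 penalty.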
Unsupervised learning
Latent factor models.
Method of moment estimators.
Solving for tensor decompositions.
Applications: topic models, recommender systems, mining communities from social network data, finding embeddings for understanding text.
Supervised learning
Demo and hands-on experience using TensorFlow.
Neural networks as universal approximators.
Approximation error in fitting functions to a neural network.
Deep neural networks: expressivity properties.
Training neural networks: issues in nonconvex optimization, e.g. saddle points.
Sequence Learning
Modeling sequences through Markov and hidden Markov models.
Prediction in HMMs through matrix methods.
Learning hidden Markov models through tensor methods.
Reinforcement learning through Markov decision processes and partially observed Markov decision processes.
Using tensor methods for RL.
References (lecture 1)
References (lecture 2)
Convex Optimization by Stephen Boyd and Lieven Vandenberghe.
"Stochastic Optimization in High Dimension" PhD thesis by Hanie Sedghi.
References (lecture 3)
"Non-convex Optimization in Machine Learning: Provable Guarantees Using Tensor Methods" by Majid Janzamin. PhD thesis.
Blog post on saddle points.
"On the Link Between Gaussian Homotopy Continuation and Convex Envelopes" by Hossein Mobahi, John W. Fisher III. Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR 2015).
"Coarse-to-Fine Minimization of Some Common Nonconvexities" by Hossein Mobahi, John W. Fisher III. Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR 2015).
References (lecture 4)
"Discovery of Latent Factors in High-dimensional Data Using Tensor Methods" by Furong Huang. PhD thesis.
"Tensor Decompositions for Learning Latent Variable Models" by A. Anandkumar, R. Ge, D. Hsu, S.M. Kakade and M. Telgarsky. Journal of Machine Learning Research 15 (2014) 2773-2832.
"Representation learning: A review and new perspectives" by Y. Bengio, A. Courville, P. Vincent. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2013.
"Learning Sparsely Used Overcomplete Dictionaries" by A. Agarwal, A. Anandkumar, P. Jain, P. Netrapalli, R. Tandon. Proc. of COLT. 2014.
"Unsupervised Learning of Word-Sequence Representations from Scratch via Convolutional Tensor Decomposition" by Furong Huang, A. Anandkumar. 2016.
References (lecture 5)
"Universal Approximation Bounds for Superpositions of a Sigmoidal Function" by Andrew R. Barron. IEEE Transactions on Information Theory, Vol. 39, No. 3, May 1993.
"Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods" by Majid Janzamin, Hanie Sedghi, Anima Anandkumar. June 2015.
"Reinforcement Learning of POMDPs using Spectral Methods" by Kamyar Azizzadenesheli, Alessandro Lazaric, Anima Anandkumar. Conference on Learning Theory (COLT), New York City, USA, June 2016.
List of visualizations/demos
Least squares
http://www.dangoldstein.com/regression.html
PCA
http://setosa.io/ev/principal-component-analysis
K-Means clustering
http://shiny.rstudio.com/gallery/kmeans-example.html
Eigenvectors and eigenvalues
http://setosa.io/ev/eigenvectors-and-eigenvalues/
Topic modeling demo: New York Times dataset
http://newport.eecs.uci.edu/anandkumar/Lab/Lab_sub/NewYorkTimes3.html
Foreground/background separation using robust PCA
http://newport.eecs.uci.edu/anandkumar/Lab/Lab_sub/ncrpca.html
Structure learning in latent tree models
http://newport.eecs.uci.edu/anandkumar/Lab/Lab_sub/Projects_sub/CLTM/dynamicTree.html
TensorFlow playground
http://playground.tensorflow.org/
Second week topic: Methods for Evaluating Social Programs and for Duration Data Analysis
Speakers:
Dates: July 11 through July 15, 2016
Syllabus: Methods for Evaluating Social Programs (Todd)
Course Description:
This course will examine econometric methods for evaluating effects of social program interventions. Typical interventions that might be of interest include job training or other active labor market programs, education programs (such as school subsidy programs), or health programs.
The first part of the course will examine ex post evaluation methods that are applicable after the program has been implemented and data are available on persons who participated in the program and possibly also on a group of people who did not participate. We consider both the case where the program was randomly assigned and the case where assignment was not random. We will examine methods that include regression estimators, matching estimators, control function estimators, regression discontinuity (RD) methods, IV and LATE estimators, and bounding methods.
The second part of the course will consider methods for ex ante evaluation, that is, methods for evaluating programs that do not yet exist or for evaluating alternative versions of existing programs. These methods typically make more extensive use of structural models.
Course notes will be available on the website:
http://athena.sas.upenn.edu/~petra/bgpe.htm
Readings:
The main references are class notes and the following book chapters:
Heckman, James J., LaLonde, Robert J. and Smith, Jeffrey A. (1999): “The Economics and Econometrics of Active Labor Market Programs” in Handbook of Labor Economics, Volume 3A, eds. Orley C. Ashenfelter and David Card.
Todd, Petra E. (2005): “Evaluating Social Programs with Endogenous Program Placement and Selection of the Treated,” draft of chapter under preparation for Handbook of Development Economics, downloadable from http://athena.sas.upenn.edu/~petra/papers/hae.pdf
Todd, Petra E., with Kenneth I. Wolpin and Michael Keane (2010): “The Structural Estimation of Behavioral Models: Discrete Choice Dynamic Programming Methods and Applications,” in Handbook of Labor Economics, eds. David Card and Orley Ashenfelter, Volume 2, Elsevier, pp. 332-461.
(1) Ex Post Evaluation Methods
Abadie, Alberto and Guido Imbens (2006): "Large Sample Properties of Matching Estimators for Average Treatment Effects," Econometrica, 74, 1, 235-267.
Andrews, Donald and Marcia Schafgans (1998): "Semiparametric Estimation of the Intercept of a Sample Selection Model," Review of Economic Studies, 65, 497-518.
Angrist, J. and Imbens, G. (1994): "Identification and Estimation of Local Average Treatment Effects," Econometrica, 62(2), 467-475.
Ashenfelter, Orley (1978): “Estimating the Effect of Training Programs on Earnings” in Review of Economics and Statistics, 60, 47-57.
Ashenfelter, Orley and David Card (1985): “Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs,” in Review of Economics and Statistics, 67, 648-660.
Behrman, Jere, Jorge Garcia-Gallardo, Susan Parker, Petra Todd, and Viviana Velez-Grajales (2005): "How Conditional Cash Transfers Impact School and Working Behavior of Children and Youth in Urban Mexico," Education Economics.
Carneiro, P., Heckman, J. J. and Vytlacil, E. (2001): "Understanding What Instrumental Variables Estimate: Estimating Marginal and Average Returns to Education," manuscript, University of Chicago.
Dehejia, Rajeev and Sadek Wahba (1998): “Propensity Score Matching Methods for Nonexperimental Causal Studies,” NBER Working Paper No. 6829.
Dehejia, Rajeev and Sadek Wahba (1999): “Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs,” in Journal of the American Statistical Association, 94(448), 1053-1062.
Duflo, Esther (2000): "Child Health and Household Resources in South Africa: Evidence from Old Age Pension," AEA Papers and Proceedings, 90(2), 393-398.
Galiani, Sebastian, Gertler, Paul, and Ernesto Schargrodsky "Water for Life: The Impact of the Privatization of Water Services on Child Mortality in Argentina," Journal of Political Economy, Vol. 113, No. 1 (February 2005), pp. 83-120.
Glewwe, Paul, Kremer, Michael, Moulin, Sylvie, and Eric Zitzewitz (2004): "Retrospective vs. prospective analyses of school inputs: the case of flip charts in Kenya," Journal of Development Economics, 74, 251-268.
Hahn, J., Todd, P. and W. Van der Klaauw (2001): “Identification of Treatment Effects by Regression-Discontinuity Design,” in Econometrica, February, 2001.
Heckman, James (1997): “Randomization as an Instrumental Variables Estimator: A Study of Implicit Behavioral Assumptions in One Widely-used Estimator,” Journal of Human Resources, 32, 442-462.
Heckman, J., H. Ichimura, J. Smith and P. Todd (1998): “Characterizing Selection Bias using Experimental Data” Econometrica, Vol. 66, September.
Heckman, J., H. Ichimura and P. Todd (1997): “Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Program,” Review of Economic Studies, Vol. 64(4), October.
Heckman, James and Salvador Navarro (2004): “Using Matching, Instrumental Variables, and Control Functions to Estimate Economic Choice Models,” Review of Economics and Statistics, February 2004, Vol. 86, No. 1, Pages 30-57.
Heckman, J. and E. Vytlacil (2005): "Structural Equations, Treatment Effects and Econometric Policy Evaluation," Econometrica, 2005.
Imbens, Guido W. (2009): “Better LATE than nothing: some comments on Deaton (2009) and Heckman and Urzua (2009),” NBER working paper #14896.
Imbens, Guido W. and Thomas Lemieux (2008): “Regression Discontinuity Designs: A Guide to Practice,” Journal of Econometrics, Volume 142, Issue 2, p. 615-635.
Imbens, Guido W. and Jeffrey Wooldridge (2008): “Recent developments in the econometrics of program evaluation,” NBER working paper #14251.
LaLonde, Robert (1986): “Evaluating the Econometric Evaluations of Training Programs with Experimental Data” in American Economic Review, 76, 604-620.
Rosenbaum, P. and D. Rubin (1983): "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, 70, 41-55.
Rubin, D. B. (1980): "Bias Reduction Using Mahalanobis' Metric Matching," Biometrics, 36(2), pp. 295-298.
Smith, J. and P. Todd (2001): “Reconciling Conflicting Evidence on the Performance of Propensity Score Matching Estimators,” in American Economic Review, Papers and Proceedings, May 2001.
Thistlethwaite, D. and D. Campbell (1960): “Regression-discontinuity Analysis: An alternative to the ex post facto experiment,” Journal of Educational Psychology, 51, 309-317.
Trochim, W. (1984): Research Design for Program Evaluation: The Regression-Discontinuity Approach. Beverly Hills: Sage Publications.
Van der Klaauw, W. (2002): “Estimating the Effect of Financial Aid Offers on College Enrollment,” in International Economic Review, Vol. 43, Issue 4, pp. 1249-1287.
Wolpin, Kenneth I. and Mark R. Rosenzweig (1986): "Evaluating the Effects of Optimally Distributed Programs: Child Health and Family Planning Programs," in American Economic Review, 76(3), 470-482.
(2) Ex Ante Evaluation Methods
Heckman, James J. (2000): "Causal Parameters and Policy Analysis in Economics: A Twentieth Century Retrospective," in Quarterly Journal of Economics, Vol. 115(1), p.45-97.
Hurwicz, Leonid (1962): "On the Structural Form of Interdependent Systems." In Logic, Methodology and Philosophy of Science, edited by Ernest Nagel, Patrick Suppes and Alfred Tarski. Stanford, Calif.: Stanford University Press.
Ichimura, Hidehiko and Christopher Taber (2002): "Semiparametric Reduced-Form Estimation of Tuition Subsidies" in American Economic Review. Vol. 92 (2). p 286-92.
Lise, Jeremy, Seitz, Shannon, and Jeffrey Smith (2003): "Equilibrium Policy Experiments and the Evaluation of Social Programs," working paper.
Lumsdaine, Robin L., James H. Stock and David A. Wise (1992): "Pension Plan Provisions and Retirement: Men and Women, Medicare, and Models," in D. A. Wise (ed.) Studies in the Economics of Aging, Chicago: University of Chicago Press.
Marschak, Jacob (1953): "Economic Measurements for Policy and Prediction," in William Hood and Tjalling Koopmans, eds., Studies in Econometric Method (New York: John Wiley, 1953), pp. 1-26.
McFadden, Daniel and A. P. Talvitie and Associates (1977): "Validation of Disaggregate Travel Demand Models: Some Tests" in Urban Demand Forecasting Project, Final Report, Volume V, Institute of Transportation Studies, University of California, Berkeley.
Todd, Petra E. and Kenneth I. Wolpin (2005): “Ex Ante Evaluation of Social Programs,” Annales d'Economie et de Statistique, 2008, downloadable from http://athena.sas.upenn.edu/~petra/papers/exante.pdf
Todd, Petra E. and Kenneth I. Wolpin (2006): “Assessing the Impact of a School Subsidy Program in Mexico: Using Experimental Data to Validate a Behavioral Model of Child Schooling and Fertility,” American Economic Review, 2006, 96(5): 1384–1417, downloadable from http://athena.sas.upenn.edu/~petra/papers/aerfinal.pdf
Todd, Petra E. and Kenneth I. Wolpin (2006): “Handout on Ex Ante Evaluation in a Three Period Schooling Choice Model,” downloadable from http://athena.sas.upenn.edu/~petra/iza/exanteexample.pdf
Todd, Petra E. and Kenneth I. Wolpin (2010): “Structural Estimation and Policy Evaluation in Developing Countries,” published in Annual Review of Economics, downloadable from http://athena.sas.upenn.edu/~petra/papers/arregstyle1.pdf
Wise, David A. (1985): "A Behavioral Model Versus Experimentation: The Effects of Housing Subsidies on Rent" in Methods of Operations Research, 50, Verlag Anton Hain.
Some references on nonparametric methods
Fan, J. (1992): "Design Adaptive Nonparametric Regression," in JASA, 87, 998-1004.
Fan, J. (1993): "Local Linear Regression Smoothers and their Minimax Efficiencies," The Annals of Statistics, 21, 196-216.
Härdle, W. Applied Nonparametric Regression, Cambridge University Press.
Härdle, W. and Linton, O. "Applied Nonparametric Methods" in Handbook of Econometrics, (R. Engle and D. McFadden, eds.), Vol. IV, 1994, p. 2295.
Ichimura, Hidehiko and Todd, Petra (2000): "Implementing Nonparametric and Semiparametric Estimators," manuscript under preparation for Handbook of Econometrics, Volume 5 (downloadable from http://athena.sas.upenn.edu/~petra/papers/curver9.pdf).
Jones, M. C., Marron, J. S. and Sheather, S. J. (1996): "A Brief Survey of Bandwidth Selection for Density Estimation" in Journal of the American Statistical Association, Vol. 91, No. 433, 401-407.
Syllabus: Methods for Duration Data Analysis (Lindeboom)
AIM: The aim of this course is to provide students with methods for modelling duration data. Emphasis is given to empirical applications using micro data. The course therefore also includes practical computer assignments using Stata.
PREREQUISITES: Students are assumed to be familiar with the basic concepts of econometrics, such as linear regression, instrumental variables, panel data, logit/probit models and hypothesis testing. Preferably, students have followed an introductory course in econometrics at the graduate level, but students who have completed a course in statistics or advanced research methods are also considered. Students should be familiar with a statistical package, preferably Stata.
FORMAT: The course consists of three classroom lectures and two computer lab sessions. During the lab sessions, students work in Stata on an empirical assignment.
COURSE MATERIAL:
Cameron, A.C. and P.K. Trivedi (2004), Applied Microeconometrics, Cambridge University Press, Chapters 17-19.
Kiefer, N.M. (1988), Economic duration data and hazard functions, Journal of Economic Literature 26, 646-679.
Lancaster, T. (1990), The econometric analysis of transition data, Cambridge University Press, selected chapters.
Van den Berg, G.J. (2001), Duration models: specification, identification, and multiple durations, in J.J. Heckman and E.E. Leamer (eds.), Handbook of Econometrics, Volume 5, North-Holland, Amsterdam.
Lecture slides will be distributed during the course.
Lecture 1: Introduction to duration models
· Concepts
· Non-parametric methods
· Parametric models
Lecture 2: More on parametric and semi-parametric models and unobserved heterogeneity
· Time varying covariates and piecewise constant specification of the baseline hazard
· Unobserved heterogeneity in duration models
· Partial Likelihood (Cox model)
Lecture 3: Multiple spells and multivariate duration models
· Fixed effects duration models
· Competing risk models
· Multiple spells multi-state models
Lectures 4 and 5: Computer lab sessions.
Please bring your own laptop; note that the lab sessions will be in Stata.
Organizer (on behalf of CIdE/SIdE): Juri Marcucci
Venue: Bank of Italy Sadiba Center, Perugia, Italy
If you have any further questions, please email me at juri (dot) marcucci (at) bancaditalia (dot) it