Before starting, install Python (version 2.7 or later) and the Python packages scikit-learn, numpy, and pandas. If you want suggestions on how to install Python and its packages, look here.
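Before installing the Stata command, it can help to confirm that Stata can see the Python installation and the three packages. The lines below are a minimal sketch, assuming Stata 16 or later (the first release with built-in Python integration); they only use Stata's own `python query` and `python which` commands and are not part of r_ml_stata_cv.

```stata
* Show which Python installation Stata is currently linked to
python query

* Check that each required package can be found by that installation;
* an error on any line means the package still has to be installed
python which sklearn
python which numpy
python which pandas
```

Note that scikit-learn is imported under the module name `sklearn`, which is why that name is checked here.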
* Install the Stata ML command
. ssc install r_ml_stata_cv
* Look at the documentation
. help r_ml_stata_cv
* Load initial dataset
. sysuse boston, clear
* Form the train and test datasets
. get_train_test , dataname("boston") split(0.80 0.20) split_var(svar) rseed(101)
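To make the idea of an 80/20 split concrete, here is a hand-written sketch of a random train/test split in plain Stata. It is not the implementation of get_train_test; it runs on the auto dataset that ships with Stata so that it works without extra files. The file names auto_train and auto_test and the 1/2 coding of svar are assumptions made for illustration only.

```stata
* Illustration on a dataset shipped with Stata
sysuse auto, clear

* Fix the random-number seed so the split is reproducible
set seed 101

* Draw one uniform number per observation; roughly 80% fall below 0.80
generate double u = runiform()
generate byte svar = cond(u < 0.80, 1, 2)   // 1 = train, 2 = test (assumed coding)
drop u

* Save the training part
preserve
keep if svar == 1
save auto_train, replace
restore

* Save the test part
preserve
keep if svar == 2
save auto_test, replace
restore

* Inspect the realized split sizes
tabulate svar
```

Because each observation is assigned independently, the realized shares are only approximately 80% and 20%; sorting on the random number and cutting at a fixed row would give exact shares instead.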
* Form the target and the features
. global y "medv"
. global X "zn indus chas nox rm age dis rad tax ptratio black lstat"
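Global macros such as $y and $X are replaced by their text before a command runs, so a typo in a variable name only surfaces when the command is executed. The sketch below shows the substitution and two quick checks using standard Stata commands. It uses the auto dataset and a hypothetical choice of target and features so that it is self-contained; with the Boston data the same checks apply to the globals defined above.

```stata
* Illustration on a dataset shipped with Stata
sysuse auto, clear

* Hypothetical target and features, chosen only for this example
global y "price"
global X "mpg weight length"

* Stata substitutes the macro text before running the command
display "regress $y $X"

* Stop with an error if any listed variable is absent or not numeric
confirm numeric variable $y $X

* Report missing values in the target and features, if any
misstable summarize $y $X
```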
* Run tree regression in default mode
. use boston_train, clear
. r_ml_stata_cv $y $X , mlmodel("tree") data_test("boston_test") default prediction("pred") seed(10)
* Run tree regression with specific tree depth
. cap rm CV.dta
. use boston_train, clear
. r_ml_stata_cv $y $X , mlmodel("tree") data_test("boston_test") prediction("pred") tree_depth(3) ///
cross_validation("CV") n_folds(5) seed(10)
* Run tree regression with cross-validated tree depth
. cap rm CV.dta
. use boston_train, clear
. r_ml_stata_cv $y $X , mlmodel("tree") data_test("boston_test") prediction("pred") ///
tree_depth(1 2 3 4 5 6 7 8 9) cross_validation("CV") n_folds(5) seed(10) graph_cv
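The n_folds(5) option refers to 5-fold cross-validation: the training data are cut into five parts, the model is fitted on four of them and evaluated on the fifth, and the error is averaged over the five rounds. The sketch below spells out that loop in plain Stata. It is an illustration of the general technique, not what r_ml_stata_cv runs internally. Since Stata has no built-in regression tree, linear regression (regress) is used as a stand-in learner, and the auto dataset and the variable names fold and sqerr are assumptions made for this example.

```stata
* Illustration on a dataset shipped with Stata
sysuse auto, clear
set seed 10

* Shuffle the rows, then deal them into 5 folds of near-equal size
generate double u = runiform()
sort u
generate byte fold = mod(_n, 5) + 1

* Holds the squared prediction error of each row when it is held out
generate double sqerr = .

forvalues k = 1/5 {
    * Fit on the four other folds (regress stands in for the tree)
    quietly regress price mpg weight if fold != `k'
    * Predict for all rows, keep the error for the held-out fold only
    quietly predict double yhat
    quietly replace sqerr = (price - yhat)^2 if fold == `k'
    drop yhat
}

* Average held-out squared error across all rows
summarize sqerr, meanonly
display "5-fold CV mean squared error: " r(mean)
```

Choosing a tuning parameter by cross-validation amounts to repeating this loop for each candidate value and keeping the value with the lowest average held-out error.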
Cerulli, G. 2020. C_ML_STATA: Stata module to implement machine learning classification in Stata. Statistical Software Components, Boston College Department of Economics. Available at: https://econpapers.repec.org/software/bocbocode/s458830.htm.
Cerulli, G. 2020. R_ML_STATA: Stata module to implement machine learning regression in Stata. Statistical Software Components, Boston College Department of Economics. Available at: https://econpapers.repec.org/software/bocbocode/s458831.htm.
Cerulli, G. 2020. A super-learning machine for predicting economic outcomes. MPRA Paper 99111, University Library of Munich, Germany.
James, G., Witten, D., Hastie, T., Tibshirani, R. 2013. An Introduction to Statistical Learning: with Applications in R. New York, Springer.
Raschka, S., Mirjalili, V. 2019. Python Machine Learning. 3rd Edition, Packt Publishing.
Cerulli, G. 2020. Machine learning using Stata. Available at: https://sites.google.com/view/giovannicerulli/machine-learning-in-stata