1. Intro to Stata
“First, solve the problem. Then, write the code.” – John Johnson
Lesson Prerequisites
This lesson assumes that you have completed an introductory Stata tutorial, such as the one by G. Rodriguez. It also assumes that you have a working copy of Stata 15.0 or later on your machine. If you have an earlier version of Stata, all of the commands in this lesson will still work, though the user interface may look slightly different. (The only command in this bootcamp that won't work with earlier versions of Stata is the power command with the cluster option, which you will encounter in the lesson on Power Calculations. In that case you can use the user-written command clustersampsi instead.)
Intro to the Lesson
During the lesson, follow along with the videos by loading the dataset and attempting every command on your own console. Pause the video and troubleshoot commands as needed.
Lesson Agenda and Dataset
In this lesson we'll discuss the Stata user interface, the structure of Stata commands, and how to save and run your code from a .do file.
1. The Stata Interface (1): Stata Windows
Stata's interface is made up of various windows that can be moved around or closed to optimize the user's experience.
2. The Stata Interface (2): Browse, List, Codebook
The browse command opens the data loaded into Stata in a spreadsheet-style viewer, while list prints observations in the Results window. Run codebook, compact whenever you open a new dataset: it gives a one-line overview of each variable in the data.
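The commands in this section can be tried with a short sketch like the one below. It assumes Stata's built-in auto example dataset rather than the lesson's own dataset, so the variable names (make, price, mpg) are illustrative only.

```stata
* Load Stata's built-in example data (an assumption; substitute the lesson's dataset)
sysuse auto, clear

* Open the Data Browser, a read-only spreadsheet view of the data
browse

* Print the first five observations of selected variables in the Results window
list make price mpg in 1/5

* One line per variable: observations, unique values, mean, min, max, label
codebook, compact
```

Restricting list to a few variables and a range of observations (in 1/5) keeps the output readable on larger datasets.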
3. Stata Syntax (1): Summarize, Help
You can abbreviate most commands in Stata to save time (for example, su for summarize). Type help [command] in the Command Window to display the help file for any command.
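As a minimal sketch of abbreviation and help, again assuming the built-in auto dataset and its variables price and mpg:

```stata
* Built-in example data (an assumption; any loaded dataset works)
sysuse auto, clear

* Full command name
summarize price mpg

* Same command, abbreviated
su price mpg

* The detail option adds percentiles, variance, skewness, and kurtosis
summarize price, detail

* Open the help file; the underlined letters in the syntax diagram
* show the shortest allowed abbreviation
help summarize
```

Abbreviations are convenient at the console, but spelling commands out in shared code tends to be easier for collaborators to read.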
4. Stata Syntax (2): Tabulate
The tabulate command shows the one-way frequency of a variable, or the cross-tabulation of two variables. Run numlabel, add to add numeric labels as prefixes to value labels, which makes the output from tabulate easier to interpret.
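The following sketch shows one-way and two-way tables before and after numlabel, add. It assumes the built-in auto dataset, where foreign carries a value label and rep78 has some missing values.

```stata
* Built-in example data (an assumption; substitute the lesson's dataset)
sysuse auto, clear

* One-way frequency table; rows show the value labels only
tabulate foreign

* Prefix every value label with its numeric code
numlabel, add

* Same table; rows now show the code alongside the label
tabulate foreign

* Cross-tabulation of two variables
tabulate rep78 foreign

* The missing option adds a row for missing values of rep78
tabulate rep78 foreign, missing
```

Seeing the numeric code next to each label helps when writing if conditions, which must refer to the underlying numbers rather than the label text.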
5. Conditional Statements
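Use if qualifiers within commands to execute the command on a subset of the data. But watch out for missing values! Stata treats a missing numeric value as larger than any number, so a condition such as [varname] > 10 also matches observations where [varname] is missing. Typically you will want to add a second condition to the qualifier, such as & [varname] != .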
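A small sketch of the missing-value pitfall, assuming the built-in auto dataset, in which rep78 is missing for a handful of cars:

```stata
* Built-in example data (an assumption; substitute the lesson's dataset)
sysuse auto, clear

* Run a command on a subset: foreign cars only
summarize price if foreign == 1

* Pitfall: missing is treated as larger than any number,
* so this count also includes cars with missing rep78
count if rep78 > 3

* Exclude the system missing value explicitly
count if rep78 > 3 & rep78 != .

* Alternative: missing() also catches extended missing values (.a to .z)
count if rep78 > 3 & !missing(rep78)
```

The first count exceeds the other two by the number of observations with missing rep78.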
6. Generating Variables
Use generate to create new variables and replace to change the contents of existing variables. sort orders the dataset by a variable or a list of variables; be sure to include the stable option so that observations with tied values keep their original relative order and the sort is reproducible.
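The sketch below combines generate, replace, and sort. It assumes the built-in auto dataset; the new variable names price_k and expensive and the 10000 cutoff are hypothetical choices for illustration.

```stata
* Built-in example data (an assumption; substitute the lesson's dataset)
sysuse auto, clear

* Create a new variable: price in thousands of dollars
generate price_k = price / 1000

* Build an indicator in steps: start at 0, then overwrite with replace
generate byte expensive = 0
replace expensive = 1 if price > 10000 & !missing(price)

* Keep the indicator missing wherever price itself is missing
replace expensive = . if missing(price)

* Sort by origin, then mileage; stable keeps tied observations
* in their previous relative order
sort foreign mpg, stable

list make foreign mpg expensive in 1/5
```

Without stable, Stata may order tied observations differently from run to run, which can change the results of any later step that depends on row order.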
7. .do Files
Save your code in .do files so that you can easily update your data pipeline and share your work with collaborators.
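A minimal .do file might look like the sketch below. The filenames intro_lesson.do and intro_lesson.log are hypothetical, and the built-in auto dataset stands in for the lesson's data.

```stata
* intro_lesson.do (hypothetical filename)
* Purpose: load the data, label it, and produce summary output

* Interpret commands as Stata 15 does; adjust to your installed version
version 15

* Start from a clean session
clear all
set more off

* Close any open log, then record this run's output to a text log
capture log close
log using "intro_lesson.log", replace text

* Built-in example data (an assumption; substitute the lesson's dataset)
sysuse auto, clear

numlabel, add
codebook, compact
tabulate rep78 foreign, missing

log close
```

Saving this as intro_lesson.do and typing do intro_lesson.do in the Command Window reruns every step from scratch, which is what makes the analysis easy to update and share.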
Additional Resources
Google!
Type help [command] in the Command Window to bring up the help file.
G. Rodriguez Stata Tutorial
J-PAL Stata Resources. Module 101 is great practice for the concepts covered in this lesson.
UCLA Institute for Digital Research and Education Stata modules
Excellent Stata cheat sheets. Here's the one for basic commands.
Banner photo: Charles Minard's map of Napoleon's 1812 Russia campaign, an early and stunning example of a Sankey diagram. Accessed from https://en.wikipedia.org/wiki/Charles_Joseph_Minard#/media/File:Minard.png