1. Intro to Stata

“First, solve the problem. Then, write the code.” – John Johnson

Lesson Prerequisites

This lesson assumes that you have completed an introduction to Stata tutorial, like this one by G. Rodriguez. It also assumes that you have a working copy of Stata 15.0 or later on your machine. If you have an earlier version of Stata, all of the commands in this lesson will work, though the user interface may look slightly different. (The only command in this bootcamp that won't work with earlier versions of Stata is the power command with the cluster option, which you will encounter in the lesson on Power Calculations. In that case you will have the option to use the user-written command clustersampsi.)

Intro to the Lesson

During the lesson, follow along with the videos by loading the dataset and attempting every command on your own console. Pause the video and troubleshoot commands as needed.

Lesson Agenda and Dataset

In this lesson we'll discuss the Stata user interface, the structure of Stata commands, and how to save and run your code from a .do file.

1. The Stata Interface (1): Stata Windows

Stata's interface is made up of various windows that can be moved around or closed to optimize the user's experience.

2. The Stata Interface (2): Browse, List, Codebook

The browse command opens the Data Browser so that you can see the data loaded into Stata, while list prints observations directly in the Results window. codebook, compact is a great command to run when you open a new dataset, since it gives a one-line overview of every variable in the data.
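The three commands above can be tried together as in the sketch below. It assumes auto.dta, the example dataset that ships with Stata, as a stand-in for the lesson's dataset; the variable names make, price and mpg belong to that dataset, not to the lesson.

```stata
* Assumption: auto.dta (shipped with Stata) stands in for the lesson's dataset
sysuse auto, clear

* Open the Data Browser, a read-only spreadsheet view (requires the Stata GUI)
browse

* Print the first five observations of three variables in the Results window
list make price mpg in 1/5

* One line per variable: observations, unique values, mean, min, max, label
codebook, compact
```

Running codebook, compact right after loading a file is a quick way to spot variables with unexpected ranges or many missing values before doing anything else.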

3. Stata Syntax (1): Summarize, Help

You can abbreviate commands in Stata to save time. Type help [command] in the Command Window to display the help file for any command.
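As a small illustration of abbreviation and of the help system, the sketch below runs summarize in its full and abbreviated forms. It assumes auto.dta, the example dataset that ships with Stata, rather than the lesson's own dataset.

```stata
* Assumption: auto.dta (shipped with Stata) stands in for the lesson's dataset
sysuse auto, clear

* Full command name
summarize price

* Same command, abbreviated to its shortest allowed form
su price

* The detail option adds percentiles, skewness and kurtosis
summarize price, detail

* Open the help file; the underlined letters in the syntax diagram
* show the minimum abbreviation of the command
help summarize
```

Abbreviations are convenient in the Command Window, though full command names tend to be easier for collaborators to read in saved code.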

4. Stata Syntax (2): Tabulate

The tabulate command shows a one-way frequency table for one variable, or a cross-tabulation of two variables. Run numlabel, add to add numeric labels as prefixes to value labels, which makes the output from tabulate easier to interpret.
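The sketch below shows both uses of tabulate and the effect of numlabel, add. It assumes auto.dta, the example dataset that ships with Stata; foreign and rep78 are variables in that dataset, not in the lesson's data.

```stata
* Assumption: auto.dta (shipped with Stata) stands in for the lesson's dataset
sysuse auto, clear

* One-way frequency table
tabulate foreign

* Two-way cross-tabulation; the missing option also shows missing values
tabulate rep78 foreign, missing

* Prefix every value label with its numeric code
numlabel, add

* The same table now shows the numeric code next to each label
tabulate foreign
```

Seeing the numeric codes matters because if qualifiers refer to the underlying numbers, not to the label text.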

5. Conditional Statements

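Use if qualifiers within commands to execute the command on a subset of the data. But watch out for missing values! Stata treats a missing numeric value as larger than any number, so a condition such as [varname] > 10 also selects observations where [varname] is missing. Typically you will want to add a second condition, & [varname] != . (or & !missing([varname]), which also excludes the extended missing values .a through .z).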
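The missing-value pitfall is easiest to see by counting observations with and without the extra condition. This sketch assumes auto.dta, the example dataset that ships with Stata, whose variable rep78 has some missing values.

```stata
* Assumption: auto.dta (shipped with Stata) stands in for the lesson's dataset
sysuse auto, clear

* How many observations have a missing repair record?
count if missing(rep78)

* Missing values sort above every number, so this also counts
* the cars whose rep78 is missing
count if rep78 > 3

* Adding the second condition restricts the count to observed values
count if rep78 > 3 & !missing(rep78)

* if qualifiers work the same way with other commands
summarize price if foreign == 1 & !missing(rep78)
```

The difference between the second and third counts equals the number of missing values reported by the first.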

6. Generating Variables

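Use generate to create new variables and replace to change the contents of existing variables. sort allows you to sort the dataset on a variable or a list of variables, but be sure to include the stable option: without it, observations with tied values are placed in random order, so the result can differ from run to run.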
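A minimal sketch of generate, replace and sort follows. It assumes auto.dta, the example dataset that ships with Stata; the new variable names price_thousands and high_mpg are hypothetical and chosen only for illustration.

```stata
* Assumption: auto.dta (shipped with Stata) stands in for the lesson's dataset
sysuse auto, clear

* Create a new variable from an existing one (hypothetical name)
generate price_thousands = price / 1000

* Create an indicator in two steps: start at 0 for observed values,
* then replace with 1 where the condition holds
generate byte high_mpg = 0 if !missing(mpg)
replace high_mpg = 1 if mpg > 25 & !missing(mpg)

* Sort by origin, then by mpg; stable keeps tied observations
* in their previous relative order
sort foreign mpg, stable

list make foreign mpg high_mpg in 1/5
```

Restricting both steps to non-missing values keeps the indicator missing wherever the source variable is missing, instead of silently coding it as 0 or 1.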

7. .do Files

Save your code in .do files so that you can easily update your data pipeline and share your work with collaborators.
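One possible layout for a short .do file is sketched below. The file names lesson1.do and lesson1.log are hypothetical, and auto.dta, the example dataset that ships with Stata, stands in for the lesson's dataset.

```stata
* lesson1.do  (hypothetical file name)
* Purpose: load the data and produce basic descriptive output

* Interpret the commands below as Stata 15 would
version 15

* Start from a clean session and do not pause long output
clear all
set more off

* Keep a plain-text record of the results (hypothetical log name)
capture log close
log using "lesson1.log", replace text

* Assumption: auto.dta (shipped with Stata) stands in for the lesson's dataset
sysuse auto, clear
codebook, compact
summarize price mpg

log close
```

To run the whole file, type do lesson1.do in the Command Window from the folder where the file is saved, or use the execute button in the Do-file Editor.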

Additional Resources

Banner photo: Charles Minard's map of Napoleon's 1812 Russia campaign, an early and stunning example of a Sankey diagram. Accessed from https://en.wikipedia.org/wiki/Charles_Joseph_Minard#/media/File:Minard.png