Jon Rogers

Real Data, Real Problems: An Open-Access Guide to working with Messy, Annoying, and Otherwise "Real" Data in Stata

(The above is a link to a Google Drive folder with all chapters and materials)

Citation: Rogers, Jonathan (2019) Real Data, Real Problems: An Open-Access Guide to working with Messy, Annoying, and Otherwise "Real" Data in Stata.  

Version 2.1 posted 31 May 2023

Introduction

Given the number of internal and external requests for assistance that I get from folks who are working with data in Stata, I decided to start from some of my teaching materials and build an open-access digital text.  I did this because of a few problems I see with how research methods tend to be taught in the social sciences (political science, economics, sociology, etc.):

  1. There's only so much you can cover in one course.  It's common for social science students to only be required to take one statistics-type course and for much of that course to be focused on the fundamentals of statistics.  This may be reasonable enough, as most students won't then go on to carry out quantitative research, but if they do end up later needing more advanced tools, they won't have easy access to them.

  2. Plenty of theory, not enough practice.  Students may take multiple research methods classes, but end up with little actual experience in running code themselves.

  3. That's a nice regression model, but how did you get there?  Most courses focus on different types of models and tests, but leave students to fend for themselves when it comes to coding, re-coding, merging, cleaning, and other mundane data tasks.  These are things every researcher needs to do, but that relatively few courses spend significant time talking about.

  4. Textbook data is too clean and obvious.  In textbooks, datasets tend to show up, as if by magic, at exactly the point where the researcher needs to apply a particular tool, and it's obvious that the tool at hand is the correct one.  The textbook shows the tool and it ends up being better than the tools the student had previously seen.  Unfortunately, that's not how research tends to work.  How do you make judgement calls when two or more different tools might seem appropriate?  How do you deal with missing data or less-than-ideal measures?

  5. Students need to see the full process.  The first time many students see the full process of coding, cleaning, analysis, and presentation is when they're doing it themselves.  How do you present a table of regression results?  What makes a graph/figure good or bad?  We can't really expect our students to do either of these well in papers or presentations if we don't show them how to do it first.

To address these problems, I've created a series of lessons (usually using "real" datasets) on each step of the data analysis process.  Each chapter consists of a Stata .do file, in which I explain what I am doing and why, then show the code and explain the output that Stata generates.  I wrote the original version in Stata 15.1, so you need Stata 14 or higher to open the data and run the code (but you can at least open the .do file in any version of Stata).  I update the text each summer, as that is when I tend to have time to do so, adding new topics, editing old ones, and revising instructions based on changes/upgrades StataCorp makes to the available commands and options.  As of May 2023, I have updated it for Stata 18.  On occasion, you will find commands that only exist in a newer version of Stata or whose required syntax changed between versions.  I have marked these.  Below, there is a link to the Google Drive folder where everything is stored.  There is a table of contents in a .txt file (readable in programs like Notepad or TextEdit) at the bottom of the page, as well as a history of changes I have made.

Link To All Chapters (click here) 


