(The above is a link to a Google Drive folder with all chapters and materials)
Citation: Rogers, Jonathan (in progress) Practical Social Science Research in R
Version 0.1 posted 18 August 2022 (unfinished DRAFT)
Introduction
A few years ago, I created an open-access Data Analysis textbook for Stata users. I started with Stata because, at the time, most of the questions I was getting came from Stata users. Over time that has changed, largely because of which courses students are taking and which statistical tools their professors require. Now most of the questions I get come from R users, but the nature of the questions has stayed mostly the same. I maintain that this is because, while courses evolve over time, the disconnect between what tends to be taught in classes and what new researchers need when they start their own projects remains. To reiterate the problems I listed three years ago in my earlier text:
There's only so much you can cover in one course. It's common for social science students to be required to take only one statistics-type course, and for much of that course to focus on the fundamentals of statistics. This may be reasonable enough, as most students won't go on to carry out quantitative research, but those who later need more advanced tools won't have easy access to them.
Plenty of theory, not enough practice. Students may take multiple research methods classes, but end up with little actual experience in running code themselves.
That's a nice regression model, but how did you get there? Most courses focus on different types of models and tests, but leave students to fend for themselves when it comes to coding, re-coding, merging, cleaning, and other mundane data tasks. These are things every researcher needs to do, yet relatively few courses spend significant time on them.
Textbook data is too clean and obvious. In textbooks, datasets tend to show up, as if by magic, at exactly the point where the researcher needs to apply a particular tool, and it's obvious that the tool at hand is the correct one. The textbook shows the tool, and it ends up being better than the tools the student had previously seen. Unfortunately, that's not how research tends to work. How do you make judgement calls when two or more different tools might seem appropriate? How do you deal with missing data or less-than-ideal measures?
Students need to see the full process. The first time many students see the full process of coding, cleaning, analysis, and presentation is when they're doing it themselves. How do you present a table of regression results? What makes a graph or figure good or bad? We can't really expect our students to do either of these well in papers or presentations if we don't show them how to do it first.