Each student is responsible for two projects: one for the course in Data Analysis, and one for the course in Quantitative Methods proper. Since both of them need the same software, R, and follow the same general rules, I treat them jointly.
To do the project, you'll need to install both R and RStudio, plus a set of libraries. R is an open-source program and can be downloaded and used freely. A free version of RStudio is also available.
Install R and RStudio
You will have to install R first, and then RStudio. For the Linux operating system, see these instructions. For other operating systems, search for instructions on the Internet.
Install libraries
From RStudio, you will have to install the packages that we will need. For example, for a package called "tidyverse":
install.packages("tidyverse")
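When several packages are needed, it can be convenient to install only those that are still missing. The following is a minimal sketch, not part of the course template: the helper names missing_packages and install_if_missing are hypothetical, and the only package named is the "tidyverse" example from above.

```r
# Return the packages in `pkgs` that are not installed on this computer.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install only what is missing; return (invisibly) the names that were missing.
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Usage (uncomment to run; it needs an Internet connection):
# install_if_missing(c("tidyverse"))
```

Running the helper a second time should do nothing, since the packages are then already installed.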
As a reference to learn R we use this text:
Roger D. Peng. R Programming for Data Science. It can be obtained here. Please also check Peng's course on "Coursera".
There are many other freely available resources online to learn R. Some of them we will use in class. Here are some examples.
Grolemund, Garrett and Wickham, Hadley (2017). "R for Data Science". O'Reilly. Online: https://r4ds.had.co.nz/
"RStudio Blog": https://blog.rstudio.com/
Hanck, Christoph; Arnold, Martin; Gerber, Alexander; and Schmelzer, Martin. "Introduction to Econometrics with R". Online: https://www.econometrics-with-r.org/
Sundstrom, William. "Guide to R for SCU Economics Students" (contains replications of some of the Stock and Watson textbook). Online: http://rpubs.com/wsundstrom/home
As indicated in the syllabuses, we will also use some interactive "Shiny Apps". These are a cool way to learn important concepts, and they are also "powered by R". We will not have to worry about the details of how they work, but if you are curious, you might want to have a look at this page.
We share a Dropbox folder. Copy what you need locally (i.e., onto your computer). Please make a serious effort NOT to erase any file from the Dropbox folder!
Install Dropbox on your computer, to make it easier to share your project files.
Hopefully, by the end of the course not only will you have learned a set of techniques to analyze quantitative information, but also you will be permeated by the light of the following Fundamental Truth: data analysis is a type of craftsmanship, and to do it well you need to follow a methodology. Such methodology is aimed at the following goals: we want our results to be replicable (by other researchers, and by ourselves in the future); we aim at facilitating collaboration with other researchers, and we desire to minimize the probability of committing errors.
To this end, we adopt the following rules.
We write code in a .r program file
We tend not to use the "command line"; instead, we write commands in ".r" program files using a "text editor". I will return to this very important issue below.
If you do not have it already, install a good text editor (even if we will use R Studio: a text editor is always useful).
Comment your code properly
Whenever coding (in R or in any other programming language), write comments so that other people understand what you have done, and you yourself understand what you did, when using code that you produced long ago.
File naming rules
All files should have names that indicate what they do.
All file names should end with the date on which they were created or updated, using the format yyyy_mm_dd.
When modifying a file, start by renaming it and saving it under the new name:
Set the initials in the filename to yours, if needed (e.g., _LP_).
Update the date, always using the format yyyy_mm_dd.
Remember: NEVER leave "holes" (blank spaces) when naming a file. Always use "snake case", that is, separate words with underscores ("_"). For example, "effects_corruption_LP_2018_08_17.odt" is a legitimate file name; "effects corruption.odt" is not.
Examples of legitimate file names:
R_project_Smith_2021_08_23.R (an "R" file)
Chapter_1_Thesis_Unibo_Smith_2021_09_18.docx (a "Word" file)
Why these rules? Because they work. More sophisticated solutions are available (for example, using GitHub), but in our case, if you all abide by these simple rules, we shall be fine.
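The naming rules above can also be expressed in a few lines of R. This is only a sketch under assumptions: the function names make_file_name and is_valid_file_name are hypothetical, and the check only tests for allowed characters and for a trailing yyyy_mm_dd date before the extension.

```r
# Build a file name of the form description_initials_yyyy_mm_dd.ext.
# Any run of characters other than letters and digits becomes one underscore.
make_file_name <- function(description, initials, ext, date = Sys.Date()) {
  description <- gsub("[^A-Za-z0-9]+", "_", description)
  paste0(description, "_", initials, "_", format(date, "%Y_%m_%d"), ".", ext)
}

# TRUE if the name uses only letters, digits and underscores,
# and ends with a yyyy_mm_dd date followed by an extension.
is_valid_file_name <- function(name) {
  grepl("^[A-Za-z0-9_]+_[0-9]{4}_[0-9]{2}_[0-9]{2}\\.[A-Za-z0-9]+$", name)
}

print(make_file_name("effects corruption", "LP", "odt", as.Date("2018-08-17")))
print(is_valid_file_name("R_project_Smith_2021_08_23.R"))
print(is_valid_file_name("effects corruption.odt"))
```

The first call rebuilds the legitimate name given in the text, and the last call rejects the name with a blank space and no date.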
Coding in R
Indent your code (8 spaces, for example)
R programming files are called ".r files" (since they have an extension ".r").
All ".r files" should use the same template that we use in class (that is, each must have a header with information on the part of the project that the given .r file is meant to execute).
All ".r files" should have adequate comments.
Be "case sensitive": an "a" is different from an "A"
We want our programming files to be easily "portable". Portability means that you should be able to run your code on any computer without making significant changes to it. For this reason we use relative pathnames, and NOT absolute ones. In this way, just a change at the beginning of an .r file will make it work on any computer. You'll just have to:
Update the working directory to the one you are using, by commenting out (if needed) the one that does not work for you. As in:
# LinuxBox LP
setwd("/home/lucio/Dropbox/R/REcon_Book/")
(Note: in this way, you only need a change at the beginning of each .r file, since all other paths will be "relative")
Within pathnames, use slashes ("/"), and NOT backslashes ("\"). On some operating systems both are accepted, but slashes always work, whereas backslashes do not (inside an R string, a backslash starts an escape sequence).
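A minimal sketch of how the top of a .r file might look under these rules. The header fields, the file example.csv, and the use of a temporary folder as the project root are illustrative assumptions (chosen so that the snippet runs on any computer); this is not the template used in class. In a real project, point setwd() at your own project folder.

```r
# ---------------------------------------------------------------
# Project : (hypothetical) example of a portable .r file
# Author  : LP
# Purpose : show that only the working directory is machine-specific
# ---------------------------------------------------------------

# Machine-specific part: set the working directory ONCE, at the top.
# A temporary folder stands in for the real project folder here.
project_dir <- file.path(tempdir(), "REcon_Book")
dir.create(file.path(project_dir, "in_data"), recursive = TRUE, showWarnings = FALSE)
old_wd <- setwd(project_dir)

# LinuxBox LP (commented out: it only works on that computer)
# setwd("/home/lucio/Dropbox/R/REcon_Book/")

# From here on, every path is relative to the working directory,
# and uses slashes as separators.
write.csv(data.frame(x = 1:3), "in_data/example.csv", row.names = FALSE)
example_data <- read.csv("in_data/example.csv")
print(example_data)

setwd(old_wd)  # restore the previous working directory
```

Only the setwd() line would change from one computer to another; the read and write calls stay the same.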
Navigating the file system
If the note above on "relative pathnames" was obscure to you, please read the following.
The file system, that is, where your files are, is organized hierarchically, as a tree. It starts with an origin, or "root", which contains branches (or "directories"), each of which might contain more directories, or files.
A "/" (slash) indicates the root directory, and it is also the separator for different directories. For example, if we write:
/home/user_3/funny_files/hello.txt
we understand that there is a file called "hello.txt" (probably a text file, considering its "extension") contained in the directory "funny_files", which is in turn contained in the directory "user_3", which is contained in the directory "home", which is a directory of "root" (that is, there is no further directory above it).
When we run a command, we are positioned in some directory, which we call the "working directory". In Unix, we can find out which one it is with the command pwd (short for "print working directory").
Suppose that the current working directory is:
/home/user_3/
and that we want to print on the screen the content of the file "hello.txt". In Unix, we could use the command "cat". It is important to understand that we have two ways to identify that file:
Using an "absolute pathname", which in this case would be, as above:
cat /home/user_3/funny_files/hello.txt [enter]
Using a "relative pathname", which starts from the working directory. In this case, the command would be:
cat funny_files/hello.txt [enter]
Both alternatives are correct, but we prefer relative pathnames (i.e., alternative n. 2). Let me reiterate the reason: they help make the code "portable", that is, they make it easier to run it on different computers. Suppose that our friend is not "user_3", but "user_2".
Then, to run our code, she should change all the occurrences of absolute pathnames. On the other hand, if relative path names are used instead, she will only have to declare once, at the beginning of the code, the name of the "working directory". In Unix, she would simply write:
cd /home/user_2/
In R, as in the example above, she would use the "setwd" (set working directory) command.
This is just one change instead of potentially many, so it might be very helpful. All we have to do is to agree on a "directory structure" where we keep all our files, which is the next topic to discuss.
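The same comparison can be tried directly in R, where getwd() plays the role of "pwd" and setwd() the role of "cd". The sketch below rebuilds the example tree inside a temporary folder (an assumption made so that it runs anywhere; the file content is made up) and reads hello.txt both ways.

```r
# Build a small, throwaway directory tree that mimics the example above.
home_dir <- file.path(tempdir(), "user_3")
dir.create(file.path(home_dir, "funny_files"), recursive = TRUE, showWarnings = FALSE)
writeLines("hello, world", file.path(home_dir, "funny_files", "hello.txt"))

old_wd <- setwd(home_dir)   # the R counterpart of "cd"
print(getwd())              # the R counterpart of "pwd"

# Absolute pathname: spelled out from the top of the tree.
text_abs <- readLines(file.path(home_dir, "funny_files", "hello.txt"))

# Relative pathname: starts from the working directory.
text_rel <- readLines("funny_files/hello.txt")

# Both pathnames identify the same file, so the contents are identical.
print(identical(text_abs, text_rel))

setwd(old_wd)               # go back to where we started
```

If the tree were moved under another user's folder, only the setwd() call would need to change; the relative pathname would keep working.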
Try it using an online Unix terminal, such as this one: CoCalc Linux Terminal
The directories for the project
A project folder must have the following sub-folders:
/r_files
/in_data (where you save the dataset(s) used as inputs for your project. They can be in formats such as Excel, CSV, and more).
/out_data (to save datasets, data frames, after processing, and for later use - if needed)
/temp (for temporary files, if needed)
/docs (where documents are stored)
We also need: old_r (a subdirectory of /r_files, to put old .r files and keep the directory tidy), and old_docs (a subdirectory of /docs, for old documents).
Please note: to guarantee portability, it is of key importance that the subdirectories are named exactly as indicated. Remember: a r is not an R, a "_" is not an empty space nor a "-", etc.
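To avoid typing mistakes in the folder names, the structure can be created from R. This is a minimal sketch: the function name create_project_dirs and the demo folder my_project are hypothetical, and it assumes that old_r lives under r_files and old_docs under docs.

```r
# Create the standard sub-folders inside `project_dir`.
create_project_dirs <- function(project_dir) {
  sub_dirs <- c(
    "r_files", file.path("r_files", "old_r"),
    "in_data", "out_data", "temp",
    "docs", file.path("docs", "old_docs")
  )
  for (d in file.path(project_dir, sub_dirs)) {
    # recursive = TRUE also creates project_dir itself if it does not exist
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(sub_dirs)
}

# Demonstration in a temporary folder; use your own project folder instead.
demo_dir <- file.path(tempdir(), "my_project")
create_project_dirs(demo_dir)
print(list.dirs(demo_dir, full.names = FALSE, recursive = TRUE))
```

Existing folders are left untouched, so the function can be run again on a project that is already set up.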
Modifying a .r file
If you are simply using a .r file, and the only change you make is to set the working directory, don't bother to rename and save it.
When you modify something inside a .r file, precede the change with
[YourInitials, date, brief explanation].
If you add more than one line, end the new part with [/YourInitials].
Example:
[LP, 2020/08/17: the following lines introduce an alternative formulation of var xxx]
(line 1)
(line 2)
[/LP]
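Inside a .r file the markers have to be written as comments, since R would otherwise try to execute them. A minimal sketch, assuming the usual "#" comment style; the variables income and log_income are made up for illustration.

```r
income <- c(1200, 1500, 900)

# [LP, 2020/08/17: the following lines introduce an alternative formulation of income]
log_income <- log(income)
log_income_centered <- log_income - mean(log_income)
# [/LP]

print(log_income_centered)
```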
The /docs folder
We use the /docs folder to save the documents of the project - in particular, the text of your project.
In the Dropbox shared folder there is a sub-directory named "R_exam".
It contains the usual set of subdirectories (/docs, /in_data, /out_data, /r_files, and /temp).
Copy it locally (onto your computer, that is).
The R_projects directory will contain the following files:
/docs/DA_Exam_Instructions_[...].docx: this document contains all the instructions, and it also serves as a template for the document that you have to produce.
In /in_data, the data that you need.
Save your files locally.
When you are done, DO NOT send me the files by email. Do as follows:
Call your R script: DA_[yourlastname]_[date].r
Call your document: DA_[yourlastname]_[date].docx (or .odt, in case you use OpenOffice).
Create a subfolder of the shared folder /R_exam named after your last name (all in lowercase letters).
Copy the whole subdirectory structure there, so that I will find both your .r script and the .docx (or .odt) document with your results.
Before you update your files in your Dropbox folder, please read all the instructions again (how files should be named, etc.). Consider that following those guidelines could be a "litmus test" of sorts. Of what, you decide.
Send me an email with "R Project" as the subject, and in the text of the message simply let me know that you are done, which means that I can take your folder offline.
The R_projects directory will contain the following files:
/docs/QM_Exam_Instructions_[...].docx: this document contains all the instructions, and it also serves as a template for the document that you have to produce.
In /in_data, the data that you need.
Call your document: QM_[yourlastname]_[date].docx (or .odt, in case you use OpenOffice).
When you are done, do as you did for the R Data Analysis project: update your files in your Dropbox folder, and send me an email with "R Project" as the subject, simply letting me know in the text of the message that you are done.