The purpose of this course is to learn some of the foundations of statistics and, while doing so, to become familiar with the R programming language. Regarding R, please see the project page.
Agresti, Alan; Christine Franklin, and Bernhard Klingenberg. "Statistics: The Art and Science of Learning from Data". Fourth Edition. Pearson, 2017. (henceforth: AFK)
For the R Lab we will also use the following texts, which can be freely downloaded from the Internet:
Roger D. Peng. R Programming for Data Science. It can be obtained here. Please also check Peng's course on "Coursera".
And also:
Hanck, Christoph; Martin Arnold, Alexander Gerber, and Martin Schmelzer. Introduction to Econometrics with R. Online: https://www.econometrics-with-r.org/
Sundstrom, William. Guide to R for SCU Economics Students (contains replications of some of the Stock-Watson problems). Online: http://rpubs.com/wsundstrom/home
We will use several "Shiny apps": neat interactive tools for learning important concepts. They are linked in the following "course contents" section. Several of them are from the textbook (Agresti et al., 2017). Others are from: Metzger, Shawna K. (2020), Using Shiny to Teach Econometric Models.
Introduction: Using Data to Answer Statistical Questions
AFK: 1.1, 1.2
Exploring data and descriptive statistics
Types of data, and graphical summaries
AFK: 2.1, 2.2
The central tendency of the data
AFK: 2.3
The variability of the data
AFK: 2.4, 2.5
Bivariate analysis: contingency and correlation
AFK: 3.1, 3.2, 3.4
Association between two categorical variables
Relation between two quantitative variables
Probability
Definitions; marginal and conditional probabilities
AFK: 5.1, 5.2, 5.3
Probability rules and Bayes' theorem
AFK: 5.4
Probability distributions
Binomial, normal, Student's t, chi-squared, and F distributions
AFK: 6.1, 6.2
Binomial distribution (see also: here; for the normal approximation: here)
Compute probabilities from different distributions
Sampling distribution
AFK: 7.1, 7.2
Sampling distribution of the sample proportion
Sampling distribution of the sample mean (continuous population)
Sampling distribution of the sample mean (discrete population)
The distribution of the sample mean
Statistical inference: Estimation
AFK: 8.1, 8.2, 8.3, 8.4
Inference for population proportion
Sampling distribution of the sample mean
Interval estimation & test of hypothesis
Statistical inference: Test of hypothesis
(on the mean, the relative frequency, and statistical independence: chi-squared test)
AFK: 9.1, 9.2, 9.3, 9.4, 9.5, 9.6
Errors and power in significance testing
In parallel with the regular classes, we will learn how to use the software R. We will use scripts (programs) that will be made available in a shared Dropbox folder.
R Lab 1: Introduction to R.
Variety of programs available.
How to do a project
The "file system"
Basic commands, data management, use of ".r" files.
R Lab 2: Introduction to R.
Inputting data, data transformation, first elements of data analysis.
file: /R/lectures/r_files/REcon_Ch2_[date].r
(From: Sundstrom & Kevane, "Guide to R - Data analysis for Economics")
R Lab 3: Exploratory data analysis.
Descriptive statistics. Data visualization.
file: /R/lectures/r_files/REcon_Ch3-4_[date].r
R Lab 4: Exploratory data analysis.
Descriptive statistics. Data visualization.
file: /R/IMF-WEO/r_files/imf_weo_[...].r
file: /R/graphs/r_files/world_map_[...].r
R Lab 5: Exploratory data analysis.
Descriptive statistics. Data visualization.
file: /R/graphs/r_files/chord_diagram[...].R
file: /R/QM/r_files/exploratory_analysis[..].R
R Lab 6: Inferential statistics.
Central limit theorem.
file: /R/QM/r_files/CLT_LP_[date].r
Exam
Students must complete a project using R, which will count as the midterm exam. The first opportunity to take the final exam is on Wednesday, December 1, at 11 am. A "pass" grade on the final exam can be refused, and the exam retaken, ONLY ONCE. The grade on the R project cannot be refused.
The final grade will be a weighted average of the result of the midterm exam (R Project) and of the final exam, with respective weights 0.4 and 0.6.
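As a sketch of the weighting just described, write P for the grade on the R Project and F for the grade on the final exam. These symbols are introduced here only for illustration, and both grades are assumed to be expressed on the same scale. The final grade G would then be:

```latex
G = 0.4\,P + 0.6\,F
```

For instance, with purely hypothetical grades P = 28 and F = 25, this gives G = 0.4 × 28 + 0.6 × 25 = 11.2 + 15 = 26.2. How the result is rounded is not specified here.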
The deadline for the R Project (Data Analysis) is Tuesday 16 November 2021, at 11:59 pm.
Lucio Picci
18 September 2021