L05 EDA

Looking at Data: Exploratory Data Analysis (EDA)

The lecture shows that it is essential to look at the data before doing regression analysis. Exactly the same regression output can come from very different types of data sets and carry entirely different meanings. In particular, four things need to be checked:

FUNCTIONAL FORM (linear or not)

OUTLIERS

CLUSTERS OF DATA (combining them leads to misleading results)

MISSING IMPORTANT OR RELEVANT VARIABLES

All of these problems can lead to HIGHLY MISLEADING regression results.
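As an illustration of the CLUSTERS problem, here is a minimal sketch in Python. The data are made up for this example (they are not from the lecture or from the thesis discussed there), and the helper `ols` is a hypothetical name. Within each cluster there is no relationship between x and y, yet the pooled regression reports a steep slope and a high R-squared.

```python
def ols(x, y):
    """Simple least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    a = my - b * mx
    r2 = sxy ** 2 / (sxx * syy)
    return a, b, r2

# Invented data: two clusters, each with NO x-y relationship inside it.
x_a, y_a = [1, 2, 3, 4, 5], [2, 3, 1, 3, 2]
x_b, y_b = [11, 12, 13, 14, 15], [12, 13, 11, 13, 12]

_, slope_a, r2_a = ols(x_a, y_a)                      # cluster A alone
_, slope_b, r2_b = ols(x_b, y_b)                      # cluster B alone
_, slope_all, r2_all = ols(x_a + x_b, y_a + y_b)      # clusters combined

print("cluster A slope:", round(slope_a, 3))
print("cluster B slope:", round(slope_b, 3))
print("pooled slope:", round(slope_all, 3), "R^2:", round(r2_all, 3))
```

The pooled fit only measures the gap between the two clusters. A scatter plot would show this immediately, while the regression table alone would not.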

The lecture ends by examining structural change in the INDY 500 Winning Speed DATA. The HOMEWORK is to look for linearity and structural change in the GNP series, and to check for OUTLIERS if any can be found. In connection with structural change, the following two articles would be very useful to look at:

Tests for Structural Change, Aggregation, and Homogeneity

Changing Point and Parameter Instability with Heteroskedastic Models
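One common way to check for structural change at a KNOWN break date is a Chow-type F test, sketched below in Python. This is an assumption about a reasonable starting point for the homework, not necessarily the method used in the lecture or in the two articles. The series are invented (not the INDY 500 or GNP data), the function names `rss` and `chow_f` are hypothetical, and the test as written assumes equal error variance in both regimes.

```python
def rss(t, y):
    """Residual sum of squares from a least-squares fit of y = a + b*t."""
    n = len(t)
    mt = sum(t) / n
    my = sum(y) / n
    stt = sum((ti - mt) ** 2 for ti in t)
    sty = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    b = sty / stt
    a = my - b * mt
    return sum((yi - a - b * ti) ** 2 for ti, yi in zip(t, y))

def chow_f(t, y, break_at, k=2):
    """Chow F statistic; break_at is the index where the second regime starts.

    k is the number of parameters per regime (intercept and slope here).
    """
    pooled = rss(t, y)                                  # one line for all data
    split = rss(t[:break_at], y[:break_at]) + rss(t[break_at:], y[break_at:])
    n = len(t)
    return ((pooled - split) / k) / (split / (n - 2 * k))

# Invented series: a small alternating wiggle stands in for noise.
t = list(range(20))
wiggle = [0.5 if i % 2 == 0 else -0.5 for i in t]
y_break = [(10 + 2 * ti if ti < 10 else 30 + 0.5 * ti) + w
           for ti, w in zip(t, wiggle)]                 # trend changes at t = 10
y_stable = [10 + 2 * ti + w for ti, w in zip(t, wiggle)]  # same trend throughout

f_break = chow_f(t, y_break, 10)
f_stable = chow_f(t, y_stable, 10)
print("F with a break:", round(f_break, 2))
print("F without a break:", round(f_stable, 2))
```

The statistic is compared with an F distribution with k and n - 2k degrees of freedom. A large value for the series with a break, and a small one for the stable series, is what the plot of each series would already suggest.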

Video Lecture (my website) - Lecture on need for graphical/visual understanding of data

L3 Necessity of EDA prior to Regression - Discusses the thesis of Uzma Bashir, Determinants of Corporate Philanthropy, and shows how clusters mislead regressions.