In this module, students will learn about three types of data structure: cross-sectional data, time-series data, and cross-sectional time-series data. For each type, students will download a data set and explore it in Stata. Along the way, students will also learn how to import an Excel file into Stata.
1. Cross-Sectional Data: when all units are observed at the same point in time, you have cross-sectional data. An opinion survey conducted in a given year is a good example. In that case, the unit of observation is the individual respondent. Beyond survey data, the unit of observation could be a community, state, or country. Here, we will explore survey data as an example.
Survey Data
Let's keep working with the data from Assignment #1. Please visit the GSS website (http://gss.norc.org) and download the 1996 General Social Survey in Stata format.
Please open a do-file!
use 1996GSS.dta, clear
Let's browse the data and see what it looks like. Type the following command and execute it.
browse
The browse command shows the data in a spreadsheet. In survey data, each row represents an individual respondent and each column a survey item.
describe
The describe command displays information about the data in memory. It shows you the sample size, the number of variables, the size of the file, the list of variables, variable types, and even variable labels.
help describe
FYI: you can always ask Stata about a command. Just type help in front of the command name. It brings up the command's help file, which explains what the command does, its syntax, and its options.
Abbreviation: the help file underlines the part of a command name that serves as its shortest allowed abbreviation. For describe, only the d is underlined, so typing d alone runs describe. Try it.
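As a quick illustration of abbreviations, the sketch below uses the auto data set that ships with Stata. That data set is an assumption made only so the example runs without the GSS file; the same commands work on any data in memory.

```stata
* Load an example data set shipped with Stata (not the GSS file)
sysuse auto, clear

* Full command name
describe

* Shortest abbreviation: produces the same output as describe
d

* Abbreviations also accept a variable list
d price mpg
```

Abbreviations save typing at the command line, but full command names are usually easier to read in a do-file.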
2. Time-Series Data: when a unit is observed repeatedly over a period of time, we have time-series data, also called longitudinal data. Most economic data has this longitudinal structure. Please download "WDI US GDP per capita.xlsx" from the class Blackboard. The data were extracted from the World Bank database.
Time-series data
import excel using "WDI US GDP per capita.xlsx", firstrow clear
This command imports an Excel file into Stata. The firstrow option tells Stata to treat the first row of the sheet as variable names rather than as data.
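To see what firstrow does without the class file, here is a minimal round-trip sketch. It assumes the auto data set that ships with Stata and writes a hypothetical file called auto_demo.xlsx to the current working directory; neither is part of the course materials.

```stata
* Start from an example data set shipped with Stata
sysuse auto, clear

* Write it to an Excel file with variable names in the first row
* (auto_demo.xlsx is a hypothetical file name)
export excel using "auto_demo.xlsx", firstrow(variables) replace

* Read it back: firstrow turns the first row into variable names
import excel using "auto_demo.xlsx", firstrow clear
describe
```

If firstrow is omitted, Stata names the variables after the Excel columns (A, B, C, ...) and reads the header row as an ordinary observation.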
rename GDPpercapitaconstant2010US GDPpc
The variable name is too long. With the rename command, we can change it to something shorter.
tsset Time
For time-series data, the tsset command declares that the data are a time series and identifies the time variable. Once the data are declared, you can use Stata's time-series commands and operators.
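One thing tsset unlocks is the lag operator L., sketched below on a tiny made-up series. The variable names year, gdp, and growth and all the numbers are hypothetical and are not taken from the World Bank file.

```stata
* A three-year toy series (made-up numbers)
clear
input year gdp
2001 100
2002 104
2003 110
end

* Declare the time variable
tsset year

* L.gdp is the previous year's value, available only after tsset
generate growth = 100 * (gdp - L.gdp) / L.gdp
list
```

The first year has no previous value, so growth is missing there.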
tsline GDPpc
Now, we can plot the time-series data.
3. Cross-Sectional Time-Series Data: if units are observed both cross-sectionally and over time, we have cross-sectional time-series (CSTS) data. However, when the number of cross-sectional units (N) is larger than the number of time points (T), that is, N > T, we have panel data.
Panel Data (N > T): please go to the CCES website (https://cces.gov.harvard.edu/) and download the CCES 2010-2014 Panel Study.
use "CCES_Panel_Full3waves_VV_V4.dta", clear
Please open the .dta file.
9,500 individuals were surveyed three times (N = 9,500 > T = 3 waves).
lookfor Abortion
This command searches variable names and variable labels for the string "Abortion". The search is not case-sensitive.
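The same kind of search can be tried on the auto data set that ships with Stata, which is assumed here only so the example runs without the CCES file.

```stata
* Load an example data set shipped with Stata (not the CCES file)
sysuse auto, clear

* Matches the variable whose label is "Mileage (mpg)"
lookfor mileage

* The names of the matching variables are also saved in r(varlist)
display "`r(varlist)'"
```

Because the matches are saved in r(varlist), they can be passed straight to another command, for example describe `r(varlist)'.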
tabulate CC10_324
We can make a frequency table with this command.
From the table, we can see that higher values indicate a more pro-choice attitude.
tabulate CC10_324 if CC10_324==1 & CC12_324==4
Suppose we are curious how many people changed their minds between 2010 and 2012, moving from the pro-life response (1) to the pro-choice response (4). The if qualifier restricts the table to those respondents, so its total tells us how many individuals made this dramatic change of attitude.
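The logic of the if qualifier can be checked on a small made-up data set. The variable names att10 and att12 and the five respondents below are hypothetical stand-ins for the two CCES items, not actual survey responses.

```stata
* Five made-up respondents with an attitude score in two waves
clear
input id att10 att12
1 1 4
2 1 1
3 4 4
4 2 3
5 1 4
end

* Two-way table: rows are wave-1 answers, columns are wave-2 answers
tabulate att10 att12

* Count only those who moved from 1 to 4; the count is stored in r(N)
count if att10 == 1 & att12 == 4
display r(N)
```

The two-way table shows every transition at once, while count if answers the single question of how many respondents moved from 1 to 4.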
CSTS Data (N < T): please download the file "OECD STAT POP.csv" from the class Blackboard.
insheet using "OECD STAT POP.csv", clear
This command reads a comma-delimited text file into Stata. This is one of the most common data formats!
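In Stata 13 and later, insheet still runs but has been superseded by import delimited. A minimal round-trip sketch of the newer command follows; it assumes the auto data set that ships with Stata and a hypothetical file name, auto_demo.csv, written to the current working directory.

```stata
* Start from an example data set shipped with Stata
sysuse auto, clear

* Write a comma-delimited file (auto_demo.csv is a hypothetical name)
export delimited using "auto_demo.csv", replace

* Read it back with the newer command
import delimited using "auto_demo.csv", clear
describe
```

Either command works for the class file; import delimited is the one documented in current versions of Stata.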
Let's check how many countries and years you have in this data.
sum time
The OECD data contain the population of each country from 1950 to 2060 (the later years are projections).
display 2060 - 1950 + 1
We have 111 time points in our data.
egen id = group(country)
sum id
The egen command with the group() function assigns a numeric ID to each country.
The summarize command shows that the largest ID is 36, so we have 36 countries in this data.
As N (36) < T (111), we have CSTS Data.
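The counting steps above can be sketched on a tiny made-up long-format data set. The names country, year, and pop and every value below are hypothetical and are not taken from the OECD file.

```stata
* Two made-up units observed over a few years
clear
input str1 country year pop
"A" 2000 10
"A" 2001 11
"B" 2000 20
"B" 2001 21
"B" 2002 22
end

* Numeric ID per unit; the largest ID is the number of units
egen id = group(country)
summarize id
display "Number of units (N): " r(max)

* Span of the time variable, first to last year inclusive
summarize year
display "Time points spanned (T): " r(max) - r(min) + 1
```

Note that the span counts calendar positions between the first and last year. It equals the number of observed time points only when no year is skipped.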
xtset id time
We need to declare the CSTS data structure with xtset.
But Stata returns an error message: "repeated time values within panel"
Assignment 2: Find a solution to the error message! Please make the xtset command syntax work!