Stata Cheatsheet

Stata Graph Example
Bar graph with confidence interval
Twoway Bar
global macro example
- global SOURCE "/Users/haotongtong/Documents/CHARLS"
- global RAW2013 "${SOURCE}/CHARLS2013"
- global RAW2014 "${SOURCE}/CHARLS2014_Life_History_Data"
- global INPUT "${SOURCE}/SYP_CHARLS2014/build/output"
- global OUTPUT "${SOURCE}/SYP_CHARLS2014/analysis/output"
- global CODE "${SOURCE}/SYP_CHARLS2014/analysis/code"
- global TEMP "${SOURCE}/SYP_CHARLS2014/analysis/temp"
- use "${INPUT}/G1G2G3ready.dta", clear
Import from excel, specific range
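A minimal sketch of a ranged import, using Stata's bundled auto data so that it runs anywhere. The file name auto_demo.xlsx and the range A1:C11 are made up for illustration; substitute your own workbook and range.

```stata
* Build a small workbook to import from (auto_demo.xlsx is a hypothetical name)
sysuse auto, clear
export excel using "auto_demo.xlsx", firstrow(variables) replace

* Import only cells A1:C11; row 1 of the range supplies the variable names
import excel using "auto_demo.xlsx", cellrange(A1:C11) firstrow clear
describe
```

With firstrow, the first row of the range becomes variable names, so this range yields 10 observations on 3 variables.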
rename variable
- rename oldname newname
Summarize by group
- tabulate cohort, summarize(edu)
- OR: tabstat birthYear, by(cohort) stat(mean sd min max)
Concatenate two numeric codes into one:
- gen prefecture4digit=string(province,"%02.0f")+string(prefecture,"%02.0f")
- destring prefecture4digit, replace
Extract part of a string
- tostring county, generate(countyString)
- gen province=real(substr(countyString, beginningPosition, length))
Separate a string by space/hyphen/comma
- split var1, p(" ")
Label a dataset and variables:
- label data "This file contains county crosswalks"
- label variable unit_code_common_prov "province name, adjusted by TT"
Label the variable "GFclass", which takes three discrete values:
- label define GFclass1 1 "Poor" 2 "Middle" 3 "Rich"
- label values GFclass GFclass1
Store a variable list in a local to save space
- local varlist "ngdp_all_g ngdp_agri_g ngdp_nonagri_g rgdp_all_g rgdp_agri_g rgdp_nonagri_g capital_all_g capital_agri_g capital_nonagri_g emp_all_g emp_agri_g emp_nonagri_g"
- sum `varlist' if year<=1987
Create dummies
- tabulate SC, gen(SCdummy)
Create within group ID
- sort group
- by group: gen memberID = _n
collapse
- collapse (mean) eduYear edu, by(birthYear CITY)
- collapse (mean) pctile_complexind (count) OCI_size2012=birthYear [fweight=wt], by(occ1990dd)
reshape long vs wide
- To go from long to wide: reshape wide stub, i(i) j(j) ---> j existing variable
- To go from wide to long: reshape long stub, i(i) j(j) ---> j new variable
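A small round trip may make the two reshape directions concrete. The variable names id, inc2010, and inc2011 are invented for this sketch; "inc" plays the role of the stub and "year" the role of j.

```stata
* Toy wide data: one row per id, one income column per year (names are illustrative)
clear
input id inc2010 inc2011
1 10 12
2 20 25
end

* Wide to long: j(year) is a NEW variable built from the suffixes 2010 and 2011
reshape long inc, i(id) j(year)
list, sepby(id)

* Long to wide: j(year) is an EXISTING variable whose values become suffixes
reshape wide inc, i(id) j(year)
list
```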
egen by group, with some calculation:
- bysort group: egen complexind_occ2digit=mean(complexind)
- egen nchild=total(age<=17), by(family) // this gives you a count
generate lag variable
- sort state year
- by state: gen lag1 = x[_n-1] if year==year[_n-1]+1
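As a cross-check, the same lag can be obtained from the time-series operator L. once the panel is declared with xtset. The data below are invented for illustration; the gap in state 1 (no year 2002) shows that both approaches return missing when the previous year is absent.

```stata
* Toy panel with a gap in state 1 (values are illustrative)
clear
input state year x
1 2000 5
1 2001 6
1 2003 8
2 2000 1
2 2001 2
end

* Manual lag, guarded against gaps in the year sequence
sort state year
by state: gen lag1 = x[_n-1] if year == year[_n-1] + 1

* Same lag using the L. operator after declaring the panel structure
xtset state year
gen lag1_ts = L.x
list, sepby(state)
```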
Format numbers to keep full precision in outsheet/file output, so that you can copy them to Excel.
- format emp %12.0f
- egen long emp=...
HP filter:
- tsfilter hp residual=varname, trend(trendofvarname)
Merge
- use "${RAW2014}/Demographic_Backgrounds.dta", clear
- merge 1:1 ID using "${RAW2014}/Sample_infor.dta"
- drop if _merge==2
- drop _merge
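The merge steps above can be tried on toy data as follows. The datasets and the tempfile name sampleinfo are hypothetical; the final comment shows what should be an equivalent one-line form using the keep() and nogenerate options.

```stata
* Using data: weights for IDs 1, 2, 4 (illustrative values)
clear
input ID weight
1 1.5
2 2.0
4 3.0
end
tempfile sampleinfo
save `sampleinfo'

* Master data: ages for IDs 1, 2, 3
clear
input ID age
1 50
2 60
3 70
end

merge 1:1 ID using `sampleinfo'
tabulate _merge
drop if _merge == 2      // drop rows found only in the using file (ID 4)
drop _merge

* One-line alternative: merge 1:1 ID using `sampleinfo', keep(master match) nogenerate
```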
predict after regression:
- predict newvar, xb
- predict newvar, residual
Fill in missing data points (cubic/linear interpolation):
- ssc install cipolate
- sort year
- cipolate y x, gen(yprime) // cubic interpolation
- OR: ipolate y x, gen(yprime) // linear interpolation
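A tiny linear-interpolation sketch with the built-in ipolate command, on made-up data. The missing 2001 value is filled with the midpoint of its neighbours.

```stata
* Toy series with one missing point (values are illustrative)
clear
input year y
2000 10
2001 .
2002 14
end

* Linear interpolation of y on year; the result goes to a new variable
ipolate y year, gen(ylin)
list
```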
inlist
- gen uptRail=.
- replace uptRail = upt if(inlist(Modes, "HR", "LR", "SR", "CR", "IP", "CC", "MG", "MO"))
Percentile
- pctile MincomePct=Mincome , nq(23)
- list MincomePct in 1/22
- xtile percentHat=MincomeHat , nq(22)
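The difference between pctile and xtile is easy to miss, so here is a sketch on the bundled auto data. pctile stores the cutpoints in the first few observations, while xtile assigns each observation to a quantile group. The variable names pricePct and priceQ are invented for this example.

```stata
sysuse auto, clear

* nq(4) yields 3 cutpoints (25th, 50th, 75th percentiles) in observations 1 to 3
pctile pricePct = price, nq(4)
list pricePct in 1/3

* xtile labels every observation with its quartile group, 1 through 4
xtile priceQ = price, nq(4)
tabulate priceQ
```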
preserve current data
- preserve
- restore
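A typical use is to collapse or otherwise destroy the data temporarily and then get the original back. A minimal sketch on the bundled auto data:

```stata
sysuse auto, clear

preserve                               // snapshot the full dataset
collapse (mean) price, by(foreign)     // data in memory are now 2 rows
list
restore                                // the original 74 observations are back
describe, short
```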
Graph options: add grid in the graph background
- scatter Y X, ylabel(,grid)
Graph: resize a graph; run this after the graph is displayed:
- graph display, xsize(10)
- OR
- graph export "${OUTPUT}/name.png", width(800) height(600) replace
Graph: multiple lines with different patterns and colors
- twoway line coastalprimary coastalsecondary coastaltertiary inlandprimary inlandsecondary inlandtertiary year, legend(row(2)) lpattern(solid solid solid dash dash dash) lcolor(green red navy green red navy) title("labor share by sector and region, 1982-2012")
Graph: Combine two graphs:
- hist pctile_complexind, bin(20) xtitle("Occupation Complexity Percentile Index") saving(20bin)
- hist pctile_complexind, bin(100) xtitle("Occupation Complexity Percentile Index") saving(100bin)
- gr combine "20bin" "100bin", col(1) iscale(1) title("Distribution of Occupation Complexity, Census2000")
Bar graph with confidence interval
- generate `x'_hi = `x' + invttail(pop-1,0.025)*(`x'_sd / sqrt(pop))
- generate `x'_lo = `x' - invttail(pop-1,0.025)*(`x'_sd / sqrt(pop))
- graph twoway (bar `x' worker_type, barw(.8)) (rcap `x'_hi `x'_lo worker_type), legend(order(1 "`x'" 2 "95% CI")) ytitle("`x'") xtitle("") title("average `x'") xlabel(1 "local farmer" 2 "local non-farmer" 3 "migrant" 4 "local urban")
Bar graph with number on each bar:
- graph bar (mean) edu_common, over(moverType) over(hukou) over(age_group) asyvars xsize(10) ytitle("education level (1-4)") blabel(bar, position(outside) format(%4.2f)) legend(rows(1)) title("education")
Bar graph: order bars, assign color, label/name bar
- index name migrant
- 1 Coast to Coast 15.15
- 2 Inland to Inland 33.00
- 3 Coast to Inland 2.75
- 4 Inland to Coast 4.95
- separate migrant, by(name) veryshortlabel
- graph bar migrant? , over(name, sort(index)) nofill legend(order(1 "Home Coastal" 3 "Home Inland")) bar(2, color(dkgreen)) bar(3, color(orange_red)) bar(4,color(orange_red)) title("2012 Migration pattern for NonAgriHukou (millions)") yscale(r(0 33)) blabel(bar, position(outside) format(%4.2f))
Smooth line/connected plot
- preserve
- collapse (mean) eduYear, by(birthYear)
- tsset birthYear, yearly
- tssmooth ma avgeduYear = eduYear, window(1 1 1)
- twoway line avgeduYear birthYear, legend(pos(5) ring(0) col(1) size(medsmall) order( 1 "fathers")) title("Average Father Years of Education by birth year") ytitle("Average Years of Education") xtitle("Birth Year")
- graph export "AvgEdu_father.png",replace
- restore
Plot by group
- twoway hist age,w(1) by(moverType)
Plot shaded period
- sum relativeA_nonagri1
- local max=`r(max)'
- gen period1=`max' if year>=1985& year<=1990
- gen period2=`max' if year>=1995& year<=2005
- gen period3=`max' if year>=2005& year<=2010
- gen one=1
- twoway (area period1 year, bcolor(gs15)) (area period2 year, bcolor(gs14)) ///
- (area period3 year, bcolor(gs13)) (line one year if year<=2010, lcolor(red) ) ///
- (line relativeA_agri1 relativeA_nonagri1 relativeA_agri2 relativeA_nonagri2 year if year<=2010, ///
- legend(row(2) order(5 "Coastal Agri" 6 "Coastal NonAgri" 7 "Inland Agri" 8 "Inland NonAgri")) ///
- color(green navy green navy) lpattern( solid solid dash dash) title("Simple Relative TFP"))

Generate CDF graphs of "pctile_complexind"
- collapse (mean) pctile_complexind (count) OCI_size1990=birthYear, by(occ1990dd)
- cumul pctile_complexind [weight=OCI_size1990], gen(OCIcumul1990)
- sort pctile_complexind
- tw (line OCIcumul1990 pctile_complexind), xtitle("Mean occupation complexity") ytitle("CDF of Occupation Complexity")
Loop, save matrix
- quietly tabstat pctile_complexind, stat(mean sd min p25 p50 p75 n ) save
- return list
- matrix list r(StatTotal)
- matrix OCI`z'=r(StatTotal)
- local i=1
- local OCI`z'Title "`z': "
- foreach x of local SumStatList {
- local OCI`z'`x'=round(OCI`z'[`i',1], 0.001)
- local i=`i'+1
- local OCI`z'Title "`OCI`z'Title' `x'=`OCI`z'`x'';"
- }
Loop over number/item
- foreach x in a b mpg 2 3 2.2 {
- . . . `x' . . .
- }
Loop over a defined macro/local
- foreach x of local varlist {
- . . . `x' . . .
- }
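To make the "of local" form concrete, the sketch below loops over a local list of variables from the bundled auto data and prints each mean. The list contents are chosen only for illustration.

```stata
sysuse auto, clear
local varlist "price mpg weight"

* "of local" takes the macro NAME (no quotes around it), not its contents
foreach x of local varlist {
    quietly summarize `x'
    display "`x': mean = " %9.2f r(mean)
}
```

The "in" form would instead need the expanded list, e.g. foreach x in `varlist' { ... }.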
Loop example with the use of in/of
- #delimit ;
- foreach y in all agri nonagri{;
- foreach x of numlist 1990 1995 2000 2005 2010 { ;
- spmap avg_g_emp_`y'`x' using "${SOURCE}/china_map.dta", id(id) clnumber(9) clmethod(quantile)
- fcolor(BuRd) ndfcolor(gray) label(label(province) xcoord(x_coord) ycoord(y_coord) size(*.4))
- title(`x' `y' Employment Growth in past 5 years );
- graph export "${SOURCE}/MAPOUTPUT/map_g_emp_`y'`x'.png",replace;
- };
- };
- #delimit cr
Gini coefficient, inequality index
- ssc install ineqdeco
- quietly ineqdeco FamilyIncome if CR==1
- return list
Add an extra observation and give it value
- local nn=_N+1
- set obs `nn'
- replace occ2005china=6 in `nn'
- replace pctile_complexind=.1416273 in `nn'
spmap for drawing/plotting map
- You need a base map whose id is merged into the current data; spmap then matches regions through that id
- add numbers to map
spmap with pie chart on each region/province:
- spmap migShare_5years using "${DATAINPUT}/china_map.dta", id(id) fcolor(Reds2) ///
- ndfcolor(gray) title("2000 Migration Share ", size(*0.8)) legtitle("migration share") ///
- subtitle("(labor moved in past 5 years)", size(*0.8)) ///
- diagram(variable(migWtinProvShare_5years migBTProvShare_5years ) ///
- xcoord(x_coord) ycoord(y_coord) fcolor(OrRd) legenda(on))
Save regression using Outreg2
- quietly reg Cedu female Cage Cage2 Fedu Medu, cluster(PCODE)
- outreg2 using "${TEMP}/primary.xls", ctitle(Father) addtext(County FE, N, Year FE, N) keep(C* female* F* M*) replace
Save regression using Outreg2, alternative
- quietly reg Cedu female Cage Cage2 Fedu Medu, cluster(PCODE)
- est store G2G3
- outreg2 [G2G3*] using "${TEMP}/T1G2G3.xls", keep(C* female* F* M*) replace
Putexcel:
- putexcel A1=("Car type") B1=("Freq.") C1=("Percent") using results, replace
- putexcel A2=matrix(names) B2=matrix(freq) C2=matrix(freq/r(N)) using results, modify
Survey tabulate to excel:
* Example 1 : one-way tabulate
- webuse nhanes2b, clear
- svyset psuid [pweight=finalwgt], strata(stratid)
- putexcel set "${OUTPUT}/test.xlsx", sheet("test") replace
- svy: tabulate race , format(%11.3g) percent missing
- mat pct = e(b)'
- svy: tabulate race , format(%11.3g) count se missing
- mat cnt = e(b)'
- mat se = vecdiag(e(V))'
- loc n = rowsof(se)
- forv i=1/`n' {
- mat se[`i',1] = sqrt(el(se,`i',1))
- }
- putexcel A1 = matrix(pct)
- putexcel B1 = matrix(cnt)
- putexcel C1 = matrix(se)

* Example 2 : two-way tabulate
- webuse nhanes2b, clear
- svyset psuid [pweight=finalwgt], strata(stratid)
- * svy: tabulate race agegrp
- putexcel set "${OUTPUT}/test.xlsx", sheet("test") replace
- levelsof agegrp, local(age)
- foreach lev in `age'{
- local col : word `lev' of A B C D E F
- svy: tabulate race if agegrp==`lev' , format(%11.3g)
- mat pcb = e(b)'
- putexcel `col'2 = matrix(pcb)
- }
Export to Excel:
- export excel using "${SOURCE}/results.xlsx", sheet("Data", modify) firstrow(variables) nolabel
Unicode for Chinese characters
- unicode analyze surname_test.dta
- unicode encoding set gb18030
- unicode translate surname_test.dta
Check for Multicollinearity
- Run "vif" (or "estat vif") after a regression. A VIF above 10 signals a collinearity concern.
House Cleaning:
- erase mydata.dta
Survey data set up
- gen wt2005=nbsfraction/svyfraction/0.001884215
- svyset [pweight=wpp], strata(strata) psu(ncode)
- svy: tab region
- svy, subpop(nonmigrant_agri): mean pctile_complexind
- svy: mean pctile_complexind, over (hukou nonLocalHukou)
- estat sd // gives the sd of the mean above
- svy: tabulate race, format(%11.3g) count ci deff deft // svy count gives you the estimated count based on the weight
Save an older .dta version
- saveold file_name.dta, version(13)
Making Stata stop what it is doing
- Stata for Windows: press Ctrl+Pause/Break
- Stata for Mac: press Command+. (period)
Garbled Chinese characters in Stata 14
Installing User Written Programs http://www.borketunali.com/?p=58
- "After finding the program files and installing them to your computer, you will see helm.ado, helm.hlp, pvar.ado, pvar.hlp and sgmm.ado files in the zip file. The only thing that you should do is to copy every item and then paste them into the Stata’s ado files. Some researchers may have difficulties to find the directory of Stata’s ado files. Generally, these files are in your computer’s program files. So, go to computer-hard disk c or d-program files and find Stata. Open this file and you will see the ado file. Open ado file and paste the program into the relevant file. In the ado file you will see files which are called by letters such as a, b and c. So, for example if you copy pvar.ado, you should paste this file into the file called as p. After pasting all of the programs into the relevant files you will be able to estimate your model by using Stata."
Stata Source - Princeton http://www.princeton.edu/~otorres/Stata/statnotes

----------------------------------------------------------------------------------------

Latex code:

----------------------------------------------------------------------------------------

Float note under graph or table
- \usepackage[capposition=top]{floatrow}
- \floatfoot{*Note the y scales are different.}

----------------------------------------------------------------------------------------

SAS convert to STATA with R:

----------------------------------------------------------------------------------------

install.packages("sas7bdat")

library("sas7bdat")

data = read.sas7bdat("file_name.sas7bdat")

library(foreign)

write.dta(data, "stata_file_name.dta")
