global macro example
global SOURCE "/Users/haotongtong/Documents/CHARLS"
global RAW2013 "${SOURCE}/CHARLS2013"
global RAW2014 "${SOURCE}/CHARLS2014_Life_History_Data"
global INPUT "${SOURCE}/SYP_CHARLS2014/build/output"
global OUTPUT "${SOURCE}/SYP_CHARLS2014/analysis/output"
global CODE "${SOURCE}/SYP_CHARLS2014/analysis/code"
global TEMP "${SOURCE}/SYP_CHARLS2014/analysis/temp"
use "${INPUT}/G1G2G3ready.dta", clear
rename variable
rename oldname newname
Summarize by group
tabulate cohort, summarize(edu)
OR: tabstat birthYear, by(cohort) stat(mean sd min max)
Concatenate two numeric codes into one number:
gen prefecture4digit=string(province,"%02.0f")+string(prefecture,"%02.0f")
destring prefecture4digit, replace
Extract part of a string (substring)
tostring county, generate(countyString)
gen province=real(substr(countyString, beginningPosition, length))
Separate a string by space/hyphen/comma
split var1, p(" ")
Label a dataset and variables:
label data "This file contains county crosswalks"
label variable unit_code_common_prov "province name, adjusted by TT"
Label a variable "GFclass" which has three discrete values:
label define GFclass1 1 "Poor" 2 "Middle" 3 "Rich"
label values GFclass GFclass1
Set a local macro to avoid retyping a long variable list
local varlist "ngdp_all_g ngdp_agri_g ngdp_nonagri_g rgdp_all_g rgdp_agri_g rgdp_nonagri_g capital_all_g capital_agri_g capital_nonagri_g emp_all_g emp_agri_g emp_nonagri_g"
sum `varlist' if year<=1987
Create dummies
tabulate SC, gen(SCdummy)
Generate a running ID within each group
sort group
by group: gen memberID = _n
collapse
collapse (mean) eduYear edu, by(birthYear CITY)
collapse (mean) pctile_complexind (count) OCI_size2012=birthYear [fweight=wt], by(occ1990dd)
To go from long to wide: reshape wide stub, i(i) j(j) ---> j existing variable
To go from wide to long: reshape long stub, i(i) j(j) ---> j new variable
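A minimal sketch of both reshape directions, using a made-up toy dataset (the names id, year, and inc and all values are illustrative assumptions, not taken from the CHARLS files):

```stata
* Toy long-format data: one row per id-year (hypothetical values)
clear
input id year inc
1 2011 10
1 2013 12
2 2011 20
2 2013 25
end

* Long to wide: j(year) must already exist; creates inc2011 and inc2013
reshape wide inc, i(id) j(year)
list

* Wide to long: j(year) is a new variable built from the suffixes of inc2011 and inc2013
reshape long inc, i(id) j(year)
list
```

After the second reshape the data are back in the original long layout, so the two commands are inverses of each other here.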
egen by group, with some calculation:
bysort group: egen complexind_occ2digit=mean(complexind)
egen nchild=total(age<=17), by(family) // This gives you a count
generate lag variable
sort state year
by state: gen lag1 = x[_n-1] if year==year[_n-1]+1
Format numbers to keep full precision in outsheet/file output, so that you can copy them to Excel.
format emp %12.0f
egen long emp=...
HP filter:
tsfilter hp residual=varname, trend(trendofvarname)
Merge
use "${RAW2014}/Demographic_Backgrounds.dta", clear
merge 1:1 ID using "${RAW2014}/Sample_infor.dta"
drop if _merge==2
drop _merge
predict after regression:
predict newvar1, xb // fitted values
predict newvar2, residuals // residuals
Fill in missing data points by interpolation (cubic/linear):
ssc install cipolate
sort year
cipolate y x, gen(yprime) // cubic interpolation
OR: ipolate y x, gen(yprime) // linear interpolation (built in)
inlist
gen uptRail=.
replace uptRail = upt if inlist(Modes, "HR", "LR", "SR", "CR", "IP", "CC", "MG", "MO")
Percentile
pctile MincomePct=Mincome , nq(23)
list MincomePct in 1/22
xtile percentHat=MincomeHat , nq(22)
preserve current data
preserve
restore
Graph options: add grid in the graph background
scatter Y X, ylabel(,grid)
Graph: Resize a graph; run this after the graph is displayed:
graph display, xsize(10)
OR
graph export "${OUTPUT}/name.png", width(800) height(600) replace
Graph: multiple lines with different patterns and colors
twoway line coastalprimary coastalsecondary coastaltertiary inlandprimary inlandsecondary inlandtertiary year, legend(row(2)) lpattern(solid solid solid dash dash dash) lcolor(green red navy green red navy) title("labor share by sector and region, 1982-2012")
Graph: Combine two graphs:
hist pctile_complexind, bin(20) xtitle("Occupation Complexity Percentile Index") saving(20bin)
hist pctile_complexind, bin(100) xtitle("Occupation Complexity Percentile Index") saving(100bin)
gr combine "20bin" "100bin", col(1) iscale(1) title("Distribution of Occupation Complexity, Census2000")
Bar graph with confidence interval
generate `x'_hi = `x' + invttail(pop-1,0.025)*(`x'_sd / sqrt(pop))
generate `x'_lo = `x' - invttail(pop-1,0.025)*(`x'_sd / sqrt(pop))
graph twoway (bar `x' worker_type, barw(.8)) (rcap `x'_hi `x'_lo worker_type), legend(order(1 "`x'" 2 "95% CI")) ytitle("`x'") xtitle("") title("average `x'") xlabel(1 "local farmer" 2 "local non-farmer" 3 "migrant" 4 "local urban")
Bar graph with number on each bar:
graph bar (mean) edu_common, over(moverType) over(hukou) over(age_group) asyvars xsize(10) ytitle("education level (1-4)") blabel(bar, position(outside) format(%4.2f)) legend(rows(1)) title("education")
Bar graph: order bars, assign color, label/name bar
index   name               migrant
1       Coast to Coast     15.15
2       Inland to Inland   33.00
3       Coast to Inland     2.75
4       Inland to Coast     4.95
separate migrant, by(name) veryshortlabel
graph bar migrant? , over(name, sort(index)) nofill legend(order(1 "Home Coastal" 3 "Home Inland")) bar(2, color(dkgreen)) bar(3, color(orange_red)) bar(4,color(orange_red)) title("2012 Migration pattern for NonAgriHukou (millions)") yscale(r(0 33)) blabel(bar, position(outside) format(%4.2f))
Smooth line/connected plot
preserve
collapse (mean) eduYear, by(birthYear)
tsset birthYear, yearly
tssmooth ma avgeduYear = eduYear, window(1 1 1)
twoway line avgeduYear birthYear, legend(pos(5) ring(0) col(1) size(medsmall) order( 1 "fathers")) title("Average Father Years of Education by birth year") ytitle("Average Years of Education") xtitle("Birth Year")
graph export "AvgEdu_father.png",replace
restore
Plot by group
twoway hist age,w(1) by(moverType)
Plot shaded period
sum relativeA_nonagri1
local max=`r(max)'
gen period1=`max' if year>=1985& year<=1990
gen period2=`max' if year>=1995& year<=2005
gen period3=`max' if year>=2005& year<=2010
gen one=1
twoway (area period1 year, bcolor(gs15)) (area period2 year, bcolor(gs14)) ///
(area period3 year, bcolor(gs13)) (line one year if year<=2010, lcolor(red) ) ///
(line relativeA_agri1 relativeA_nonagri1 relativeA_agri2 relativeA_nonagri2 year if year<=2010, ///
legend(row(2) order(5 "Coastal Agri" 6 "Coastal NonAgri" 7 "Inland Agri" 8 "Inland NonAgri")) ///
color(green navy green navy) lpattern( solid solid dash dash) title("Simple Relative TFP"))
Generate CDF graphs of "pctile_complexind"
collapse (mean) pctile_complexind (count) OCI_size1990=birthYear, by(occ1990dd)
cumul pctile_complexind [weight=OCI_size1990], gen(OCIcumul1990)
sort pctile_complexind
tw (line OCIcumul1990 pctile_complexind), xtitle("Mean occupation complexity") ytitle("CDF of Occupation Complexity")
Loop, save matrix
quietly tabstat pctile_complexind, stat(mean sd min p25 p50 p75 n ) save
return list
matrix list r(StatTotal)
matrix OCI`z'=r(StatTotal)
local i=1
local OCI`z'Title "`z': "
foreach x of local SumStatList {
local OCI`z'`x'=round(OCI`z'[`i',1], 0.001)
local i=`i'+1
local OCI`z'Title "`OCI`z'Title' `x'=`OCI`z'`x'';"
}
Loop over number/item
foreach x in a b mpg 2 3 2.2 {
. . . `x' . . .
}
Loop over a defined macro/local
foreach x of local varlist {
. . . `x' . . .
}
Loop example with the use of in/of
#delimit ;
foreach y in all agri nonagri{;
foreach x of numlist 1990 1995 2000 2005 2010 { ;
spmap avg_g_emp_`y'`x' using "${SOURCE}/china_map.dta", id(id) clnumber(9) clmethod(quantile)
fcolor(BuRd) ndfcolor(gray) label(label(province) xcoord(x_coord) ycoord(y_coord) size(*.4))
title(`x' `y' Employment Growth in past 5 years );
graph export "${SOURCE}/MAPOUTPUT/map_g_emp_`y'`x'.png",replace;
};
};
#delimit cr
Gini coefficient, inequality index
ssc install ineqdeco
quietly ineqdeco FamilyIncome if CR==1
return list
Add an extra observation and give it value
local nn=_N+1
set obs `nn'
replace occ2005china=6 in `nn'
replace pctile_complexind=.1416273 in `nn'
spmap for drawing/plotting map
You need a base map dataset whose id matches an id variable in the current data; merge the map ids into the current data first, then plot using that concordance
spmap with pie chart on each region/province:
spmap migShare_5years using "${DATAINPUT}/china_map.dta", id(id) fcolor(Reds2) ///
ndfcolor(gray) title("2000 Migration Share ", size(*0.8)) legtitle("migration share") ///
subtitle("(labor moved in past 5 years)", size(*0.8)) ///
diagram(variable(migWtinProvShare_5years migBTProvShare_5years ) ///
xcoord(x_coord) ycoord(y_coord) fcolor(OrRd) legenda(on))
Save regression using Outreg2
quietly reg Cedu female Cage Cage2 Fedu Medu, cluster(PCODE)
outreg2 using "${TEMP}/primary.xls", ctitle(Father) addtext(County FE, N, Year FE, N) keep(C* female* F* M*) replace
Save regression using Outreg2, alternative
quietly reg Cedu female Cage Cage2 Fedu Medu, cluster(PCODE)
est store G2G3
outreg2 [G2G3*] using "${TEMP}/T1G2G3.xls", keep(C* female* F* M*) replace
Save results using putexcel
putexcel A1=("Car type") B1=("Freq.") C1=("Percent") using results, replace
putexcel A2=matrix(names) B2=matrix(freq) C2=matrix(freq/r(N)) using results, modify
Survey tabulate to excel:
* Example 1 : one-way tabulate
webuse nhanes2b, clear
svyset psuid [pweight=finalwgt], strata(stratid)
putexcel set "${OUTPUT}/test.xlsx", sheet("test") replace
svy: tabulate race , format(%11.3g) percent missing
mat pct = e(b)'
svy: tabulate race , format(%11.3g) count se missing
mat cnt = e(b)'
mat se = vecdiag(e(V))'
loc n = rowsof(se)
forv i=1/`n' {
mat se[`i',1] = sqrt(el(se,`i',1))
}
putexcel A1 = matrix(pct)
putexcel B1 = matrix(cnt)
putexcel C1 = matrix(se)
* Example 2 : two-way tabulate
webuse nhanes2b, clear
svyset psuid [pweight=finalwgt], strata(stratid)
* svy: tabulate race agegrp
putexcel set "${OUTPUT}/test.xlsx", sheet("test") replace
levelsof agegrp, local(age)
foreach lev in `age'{
local col : word `lev' of A B C D E F
svy: tabulate race if agegrp==`lev' , format(%11.3g)
mat pcb = e(b)'
putexcel `col'2 = matrix(pcb)
}
Export to Excel:
export excel using "${SOURCE}/results.xlsx", sheet("Data", modify) firstrow(variables) nolabel
Unicode for Chinese characters
unicode analyze surname_test.dta
unicode encoding set gb18030
unicode translate surname_test.dta
Check for Multicollinearity
Run "estat vif" after a regression ("vif" is the older command name). As a rule of thumb, vif>10 signals a collinearity concern.
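A minimal sketch of the check, assuming Stata's built-in auto dataset; the choice of regressors is only illustrative:

```stata
* Built-in example data shipped with Stata
sysuse auto, clear

* Any linear regression; weight and length are likely to be correlated
regress price mpg weight length

* Variance inflation factor for each regressor, plus the mean VIF
estat vif
```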
House Cleaning:
erase mydata.dta
Survey data set up
gen wt2005=nbsfraction/svyfraction/0.001884215
svyset [pweight=wpp], strata (strata ) psu (ncode)
svy: tab region
svy, subpop(nonmigrant_agri): mean pctile_complexind
svy: mean pctile_complexind, over (hukou nonLocalHukou)
estat sd // gives the sd of the mean above
svy: tabulate race, format(%11.3g) count ci deff deft // svy count gives you the estimated count based on the weight.
Save an older .dta version
saveold file_name.dta, version(13)
Making Stata stop what it is doing
Stata for Windows: press Ctrl+Pause/Break
Stata for Mac: press Command+. (period)
Installing User Written Programs http://www.borketunali.com/?p=58
"After finding the program files and installing them to your computer, you will see helm.ado, helm.hlp, pvar.ado, pvar.hlp and sgmm.ado files in the zip file. The only thing that you should do is to copy every item and then paste them into the Stata’s ado files. Some researchers may have difficulties to find the directory of Stata’s ado files. Generally, these files are in your computer’s program files. So, go to computer-hard disk c or d-program files and find Stata. Open this file and you will see the ado file. Open ado file and paste the program into the relevant file. In the ado file you will see files which are called by letters such as a, b and c. So, for example if you copy pvar.ado, you should paste this file into the file called as p. After pasting all of the programs into the relevant files you will be able to estimate your model by using Stata."
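As an alternative to copying files by hand, the commands below are a sketch of how to locate the ado directories and install a package that is hosted on SSC. It assumes an internet connection and uses ineqdeco (from the Gini section above) purely as an example; packages not hosted on SSC still need the manual route described in the quote.

```stata
* Show the system directories; PLUS and PERSONAL are where user-written ado-files go
sysdir

* Show the full search path Stata uses to find ado-files
adopath

* Install (or update) a package from SSC instead of copying files manually
ssc install ineqdeco, replace

* Confirm which file Stata will run for the command
which ineqdeco
```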
Stata Source - Princeton http://www.princeton.edu/~otorres/Stata/statnotes
----------------------------------------------------------------------------------------
Latex code:
----------------------------------------------------------------------------------------
Float note under graph or table
\usepackage[capposition=top]{floatrow}
\floatfoot{*Note the y scales are different.}
----------------------------------------------------------------------------------------
R code: convert a SAS data file to Stata format
----------------------------------------------------------------------------------------
install.packages("sas7bdat")
library("sas7bdat")
data = read.sas7bdat("file_name.sas7bdat")
library(foreign)
write.dta(data, "stata_file_name.dta")