Data and computer programs for “Optimally combining censored and uncensored datasets”

The Stata dataset is called “rev_data.dta” (download zipped dataset). It contains the following variables:

1. birthpl = State of Birth

2. yearat14 = Year in which the person turned 14

3. req_sch = Minimum Schooling Required Before Dropping out

4. ca = Required Schooling (accounting for age requirements)

5. enrolage = Age by which a child must start school

6. drop_age = Earliest age at which a child can drop out of school

7. cl = Minimum Schooling required for a Work Permit

8. year = Census Year

9. perwt = Person Weight

10. age = Age at Census (in years)

11. agemarr = Age at first Marriage (in years)

12. educh = Years of Education

13. censored = Indicator variable = 1 if age at first marriage is censored

14. female = Indicator variable = 1 if female

15. white = Indicator variable = 1 if white

16. r_age = Age at Census (in quarters)

17. r_agemar = Age at First Marriage (in quarters)

See the data appendix in the paper for details about how “r_age” and “r_agemar” were created.

Stat/Transfer was used to convert “rev_data.dta” to a Gauss dataset named “rev_data.dat”. The subsamples for the application in the paper were created from “rev_data.dat” by the following programs (download zipped programs).

Dataloop (Gauss) code to create subsamples

Female subsample

r-women-60-80.prg (enriched dataset where the master sample is from 1960 and the refreshment sample is from 1980)

r-women-70-80.prg (enriched dataset where the master sample is from 1970 and the refreshment sample is from 1980)

r-women-80-only.prg (refreshment sample)

r-women-60-80-s55.prg (enriched dataset for robustness check where age at first marriage for unmarried individuals in the refreshment sample is imputed to be 55)

r-women-60-80-s65.prg (enriched dataset for robustness check where age at first marriage for unmarried individuals in the refreshment sample is imputed to be 65)

r-women-70-80-s55.prg

r-women-70-80-s65.prg
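
As a rough illustration of how an enriched sample and its robustness variants might be assembled, the Python sketch below stacks a master and a refreshment sample by census year and then overwrites age at first marriage for unmarried individuals in the refreshment sample. It is not a translation of the Gauss programs: the `never_married` flag, the `sample` tag, and the function name `build_enriched` are hypothetical, the toy values are made up, and the actual selection rules are those in the .prg files and the paper.

```python
def build_enriched(records, master_year, refresh_year, imputed_age=None):
    """Stack master and refreshment records, tagging each with its sample.

    If imputed_age is given, refreshment records flagged never_married
    (a hypothetical field, not a variable in rev_data.dta) get
    agemarr = imputed_age. Records from other census years are dropped.
    """
    enriched = []
    for rec in records:
        if rec["year"] == master_year:
            enriched.append({**rec, "sample": "master"})
        elif rec["year"] == refresh_year:
            row = {**rec, "sample": "refreshment"}
            if imputed_age is not None and row["never_married"]:
                row["agemarr"] = imputed_age
            enriched.append(row)
    return enriched


# Toy records; values are invented for illustration only.
toy = [
    {"year": 1960, "agemarr": 21, "never_married": False},
    {"year": 1970, "agemarr": 24, "never_married": False},
    {"year": 1980, "agemarr": 23, "never_married": False},
    {"year": 1980, "agemarr": None, "never_married": True},
]

base = build_enriched(toy, 1960, 1980)                  # analogue of a 60-80 sample
s55 = build_enriched(toy, 1960, 1980, imputed_age=55)   # analogue of an s55 variant
s65 = build_enriched(toy, 1960, 1980, imputed_age=65)   # analogue of an s65 variant
```

The function copies each record before modifying it, so the base sample and both imputed variants can be built from the same input. The sketch only touches age at first marriage in years; how the quarterly variable “r_agemar” should be treated is described in the paper's data appendix.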

Male subsample

r-men-60-80.prg

r-men-70-80.prg

r-men-80-only.prg

r-men-60-80-s55.prg

r-men-60-80-s65.prg

r-men-70-80-s55.prg

r-men-70-80-s65.prg

Gauss code for the GMM60 and OLS80 estimators

GMM70 is implemented exactly like GMM60 except that it uses “r-men-70-80.dat” and “r-women-70-80.dat” for the male and female subsamples, respectively. The same applies to the robustness checks.

Female subsample

r-women-log-cadum-hausman.prg (GMM60)

r-ols-women-80-only.prg (OLS80)

r-women-log-cadum-s55-hausman.prg (robustness check)

r-women-log-cadum-s65-hausman.prg

Male subsample

r-men-log-cadum-hausman.prg

r-ols-men-80-only.prg

r-men-log-cadum-s55-hausman.prg

r-men-log-cadum-s65-hausman.prg

Stata code for Tobit

tobit.prg

Matlab code for simulations (download zipped programs)

c fixed (needs master_neg_tobit_loglik.m)

cfixed6_20p.m (n_R / n_M = 20%)

cfixed6_80p.m (n_R / n_M = 80%)

c random (needs master_neg_tobit_loglik_crandom.m)

crandom4_20p.m

crandom4_80p.m
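
The simulation scripts depend on files whose names suggest a negative Tobit log-likelihood. For readers who want to see what such an objective typically looks like, here is a minimal Python sketch of the textbook right-censored Tobit negative log-likelihood. It is not the authors' Matlab code. It assumes a normal error, a latent outcome that is observed only when it falls below a known censoring point, and that “c” denotes that censoring point; the function name and argument layout are illustrative.

```python
import math


def neg_tobit_loglik(beta, sigma, X, y, c, censored):
    """Negative log-likelihood of a right-censored Tobit model.

    Latent model: y* = x'beta + e, with e ~ N(0, sigma^2).
    Observation i is censored when censored[i] is truthy; in that case
    only the event y* >= c[i] is known and y[i] is ignored.
    """
    total = 0.0
    for xi, yi, ci, di in zip(X, y, c, censored):
        mu = sum(b * x for b, x in zip(beta, xi))
        if di:
            # P(y* >= c) = 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
            z = (ci - mu) / sigma
            surv = 0.5 * math.erfc(z / math.sqrt(2.0))
            total += math.log(max(surv, 1e-300))  # floor avoids log(0)
        else:
            # Log of the normal density of y given mean mu and scale sigma
            z = (yi - mu) / sigma
            total += -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z
    return -total


# One uncensored observation at its mean and one censored exactly at its mean.
nll_uncensored = neg_tobit_loglik([1.0], 1.0, [[2.0]], [2.0], [5.0], [0])
nll_censored = neg_tobit_loglik([1.0], 1.0, [[2.0]], [0.0], [2.0], [1])
```

Because the censoring point is passed per observation, the same function covers a censoring point that is common to all observations and one that varies across them; whether that matches the “c fixed” and “c random” designs exactly should be checked against the paper.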