Data Challenge

R Code for Generating Challenge Datasets is available for download (Aug 20, 2019).

Results of the Data Challenge Now Available (May 24, 2019)

  • 19 teams submitted results for 29 methods. Participants came from academia and industry worldwide.
  • Covariates were simulated or drawn from 7 source datasets spanning healthcare, business, and social domains. We gratefully acknowledge the funders and maintainers of data repositories at UC Irvine, Vanderbilt University, and Columbia University.
  • The true population ATE for each data file, slides presented at ACIC on May 24, 2019, and plots summarizing each method's performance are now available for download.
  • Updated high-dimensional results files include a revised PCATSv2 entry that fixed a bug in the originally reported results. Available for download (Aug 21, 2019).
  • (Although the Challenge is over, the datasets remain available below.)

The Challenge

Provide an estimate of the population average additive treatment effect (ATE) of a binary treatment on a binary or continuous outcome, along with a 95% confidence interval. There are 3200 low-dimensional datasets and 3200 high-dimensional datasets. Within each track, 100 datasets have been drawn from each of 32 unique data-generating processes (DGPs). Participants will download these datasets, run analyses using their own computing resources, and upload results to the website for evaluation. Teams may choose to analyze only the low-dimensional datasets, only the high-dimensional datasets, or submit results for both tracks.

Data Description

Covariates were drawn from publicly available datasets or simulated. Identifiability of the parameter is guaranteed; however, challenges to estimation have been built into the processes for generating the binary treatment assignment and the binary or continuous outcome. These include non-linearity of the response surface, treatment effect heterogeneity, a varying proportion of true confounders among the observed covariates, and near violations of the positivity assumption (a toy illustration follows the track list below). This year's challenge has two tracks:

Low-dimensional datasets (varying size, e.g., 500 x 20)

High-dimensional datasets (varying size, e.g., 1000 x 200, 2000 x 200)
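
For intuition only, the following R sketch shows a toy data-generating process with the kinds of difficulties listed above: a non-linear response surface, treatment effect heterogeneity, and near violations of positivity. It is not one of the 32 challenge DGPs, and every parameter value is arbitrary.

    # Toy DGP sketch -- NOT one of the actual challenge DGPs; all settings are illustrative.
    set.seed(1)
    n <- 500; p <- 20
    V <- matrix(rnorm(n * p), n, p)                 # continuous covariates only, for brevity
    colnames(V) <- paste0("V", 1:p)
    # Treatment assignment: strong dependence on V1 pushes propensities toward 0 and 1
    ps <- plogis(2.5 * V[, 1] - 1.0 * V[, 2]^2)
    A  <- rbinom(n, 1, ps)
    # Non-linear response surface, with a treatment effect that varies in V3
    EY0 <- sin(V[, 1]) + 0.5 * V[, 2]^2 + 0.25 * V[, 4]
    tau <- 1 + 0.8 * V[, 3]                         # subject-level treatment effect
    Y   <- EY0 + A * tau + rnorm(n)
    dat <- data.frame(Y = Y, A = A, V)
    mean(tau)                                       # sample analogue of the population ATE (here E[tau] = 1)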

Scientific Background

The causal parameter of interest is the population average additive treatment effect (ATE), E[Y(1) - Y(0)]. Causal assumptions of consistency and strong ignorability are guaranteed by the contest organizers; therefore the target statistical estimand equals E[E(Y | A = 1, V) - E(Y | A = 0, V)], where Y is the outcome, A is a binary treatment indicator, and V is a pre-treatment covariate vector containing all true confounders and possibly additional variables related to the outcome, the treatment assignment, or neither.
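
As one illustrative route to this estimand, the sketch below implements a simple plug-in (g-computation) estimator in R: regress Y on A and V, predict each subject's outcome with A set to 1 and to 0, and average the difference. The linear model is used only for brevity; it is almost certainly too rigid for the challenge DGPs, and the column names assume the test-data layout described further below.

    # Minimal g-computation sketch (illustrative; a linear outcome model is a strong
    # simplification -- flexible learners would be used in practice).
    estimate_ate_gcomp <- function(dat) {
      covars <- setdiff(names(dat), c("Y", "A"))
      fit <- lm(reformulate(c("A", covars), response = "Y"), data = dat)  # use glm() for a binary Y
      d1 <- dat; d1$A <- 1          # counterfactual dataset: everyone treated
      d0 <- dat; d0$A <- 0          # counterfactual dataset: no one treated
      mean(predict(fit, newdata = d1) - predict(fit, newdata = d0))
    }

    estimate_ate_gcomp(read.csv("testdataset1.csv"))  # filename follows the pattern described below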

Challenge Timelines

The Challenge Is Now Closed! Results will be announced at the Conference

(Afterwards, details about the DGPs and the true population ATEs will be provided on this website.)

  • December 1, 2018: A few sample datasets available for constructing your entry
  • January 14, 2019: Challenge Datasets now available below!
  • April 15, 2019: Deadline for results files to be uploaded (this has been corrected - it used to say April 8th)
  • May 22-24, 2019: Preliminary results announced at ACIC 2019 in Montreal, Canada
  • May 24, 2019: True population ATE for each data file and summary performance results available below


Test Datasets for Method Development

  • Files for the Data Challenge will be in the same form as the sample files named testdatasetX.csv, available for download below. Data are in the form (Y, A, V1, ... ,Vp), where
      • Y is the outcome (either binary or continuous)
      • A is a binary treatment indicator
      • V1 through Vp are a mix of continuous, binary, and categorical pre-treatment covariates (no mediators). The number of covariates, p, varies across datasets.
  • For method development we have also provided additional files containing the true treatment effect and the counterfactual outcomes for each observation. These files have the name testdatasetX_cf.csv. Data are in the form (ATE, EY1_i, EY0_i), where
      • ATE = the population average treatment effect (PATE), defined as E(EY(1) - EY(0)), where the expectation is with respect to the population distribution of covariates.
      • EY1_i = expected value of the counterfactual outcome under treatment for subject i
      • EY0_i = expected value of the counterfactual outcome under no treatment for subject i
  • Notes:
    1. The observed Y_i in the test data file is not identical to either EY1_i or EY0_i because of random error.
    2. The population ATE is not identical to the sample ATE. For the data challenge, bias will be assessed by averaging the estimates over the 100 replicates drawn from the same underlying DGP. (A short sketch using these files appears after this list.)
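
To make the file layout concrete, the sketch below reads one sample dataset together with its counterfactual companion file and compares the stored population ATE with the sample ATE and a naive difference in means. The column names, and the assumption that the ATE value is repeated on every row of the _cf file, are taken from the description above; adjust them if the downloaded files differ.

    # Sketch: inspect one sample dataset and its counterfactual companion file.
    obs <- read.csv("testdataset1.csv")     # columns: Y, A, V1, ..., Vp
    cf  <- read.csv("testdataset1_cf.csv")  # columns: ATE, EY1_i, EY0_i (names assumed from the description)

    pop_ate    <- cf$ATE[1]                                    # assumed constant across rows
    sample_ate <- mean(cf$EY1_i - cf$EY0_i)                    # note 2: differs from the population ATE
    naive_diff <- mean(obs$Y[obs$A == 1]) - mean(obs$Y[obs$A == 0])  # confounded comparison

    c(population = pop_ate, sample = sample_ate, naive = naive_diff)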

Datasets (registration not required for download)

Submit Results

  • Teams will submit results for each track separately. Results should be submitted as a .csv file.
    • Name your file(s) to include the track name, e.g., TeamName_low.csv or TeamName_high.csv
    • There must be one line per dataset (3200 lines total)
    • Each line should have four entries: the dataset name (e.g., low1), the estimated ATE, and the lower and upper bounds of the 95% confidence interval, in the format shown below (an assembly sketch in R follows the format line)

dataset_name, ATE, lb, ub
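
As a sketch of how a results file in this format might be assembled in R: the dataset file names (low1.csv, ...) and the estimate_one() helper below are placeholders for a team's own file layout and method, and the header row is omitted so that the file contains exactly one line per dataset.

    # Sketch: assemble a low-dimensional results file in the required format.
    # estimate_one() is a hypothetical helper standing in for a team's own method;
    # it is assumed to return a list with elements ate, lb, and ub for one dataset.
    files <- sprintf("low%d.csv", 1:3200)             # assumed naming pattern for the low-dimensional track
    rows <- lapply(files, function(f) {
      dat <- read.csv(f)
      est <- estimate_one(dat)
      data.frame(dataset_name = sub("\\.csv$", "", f),
                 ATE = est$ate, lb = est$lb, ub = est$ub)
    })
    out <- do.call(rbind, rows)
    write.table(out, "TeamName_low.csv", sep = ",",
                row.names = FALSE, col.names = FALSE, quote = FALSE)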

Entrants in the ACIC, 2019 Data Challenge retain ownership of all intellectual and industrial property rights (including moral rights) in and to submitted results and descriptive information (Submission). As a condition of submission, Entrant grants the Organizers a perpetual, irrevocable, worldwide, royalty-free, and non-exclusive license to use, reproduce, adapt, modify, publish, distribute, publicly perform, create a derivative work from, and publicly display the Submission. Entrants will not be asked to submit code, but may be given the option to publish their code at a later date. Entrants may opt to remain publicly anonymous, but must provide a contact email that will be used by the Organizers only for communications regarding the Submission.