Provide an estimate of the population average additive treatment effect (ATE) of a binary treatment on a binary or continuous outcome, and a 95% confidence interval. There are 3200 low-dimensional datasets and 3200 high-dimensional datasets. Within each track, 100 datasets have been drawn from each of 32 unique data generating processes (DGPs). Participants will download these datasets, run analyses using their own computing resources, and upload results to the website for evaluation. Teams may choose to analyze only the low-dimensional datasets, only the high-dimensional datasets, or submit results for both tracks.
Covariates were either drawn from publicly available datasets or simulated. Identifiability of the parameter is guaranteed; however, challenges to estimation have been built into the processes for generating the binary treatment assignment and the binary or continuous outcome. These include non-linearity of the response surface, treatment effect heterogeneity, a varying proportion of true confounders among the observed covariates, and near violations of the positivity assumption. This year's challenge has two tracks:
Low-dimensional datasets (varying size, e.g., 500 x 20)
High-dimensional datasets (varying size, e.g., 1000 x 200, 2000 x 200)
The causal parameter of interest is the population average additive treatment effect (ATE), E[Y(1) - Y(0)]. The causal assumptions of consistency and strong ignorability are guaranteed by the contest organizers; therefore, the target statistical estimand equals E[E(Y | A = 1, V) - E(Y | A = 0, V)], where Y is the outcome, A is a binary treatment indicator, and V is a pre-treatment covariate vector that contains all true confounders and possibly additional variables related to the outcome, the treatment assignment, or neither.
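As a rough illustration of the statistical estimand above, the following sketch computes a plug-in estimate of E[E(Y | A = 1, V) - E(Y | A = 0, V)] on simulated toy data, together with a percentile bootstrap 95% interval. Everything in it is an assumption made for illustration: the data are generated here rather than taken from the challenge, the outcome regression is a simple linear model with one interaction term, the function names are hypothetical, and the bootstrap is only one of many ways to form an interval. It is not the organizers' method, and a linear model would likely be too simple for the non-linear response surfaces the challenge describes.

```python
import numpy as np

def simulate(n, seed=0):
    """Toy data (assumed, not challenge data): confounder V, treatment A, outcome Y."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-0.5 * v))   # treatment probability depends on V
    a = rng.binomial(1, p)
    # True ATE is 2.0 here because E[V] = 0.
    y = 1.0 + 2.0 * a + 1.5 * v + 0.5 * a * v + rng.normal(size=n)
    return y, a, v

def design(a, v):
    """Design matrix with intercept, A, V, and the A*V interaction."""
    return np.column_stack([np.ones_like(v), a, v, a * v])

def plug_in_ate(y, a, v):
    """Fit E(Y | A, V) by least squares, then average the predicted contrasts."""
    beta, *_ = np.linalg.lstsq(design(a, v), y, rcond=None)
    mu1 = design(np.ones_like(v), v) @ beta    # predictions with A set to 1
    mu0 = design(np.zeros_like(v), v) @ beta   # predictions with A set to 0
    return float(np.mean(mu1 - mu0))

def bootstrap_ci(y, a, v, reps=200, seed=1):
    """Percentile bootstrap 95% interval for the plug-in estimate."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = []
    for _ in range(reps):
        idx = rng.integers(0, n, size=n)       # resample rows with replacement
        stats.append(plug_in_ate(y[idx], a[idx], v[idx]))
    lb, ub = np.percentile(stats, [2.5, 97.5])
    return float(lb), float(ub)

y, a, v = simulate(5000)
ate_hat = plug_in_ate(y, a, v)
lb, ub = bootstrap_ci(y, a, v)
# Unadjusted difference in means, shown only to illustrate confounding bias.
naive = float(y[a == 1].mean() - y[a == 0].mean())
print(f"plug-in ATE: {ate_hat:.3f}  95% CI: ({lb:.3f}, {ub:.3f})  naive: {naive:.3f}")
```

In this toy setup the unadjusted difference in means is biased upward because V raises both the treatment probability and the outcome, while the adjusted plug-in estimate should land close to the simulated true value of 2.0.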
(Afterwards, details about the DGPs and the true population ATEs will be provided on this website.)
dataset_name, ATE, lb, ub
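The column names above suggest a results file with one row per dataset. A minimal sketch of producing such a file is shown below, assuming a plain comma-separated layout and assuming that lb and ub are the lower and upper bounds of the 95% confidence interval. The dataset names and numbers are hypothetical placeholders, and the exact file format required by the website may differ.

```python
import csv
import io

# Hypothetical results: (dataset_name, ATE estimate, CI lower bound, CI upper bound).
results = [
    ("low_0001", 1.98, 1.85, 2.11),
    ("low_0002", -0.12, -0.30, 0.06),
]

def to_csv(rows):
    """Render rows as CSV text with the header dataset_name, ATE, lb, ub."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["dataset_name", "ATE", "lb", "ub"])
    for name, ate, lb, ub in rows:
        # Basic sanity check: the point estimate should sit inside its interval.
        if not lb <= ate <= ub:
            raise ValueError(f"{name}: estimate lies outside its interval")
        writer.writerow([name, ate, lb, ub])
    return buf.getvalue()

submission = to_csv(results)
print(submission)
```

Writing to an in-memory buffer keeps the sketch side-effect free; replacing `io.StringIO()` with an opened file (using `newline=""`) would write the same content to disk.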
Entrants in the ACIC 2019 Data Challenge retain ownership of all intellectual and industrial property rights (including moral rights) in and to submitted results and descriptive information (the "Submission"). As a condition of submission, each Entrant grants the Organizers a perpetual, irrevocable, worldwide, royalty-free, and non-exclusive license to use, reproduce, adapt, modify, publish, distribute, publicly perform, create a derivative work from, and publicly display the Submission. Entrants will not be asked to submit code, but may be given the option to publish their code at a later date. Entrants may opt to remain publicly anonymous, but must provide a contact email that will be used by the Organizers only for communications regarding the Submission.