
This paper identifies the probability of causation when there is sample selection. We show that the probability of causation is partially identified for individuals who are always observed regardless of treatment status and derive sharp bounds under three increasingly restrictive sets of assumptions. The first set imposes an exogenous treatment and a monotone sample selection mechanism. To tighten these bounds, the second set also imposes the monotone treatment response assumption, while the third set additionally imposes a stochastic dominance assumption. Finally, we use experimental data from the Colombian job training program Jóvenes en Acción to empirically illustrate our approach's usefulness. We find that, among always-employed women, at least 10.2% and at most 13.4% transitioned to the formal labor market because of the program. However, our 90% confidence region does not reject the null hypothesis that the lower bound is equal to zero.

6. Crime and Mismeasured Punishment: Marginal Treatment Effect with Misclassification (Published at the Review of Economics and Statistics - ArXiv)

I partially identify the marginal treatment effect (MTE) when the treatment is misclassified. I explore two restrictions, allowing for dependence between the instrument and the misclassification decision. If the signs of the propensity scores' derivatives are equal, I identify the MTE sign. If those derivatives are similar, I bound the MTE. To illustrate, I analyze the impact of alternative sentences (fines and community service vs. no punishment) on recidivism in Brazil, where appeals processes generate misclassification. The estimated misclassification bias may be as large as 10% of the largest possible MTE, and the bounds contain the correctly estimated MTE.

This article presents identification results for the marginal treatment effect (MTE) when there is sample selection. We show that the MTE is partially identified for individuals who are always observed regardless of treatment, and derive uniformly sharp bounds on this parameter under three increasingly restrictive sets of assumptions. The first set imposes standard MTE assumptions with an unrestricted sample selection mechanism. The second set imposes monotonicity of the sample selection variable with respect to treatment, considerably shrinking the identified set. Finally, we incorporate a stochastic dominance assumption, which tightens the lower bound for the MTE. Our analysis extends to discrete instruments. The results rely on a mixture reformulation of the problem where the mixture weights are identified, extending Lee's (2009) trimming procedure to the MTE context. We propose estimators for the derived bounds and use data made available by Deb, Munkin, and Trivedi (2006) to empirically illustrate the usefulness of our approach.

Won the Prize of Best Econometric Article presented at the 42nd Meeting of the Brazilian Econometric Society (2020).
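
For readers unfamiliar with the trimming idea that the abstract above builds on, the following is a minimal sketch of Lee's (2009) original bounds for an average effect among always-observed units. It is not the MTE estimator proposed in the paper. The function name lee_bounds, its arguments, and the toy data are illustrative assumptions, and the sketch assumes a randomly assigned binary treatment that weakly increases the probability of being observed.

```python
import math


def lee_bounds(treated, selected, outcome):
    """Trimming bounds on the average effect for always-observed units.

    treated, selected: lists of 0/1 indicators (hypothetical input layout).
    outcome: list of floats, only read where selected == 1.
    Assumes treatment weakly increases the probability of being observed.
    """
    rows = list(zip(treated, selected, outcome))
    # Observed outcomes in each arm; the treated ones are sorted for trimming.
    y1 = sorted(y for d, s, y in rows if d == 1 and s == 1)
    y0 = [y for d, s, y in rows if d == 0 and s == 1]

    n_treated = sum(1 for d in treated if d == 1)
    n_control = len(treated) - n_treated
    q1 = len(y1) / n_treated  # share observed among treated units
    q0 = len(y0) / n_control  # share observed among control units
    if q0 == 0 or q1 < q0:
        raise ValueError("sketch requires 0 < q0 <= q1")

    # Share of observed treated units that are not always-observed.
    p = (q1 - q0) / q1
    k = math.floor(p * len(y1))  # number of treated observations to trim

    def mean(values):
        return sum(values) / len(values)

    lower = mean(y1[: len(y1) - k]) - mean(y0)  # drop the k largest outcomes
    upper = mean(y1[k:]) - mean(y0)             # drop the k smallest outcomes
    return lower, upper


# Toy data: every treated unit is observed, half of the controls are.
treated = [1, 1, 1, 1, 0, 0, 0, 0]
selected = [1, 1, 1, 1, 1, 1, 0, 0]
outcome = [1.0, 2.0, 3.0, 4.0, 1.0, 3.0, 0.0, 0.0]
print(lee_bounds(treated, selected, outcome))
```

The floor rule for the number of trimmed observations is one simple convention chosen for this sketch; other implementations handle fractional trimming differently.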

4. Cherry Picking with Synthetic Controls (with Bruno Ferman and Cristine Pinto - Published at the Journal of Policy Analysis and Management)

The synthetic control (SC) method has recently been proposed as an alternative method to estimate treatment effects in comparative case studies. Abadie et al. (2010) and Abadie et al. (2015) argue that one of the advantages of the SC method is that it imposes a data-driven process to select the comparison units, providing more transparency and less discretionary power to the researcher. However, an important limitation of the SC method is that it does not provide clear guidance on the choice of predictor variables used to estimate the SC weights. We show that this lack of specific guidance provides significant opportunities for the researcher to search for specifications with statistically significant results, undermining one of the main advantages of the method. Considering six alternative specifications commonly used in SC applications, we calculate in Monte Carlo simulations the probability of finding a statistically significant result at the 5% level in at least one specification. We find that this probability can be as high as 13% (23% for a 10% significance test) when there are 12 pre-intervention periods and that it decays slowly with the number of pre-intervention periods. With 230 pre-intervention periods, this probability is still around 10% (18% for a 10% significance test). We show that the specification that uses the average pre-treatment outcome values to estimate the weights performs particularly badly in our simulations. However, the specification-searching problem remains relevant even when we do not consider this specification. We also show that this specification-searching problem is relevant in simulations with real datasets looking at placebo interventions in the Current Population Survey (CPS). In order to mitigate this problem, we propose a criterion for selecting among different SC specifications based on each specification's prediction error in placebo estimations.

Code: Our replication code can be downloaded here.
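
As a point of reference for the rejection probabilities reported in the abstract above, the short sketch below computes the chance of at least one rejection across several tests if the tests were mutually independent. This is not part of the paper's replication code; the independence assumption and the function name prob_any_rejection are ours, used only as a benchmark.

```python
def prob_any_rejection(alpha, n_specs):
    """Probability of at least one rejection among n_specs independent
    tests, each with size alpha. Independence is an assumption made only
    for this benchmark; SC specifications share data and are not independent.
    """
    return 1.0 - (1.0 - alpha) ** n_specs


# Six specifications, as in the abstract, at the 5% and 10% levels.
for alpha in (0.05, 0.10):
    print(alpha, round(prob_any_rejection(alpha, 6), 3))
```

With six independent tests the benchmark is about 26.5% at the 5% level and about 46.9% at the 10% level. The simulated probabilities in the abstract are below these values, which is consistent with, though it does not prove, the specifications' test outcomes being positively correlated.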

Recently, the Synthetic Control Estimator was proposed to answer questions involving counterfactuals when only one treated unit and a few control units are observed. Although this method has been applied in many empirical works, the formal theory behind its inference procedure is still an open question. To fill this gap, we state the sufficient hypotheses that guarantee the adequacy of Fisher's Exact Hypothesis Testing Procedure for panel data. This allows us to test any sharp null hypothesis and, consequently, to propose a new way to estimate Confidence Sets for the Synthetic Control Estimator by inverting a test statistic. This is the first confidence set available when we have access only to finite-sample, aggregate-level data whose cross-sectional dimension may be larger than its time dimension. Moreover, we analyze the size and the power of the proposed test with a Monte Carlo experiment and find that test statistics that use the synthetic control method outperform test statistics commonly used in the evaluation literature. We also extend our framework to cases in which we observe more than one outcome of interest (simultaneous hypothesis testing) or more than one treated unit (pooled intervention effect), and to cases in which heteroskedasticity is present.

Won the Prize of Best Econometric Article presented at the 37th Meeting of the Brazilian Econometric Society (2015).

Code: We made available an R function and a Stata program to compute the confidence sets proposed in this paper. They can be downloaded here.
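
To give a sense of the permutation logic behind the testing procedure described in the abstract above, here is a minimal sketch of a placebo p-value. It is not the R function or Stata program mentioned above. The function name permutation_p_value and the example numbers are hypothetical, and the sketch assumes one test statistic has already been computed for each unit (for example, a ratio of post- to pre-intervention prediction errors), with the treated unit's position known.

```python
def permutation_p_value(statistics, treated_index=0):
    """Share of units whose test statistic is at least as large as the
    treated unit's. statistics holds one value per unit, treated unit
    included, so the smallest attainable p-value is 1 / len(statistics).
    """
    treated_stat = statistics[treated_index]
    n_at_least = sum(1 for s in statistics if s >= treated_stat)
    return n_at_least / len(statistics)


# Hypothetical statistics: the treated unit first, then three placebo units.
stats = [5.0, 1.0, 2.0, 3.0]
print(permutation_p_value(stats))  # 0.25
```

A confidence set by test inversion would repeat this calculation for each hypothesized intervention effect, recomputing the statistics under that null and keeping the effects whose p-value exceeds the chosen significance level.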

I apply the synthetic control method to Brazilian city-level data from the 20th century in order to evaluate the economic impact of the Free Trade Zone of Manaus (FTZM). I find that this enterprise zone had significant positive effects on real GDP per capita and Services Total Production per capita, but it also had significant negative effects on Agriculture Total Production per capita. My results suggest that this subsidy policy achieved its goal of promoting regional economic growth at the cost of provoking a misallocation of resources among economic sectors. They also reject the view that an industrialization policy will benefit all economic sectors due to positive spillovers from the manufacturing sector that are strong enough to compensate for the negative effect of the misallocation of resources.

This study analyzes the characteristics of the demand for health insurance plans in Brazil, examining the various attributes associated with the plans and consumers' perception of quality. To this end, we conduct an econometric analysis of the 1998, 2003, and 2008 PNAD samples. This database allows us to analyze how the demand for private health plans evolved over time and how individuals' perception of plan characteristics relates to the monthly amount paid. The results indicate that the increase in the number of Brazilians covered by private health plans is mainly due to the improvement in households' socioeconomic indicators. Finally, there is a positive correlation between satisfaction and plan price, even though, for most of the attributes analyzed, there appears to be no systematic difference between plans that would, by itself, justify this correlation. This result points to possible differences between plans in attributes not measured in the databases.