Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2023
As modern software and online platforms collect massive amounts of data, there is increasing demand for applying causal inference methods at large scale when randomized experimentation is not viable. Weighting methods that directly incorporate covariate balancing have recently gained popularity for estimating causal effects in observational studies. These methods reduce the manual effort researchers spend iterating between propensity score modeling and balance checking until covariate balance is satisfactory. However, conventional solvers for determining the weights do not scale to the large datasets found at companies like Snap Inc. To address these limitations and improve computational efficiency, in this paper we present scalable algorithms, DistEB and DistMS, for two balancing approaches: entropy balancing and MicroSynth. The solvers have linear time complexity and can be conveniently implemented in distributed computing frameworks such as Spark, Hive, etc. We study the properties of balancing approaches at different scales, up to 1 million treated units and 487 covariates. We find that with larger sample sizes, both bias and variance in the causal effect estimation are significantly reduced. The results emphasize the importance of applying balancing approaches to large-scale datasets. We combine the balancing approach with a synthetic control framework and deploy an end-to-end system for causal impact estimation at Snap Inc.
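To make the entropy balancing idea concrete, the following is a minimal single-machine sketch, not the DistEB solver described in the abstract. It assumes the standard dual formulation, in which control weights take the form w_i ∝ exp(x_i'λ) and λ is chosen so that the weighted control means match the treated means. The function name `entropy_balance`, the damped Newton solver, and the simulated data are illustrative assumptions.

```python
import numpy as np

def entropy_balance(X_control, target_means, n_iter=100, tol=1e-10):
    """Illustrative entropy balancing via the dual problem (hypothetical helper).

    Minimizes f(lam) = log(sum_i exp((x_i - m)' lam)), whose gradient is the
    gap between the weighted control means and the target means m.
    """
    Z = X_control - target_means          # center covariates at the target
    lam = np.zeros(Z.shape[1])

    def objective(l):
        s = Z @ l
        m = s.max()                       # log-sum-exp with a stability shift
        return m + np.log(np.exp(s - m).sum())

    def weights(l):
        s = Z @ l
        w = np.exp(s - s.max())
        return w / w.sum()

    for _ in range(n_iter):
        w = weights(lam)
        grad = Z.T @ w                    # weighted mean minus target
        if np.abs(grad).max() < tol:
            break
        hess = (Z * w[:, None]).T @ Z - np.outer(grad, grad)
        step = np.linalg.solve(hess + 1e-10 * np.eye(len(lam)), grad)
        # Backtracking line search keeps the Newton step from overshooting.
        t, f0 = 1.0, objective(lam)
        while objective(lam - t * step) > f0 - 1e-4 * t * (grad @ step) and t > 1e-8:
            t *= 0.5
        lam = lam - t * step
    return weights(lam)

# Simulated example: reweight 500 control units to match assumed treated means.
rng = np.random.default_rng(0)
X_control = rng.normal(size=(500, 3))
treated_means = np.array([0.2, -0.1, 0.3])
w = entropy_balance(X_control, treated_means)
balanced_means = w @ X_control            # should be close to treated_means
```

The Newton step here builds a dense Hessian, which is fine for a handful of covariates on one machine. A solver meant for distributed frameworks would need a different update rule, which this sketch does not attempt to reproduce.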
Consumers benefit from reading ratings online before making their purchases, yet this information aggregation process may suffer from problems that have not previously been recognized in the literature. Through an empirical approach, I show how people may review businesses inconsistently when their expectations are formed by ratings on crowd-sourced review websites. Using data from Yelp, I test how potential disappointment affects customers' reviews by applying a regression discontinuity design to control for unobserved factors that may simultaneously influence ratings. In addition, I develop a model of rating behavior with reference-dependent utilities to establish testable hypotheses, and I show that when consumers write their reviews, comparing their true experience with their expectations can impede their assessment of businesses' quality and cause attribution bias. After carefully excluding confounding factors, my results support the hypothesis that consumers exhibit attribution bias when they write reviews. Several robustness checks support these findings and shed further light on this example of attribution bias. This paper contributes to the emerging literature on attribution bias in economics and provides empirical evidence on the implications of attribution bias for online reputation systems.
This paper combines applied econometrics, causal machine learning, and theories of reference-dependent preferences to test whether dining at a restaurant on a special occasion, such as one's birthday, anniversary, or graduation, raises one's expectations of the restaurant and increases the tendency to rate the experience lower. Furthermore, our study is closely linked to the emerging literature on attribution bias in economics and psychology and provides a setting in which we can empirically test two leading theories of attribution bias. We analyze reviews from Yelp and combine text analysis with regressions, matching techniques, and causal machine learning. Through a series of models, we find evidence that consumers' ratings of restaurants are indeed lower when they visit on special occasions. This result can be explained by one theory of attribution bias, according to which people have higher expectations of restaurants on special occasions and then misattribute their disappointment to the quality of the restaurants. By connecting our empirical analysis to theories of attribution bias, this paper provides evidence of how attribution bias influences people's perceptions and behaviors.
Financial derivatives and interest rates correlate strongly with United States government bonds. Among the many characteristics of government bonds, the term structure, or the so-called yield curve, is one of the main targets that investors attempt to forecast. In this paper, I construct a model with autoencoder structures and recurrent neural networks (RNNs) and focus on point forecasting of the yield curve to explore whether the term structure can be forecast more accurately. In addition, the similarities between RNNs and state-space models allow me to show that the newly proposed neural-network method is closely linked with the existing financial econometric forecasting literature and can be considered a generalization of the dynamic Nelson-Siegel method (Diebold and Li, 2006). While allowing an interpretation similar to that of previous econometric methods, the neural network model in this paper shows better forecasting accuracy.
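For readers unfamiliar with the baseline, the sketch below evaluates the classical Nelson-Siegel curve, y(τ) = β1 + β2 (1 − e^{−λτ})/(λτ) + β3 [(1 − e^{−λτ})/(λτ) − e^{−λτ}], where the three factors are usually read as level, slope, and curvature. It shows only the classical baseline, not the neural network model from the abstract. The function names and the example factor values are illustrative assumptions; the default λ = 0.0609 (maturities in months) is the value fixed by Diebold and Li (2006).

```python
import numpy as np

def nelson_siegel_loadings(maturities, lam=0.0609):
    """Level, slope, and curvature loadings for each maturity (in months)."""
    tau = np.asarray(maturities, dtype=float)
    x = lam * tau
    slope = (1.0 - np.exp(-x)) / x        # decays from 1 toward 0
    curvature = slope - np.exp(-x)        # hump-shaped in maturity
    level = np.ones_like(tau)             # constant across maturities
    return np.column_stack([level, slope, curvature])

def nelson_siegel_yield(maturities, beta, lam=0.0609):
    """Yield curve implied by factors beta = (level, slope, curvature)."""
    return nelson_siegel_loadings(maturities, lam) @ np.asarray(beta, dtype=float)

# Hypothetical factor values, chosen only to illustrate the shape of the curve.
maturities = np.array([3, 12, 60, 120, 360])
curve = nelson_siegel_yield(maturities, beta=[4.0, -2.0, 1.0])
```

In the dynamic version, the three factors vary over time and are forecast with simple time-series models while the loadings stay fixed, which is what gives the method its state-space interpretation.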
"An Elicitation Horse Race (Where the Blinkered Horse Wins by a Nose)" (with David Danz, Lise Vesterlund, Alistair Wilson, Prottoy Aman Akbar, Ying Kai Huang, and Tianyi Wang)