Data fusion combines multiple datasets to improve inference, but it can inadvertently reveal user identities, even when the data appear anonymous. We introduce a Privacy-Preserving Data Fusion (PPDF) methodology that does not require overlapping users across sources and corrects for missingness under certain conditions. Additionally, to aid managerial interpretability, we develop a posterior reidentification metric that quantifies individual-level reidentification risk under data fusion, and we formally derive an analytical bound on the heightened privacy risk posed by data fusion. We showcase PPDF’s abilities by fusing an anonymous customer satisfaction survey with the CRM database of a large U.S. telecom provider. In a predictive churn-prevention campaign, PPDF achieves a 1.46% campaign lift without revealing any user identities, compared to a 1.66% lift in a model without privacy guarantees, in which nearly 7% of users are reidentified. A heterogeneity analysis further reveals that these individuals are outlier customers who exhibit persistently higher reidentification risk, underscoring the nuanced trade-off between individual-level privacy guarantees and overall inference quality. More broadly, PPDF enables organizations to harness the full power of data fusion without violating user privacy, offering a practical solution for firms, governments, and researchers seeking to extract value from sensitive data while ensuring individual confidentiality.
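To make the reidentification threat concrete, the following minimal sketch illustrates matching-based linkage risk (this is an illustration of the general idea, not the paper’s posterior metric): each row of an “anonymous” survey is linked to CRM records sharing the same quasi-identifiers, and a row matching exactly one CRM identity is fully reidentifiable. All field names and records are hypothetical.

```python
from collections import Counter

# Hypothetical quasi-identifiers shared by the two sources.
survey = [
    {"zip": "70112", "age_band": "30-39", "gender": "F"},
    {"zip": "70112", "age_band": "30-39", "gender": "M"},
    {"zip": "70119", "age_band": "50-59", "gender": "M"},
]
crm = [
    {"id": 1, "zip": "70112", "age_band": "30-39", "gender": "F"},
    {"id": 2, "zip": "70112", "age_band": "30-39", "gender": "M"},
    {"id": 3, "zip": "70112", "age_band": "30-39", "gender": "M"},
    {"id": 4, "zip": "70119", "age_band": "50-59", "gender": "M"},
]

def key(rec):
    return (rec["zip"], rec["age_band"], rec["gender"])

def reidentification_risk(survey, crm):
    """For each survey row, risk = 1 / (# CRM rows sharing its
    quasi-identifiers); a risk of 1.0 means the row links to a
    single CRM identity."""
    counts = Counter(key(r) for r in crm)
    return [1.0 / counts[key(s)] if counts[key(s)] else 0.0
            for s in survey]

risks = reidentification_risk(survey, crm)
# The first and third survey rows each match one CRM record,
# so they carry maximal linkage risk.
```

A privacy-preserving fusion method aims to keep such linkage risks bounded while still transferring predictive signal across sources.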
As the quantity and value of data increase, so does the severity of data breaches and customer privacy invasions. Although firms typically publicize their post hoc protective actions, little is known about the aftereffects of major breaches on users’ behaviors: do they alter their interactions with the firm, continue “business as usual,” or do something more subtle? We explore these questions in the context of a severe data breach at a matchmaking website for those seeking an (extramarital) affair. A challenge in measuring “treatment effects” for a massive and highly publicized breach is the lack of an obvious control group. To resolve this problem, we propose Temporal Causal Inference (TCI): each group of users who joined during a given time window is matched with an appropriate (control) group of users who joined prior to it, helping to account for “usage trajectories” in both individual and temporal site behavior. Following the creation of the control groups, we adapt Causal Forests (Athey et al. 2019) into Temporal Causal Forests (TCF). TCF yields insights into both average and individual-level treatment (data breach) effects, as well as the demographic and usage-based covariates that align with them. Our analyses reveal a decrease in the probability of searching and messaging on the website and a notable increase in the probability of deleting photos, the primary avenue for avoiding further personal identification. Moreover, these effects are broadly robust to a variety of causal inference methodologies, both with and without TCI or Causal Forests. Intriguingly, these initially negative reactions did not persist; by the third week after the announcement, there were hints of “life returning to normal.” Despite the specificity of the setting, our analysis suggests both managerial and policy imperatives to help protect customers’ privacy.
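The cohort-matching step behind TCI can be sketched as follows (a toy illustration under simplifying assumptions, with all names and numbers hypothetical): users who joined in week w and were exposed to the breach are compared, at equal tenure, with users who joined some fixed number of weeks earlier, so that both groups sit at the same point in their usage trajectory.

```python
# Toy sketch of temporal cohort matching: pair each treated cohort
# (joined in week w, observed through the breach) with a control
# cohort that joined `offset` weeks earlier, so both are compared
# at the same tenure. All data are synthetic.

def match_cohorts(join_weeks, event_week, offset):
    """Return (treated_week, control_week) pairs; a control cohort
    must itself exist in the observed join weeks."""
    pairs = []
    for w in sorted(set(join_weeks)):
        if w <= event_week:            # cohort joined before the breach
            control = w - offset
            if control in join_weeks:  # control cohort is observed
                pairs.append((w, control))
    return pairs

join_weeks = [1, 2, 3, 4, 5, 6]
pairs = match_cohorts(join_weeks, event_week=6, offset=3)
```

In the actual method, each matched pair would then feed into the Temporal Causal Forest; the point of the sketch is only the tenure-aligned pairing.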
Early in the new coronavirus disease (COVID-19) pandemic, scholars and journalists noted partisan differences in behaviors, attitudes, and beliefs. Based on location data from a large sample of smartphones, as well as 13,334 responses to a proprietary survey spanning 10 months from April 1, 2020 to February 15, 2021, we document that the partisan gap persisted over time and that the lack of convergence occurred even among individuals at heightened risk of death. Our results point to the existence and persistence of an interaction between partisanship and information acquisition and highlight the need for mandates and informational campaigns targeted at those with high health risks.
Mobile Health (mHealth) apps, such as COVID-19 contact tracing and other health-promoting technologies, help support personal and public health efforts in response to the pandemic and other health concerns. However, due to the sensitive data handled by mHealth apps, and their potential effect on people's lives, their widespread adoption demands trust in a multitude of aspects of their design. In this work, we report on a series of conjoint analyses (N = 1,521) to investigate how COVID-19 contact tracing apps can be better designed and marketed to improve adoption. Specifically, with a novel design of randomization on top of a conjoint analysis, we investigate people's privacy considerations relative to other attributes when they are contemplating contact-tracing app adoption. We further explore how their adoption considerations are influenced by deployment factors such as offering extrinsic incentives (money, healthcare) and user factors such as receptiveness to contact-tracing apps and sociodemographics. Our results, which we contextualize and synthesize with prior work, offer insight into the most desired digital contact-tracing products (e.g., app features) and how they should be deployed (e.g., with incentives) and targeted to different user groups who have heterogeneous preferences.
COVID-19 exposure-notification apps have struggled to gain adoption. Existing literature posits several potential causes of this low adoption: privacy concerns, insufficient data transparency, and the type of appeal used to pitch the pro-social behavior of installing the app. In a field experiment, we advertised CovidDefense, Louisiana's COVID-19 exposure-notification app, at the time it was released. We find that all three hypothesized factors - privacy, data transparency, and appeals framing - relate to app adoption, even when controlling for age, gender, and community density. Specifically, we find that collective-good appeals are effective in fostering pro-social COVID-19 app behavior in the field. Our results empirically support existing policy guidance on the use of collective-good appeals and offer real-world evidence in the ongoing debate on the efficacy of such appeals. Further, we offer nuanced findings regarding the efficacy of transparency - about both privacy and data collection - in encouraging health technology adoption and pro-social COVID-19 behavior. Our results may aid in fostering pro-social public-health-related behavior and may inform the broader debate regarding privacy and data transparency in digital healthcare.
Nowadays, most of our activities and personal details are recorded by one entity or another. These data are used for many applications that fundamentally enrich our lives, such as navigation systems, social networks, search engines, and health monitoring. On the darker side of data collection lie uses that can harm us and threaten our sense of privacy. Marketing, as an academic field and corporate practice, has benefited tremendously from this era of data abundance, but has concurrently heightened the risk of associated harms.
In this paper, we discuss both the great advantages and potential harms ushered in by this era of data collection, as well as ways to mitigate the harms while maintaining the benefits. Specifically, we propose and discuss several classes of potential solutions: methods for collecting less data overall, transparency of code and models, federated learning, and identity management tools, among others. Some of these solutions can be implemented now, others require a longer horizon, but all can begin through the advocacy of Marketing Research. We also discuss possible ways to improve on the benefits of data collection by developing methods that assist individuals in pursuing their long-term goals while advocating for privacy in such pursuits.
The rise of digital trading platforms has transformed retail investing by enabling individuals not only to trade independently but also to observe and copy the investment behavior of others. In this study, we examine how investors dynamically adjust their reliance on others' behaviors, measured as the share of an investor's portfolio allocated to copying others' investments, in response to gains and losses from both self-directed investments and those directed by others.
Using a stylized 2×2 design and longitudinal data from over 25,000 investors on a social trading platform, we find strong asymmetries in reliance adjustments in response to performance outcomes. Investors reduce reliance on others after gains from self-directed trades but do not increase it after losses. In contrast, they sharply reduce reliance following losses on copied trades but show no change after gains. These findings extend the theory of self-serving bias to the evaluation of others' performance in a real-world financial setting, demonstrating for the first time that individuals attribute others' success to external factors and others' failures to their (in)abilities.
Counterfactual analyses suggest that this bias leads to suboptimal decisions: investors would have earned higher returns by maintaining, rather than reducing, their reliance. Heterogeneity analyses further show that experienced investors respond less strongly to this bias than novices and that women are less affected by performance feedback than men. Our findings provide initial evidence on how cognitive biases shape social reliance on digital finance platforms, with implications for platform design and digital trading.
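The asymmetric pattern implied by the 2×2 design (outcome source × gain/loss) can be illustrated with a minimal tabulation; the observations below are synthetic, chosen only to mirror the qualitative pattern described above, not the study's data.

```python
# Synthetic observations: (outcome_source, outcome, change_in_copy_share).
# Pattern mirrors the abstract: reliance falls after self-directed gains
# and after copied losses, but is flat in the other two cells.
observations = [
    ("self",   "gain", -0.04),
    ("self",   "gain", -0.02),
    ("self",   "loss",  0.00),
    ("self",   "loss",  0.00),
    ("copied", "gain",  0.00),
    ("copied", "gain",  0.01),
    ("copied", "loss", -0.06),
    ("copied", "loss", -0.08),
]

def mean_change(source, outcome):
    xs = [c for s, o, c in observations if s == source and o == outcome]
    return sum(xs) / len(xs)

# One mean reliance adjustment per cell of the 2x2 design.
cells = {(s, o): mean_change(s, o)
         for s in ("self", "copied") for o in ("gain", "loss")}
```

Reading the four cell means side by side is what exposes the asymmetry: the two "no change" cells are exactly the ones a self-serving attribution account predicts.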