Daniel Yue

PhD Candidate, Harvard Business School

My research explores why firms openly share innovative knowledge without directly profiting, a strategy called “open disclosure.” My projects use scientific publications and open source software in AI research as an empirical setting to develop and test new theories. I am fortunate to be advised by Shane Greenstein (Chair), Karim Lakhani, and Frank Nagle. 

This summer (2024), I will join Georgia Tech Scheller as an Assistant Professor of Information Technology Management.

Before my PhD, I completed my undergraduate degree in Physics at Harvard College and worked as a product manager for analytics software at Mastercard. Please find my CV here.

Research

I, Google: Estimating the Impact of Corporate Involvement on AI Research
(solo-authored, job market paper) - working paper

Abstract. While corporate involvement in modern scientific research is an indisputable fact, its impact on scientific progress is controversial. Corporate interests can impose constraints that redirect research toward applied problems in ways that benefit the company but reduce scientific impact. However, corporations also provide resources, such as funding, data sets, collaborators, engineers, and technical problems, that researchers may otherwise be unable to access or know about, spurring knowledge creation. This paper empirically assesses the impact of corporate involvement on scientific research by focusing on dual-affiliated artificial intelligence researchers located at the intersection of academia and industry. After controlling for a researcher's quality and topic preferences, I find that corporate involvement leads to up to a 44% increase in the field-weighted citations received by a paper. I document evidence that this effect is driven by the resource-constraint tradeoff. Specifically, I show that corporate involvement significantly increases the likelihood of a breakthrough paper and that these effects are magnified by the involvement of firms with greater resources. However, corporate involvement also redirects the dual-affiliated author's research toward the firm's commercial interests. This is the first large-scale quantitative study in any field of science to demonstrate a direct positive effect of corporate involvement on science or to describe the underlying mechanism.

Nailing Prediction: Experimental Evidence on the Impact of Tools in Predictive Model Development
(with Paul Hamilton and Iavor Bojinov) - working paper, R&R at Management Science

Predictive model development is understudied despite its centrality to modern artificial intelligence and machine learning business applications. Although prior discussions highlight advances in methods (along the dimensions of data, computing power, and algorithms) as the primary driver of model quality, the tools that implement those methods have been neglected. In a field experiment leveraging a predictive data science contest, we study the impact of tools by restricting access to software libraries for machine learning models. Allowing access to these libraries only in our control group, we find that teams with unrestricted access perform 30% better in log-loss error, a statistically and economically significant amount equivalent to a 10-fold increase in the training data set size. We further find that teams with high general data science skills are less affected by the intervention. In contrast, teams with high tool-specific skills benefit significantly from access to modeling libraries. Our findings are consistent with a mechanism we call 'Tools-as-Skill,' in which tools automate and abstract some general data science skills but, in doing so, create the need for new tool-specific skills.

How Open Source Machine Learning Software Shapes AI
(with Max Langenkamp) - published in Artificial Intelligence, Ethics, and Society (2022)

If we want a future where AI serves a plurality of interests, then we should pay attention to the factors that drive its success. While others have studied the importance of data, hardware, and models in directing the trajectory of AI, we argue that open source software is a neglected factor shaping AI as a discipline. We start from the observation that almost all AI research and applications are built on machine learning open source software (MLOSS). This paper makes three contributions. First, we quantify the outsized impact of MLOSS using GitHub contributions data. By contrasting the costs of MLOSS with its economic benefits, we find that the average dollar of MLOSS investment corresponds to at least $100 of global economic value created, corresponding to $30B of economic value created this year. Second, we leverage interviews with AI researchers and developers to develop a causal model of the effect of open sourcing on economic value. We argue that open sourcing creates value through three primary mechanisms: standardization of MLOSS tools, increased experimentation in AI research, and the creation of communities. Finally, we consider the incentives for developing MLOSS and the broader implications of these effects. We intend this paper to be useful for technologists and academics who want to analyze and critique AI, and for policymakers who want to better understand and regulate AI systems.