Evaluating Architectural Innovations: How Expertise Heterogeneity Shapes Technological Advances (with Z. Szajnfarber, J. Crusan, and M. Menietti). Under 2nd review.
As organizations pursue high-impact technological innovations, the challenge of effectively evaluating novel yet feasible solutions becomes critical. Through a field experiment at NASA, this study examines how different types of expertise shape the evaluation of technological innovations. Using a novel human-LLM hybrid approach to analyze evaluator comments, the study reveals that evaluators with cross-domain expertise are uniquely positioned to recognize how novel solutions can enhance system functionality, while specialists tend to assess novelty and feasibility in isolation. These findings provide insights for organizations seeking to optimize their innovation evaluation processes through strategic combinations of expertise.
Keywords: project evaluation, system architecture, organizational decision-making, human-LLM annotation
Greenlighting Innovative Projects: How Evaluation Format Shapes the Perceived Feasibility of Early-Stage Ideas (with S. Friis, T. Cai, M. Menietti, G. Webber, and E. Guinan). 1st revise & resubmit.
As organizations evaluate high-risk, innovative projects, they face a critical tension: assessing multiple dimensions, such as impact and novelty, while still thoroughly examining implementation challenges. Through a field experiment at a leading research university, we investigate how different evaluation formats affect assessors' ability to identify potential feasibility issues in grant proposals. Using a novel human-LLM hybrid approach, we find that focusing evaluators' attention on feasibility alone leads to a more comprehensive assessment of implementation challenges, while multi-criteria evaluation better captures interdependencies across dimensions. These findings provide insights for organizations seeking to optimize their innovation funding processes to ensure thorough evaluation of high-potential projects.
Keywords: project evaluation, cognitive attention, feasibility analysis, human-LLM annotation
Judging the Problem: A Problem-Centric Approach to Evaluating Early-Stage Ventures (with M. Zhang). In-progress research.
Early-stage ventures often struggle to articulate the fundamental problems they aim to solve, yet traditional evaluation approaches may not effectively surface these issues. Through a field experiment with over 200 expert judges evaluating 150 ventures at a leading university's accelerator program, we examine how training judges to focus on problem identification shapes their assessments and feedback quality. Using both quantitative analysis and qualitative feedback from ventures, this study aims to understand whether a problem-focused approach helps judges better identify promising ventures and provide more actionable guidance. The findings will offer insights for accelerators and competitions seeking to optimize their evaluation processes and enhance the value of feedback provided to entrepreneurs.
Keywords: venture evaluation, problem identification, entrepreneurship, feedback quality, cognitive attention
Decomposing Venture Evaluation: A Field Experiment on Judge Focus and Startup Assessment (with N. Rietzler and Y. Zhang). In-progress research.
Early-stage venture evaluators face the complex challenge of assessing multiple dimensions simultaneously, often defaulting to simplified heuristics that may overlook critical aspects of promising ventures. While prior research suggests venture capitalists prioritize team evaluation, we know little about how different evaluation approaches shape judges' assessments and impact venture outcomes in accelerator settings. First, using a human-LLM approach with fine-tuning, we analyze over 80,000 evaluator comments from a leading startup accelerator to uncover the mental models and cognitive structures that guide evaluation decisions. Building on these insights, we conduct a field experiment to examine how variations in evaluation structure affect judges' decision-making and feedback quality. Our findings reveal how different evaluation approaches influence both the depth of assessment and the actionability of feedback provided to ventures, offering practical guidance for accelerators seeking to optimize their judging processes.
Keywords: project evaluation, mental models, cognitive attention, entrepreneurship, human-LLM annotation
The Narrative AI Advantage? A Field Experiment on Generative AI-Augmented Evaluations of Early-Stage Innovations (with L. Boussioux, C. Ayoubi, Y. Chen, C. Lin, R. Spens, P. Wagh, and P. Wang). 1st revise & resubmit.
As generative AI transforms creative problem-solving, organizations face a critical challenge: how to effectively combine human judgment with AI insights in evaluation processes. Through a field experiment with MIT Solve's Global Health Equity Challenge, we examine how different AI assistance formats—from black-box recommendations to narrative explanations—shape human evaluators' decision-making. Using a novel combination of behavioral data and mouse-tracking analysis, the study reveals that while AI can effectively standardize objective assessments, the way AI presents its recommendations significantly influences how humans engage with subjective criteria. These findings provide insights for organizations seeking to optimize human-AI collaboration in complex evaluation tasks, particularly where both quantifiable metrics and nuanced judgment are essential.
Keywords: project evaluation, human-AI collaboration, generative AI narratives
Augmenting Expert Evaluation: Design Principles for Human-AI Collaboration. In-progress research.
Organizations increasingly face complex evaluation challenges that require integrating multiple types of expertise and managing cognitive load effectively. Collectively, my research reveals that how evaluation processes are structured—whether through focused attention on specific dimensions or strategic combinations of expertise—significantly shapes decision quality. Building on these insights, I examine how to design human-AI evaluation frameworks that complement human judgment while preserving evaluator autonomy. Through field experiments across various innovation contexts, I investigate how different AI assistance formats can help evaluators better manage cognitive load and surface critical insights, particularly when decisions require integrating multiple types of expertise. This research aims to provide guidance for organizations seeking to enhance complex evaluation processes through the strategic integration of human expertise and AI capabilities.
Keywords: evaluation design, human-AI collaboration, AI augmentation
The rapid advances in generative artificial intelligence (AI) open up attractive opportunities for creative problem-solving through human-guided AI partnerships. To explore this potential, we initiated a crowdsourcing challenge focused on sustainable, circular economy business ideas generated by the human crowd (HC) and collaborative human-AI efforts using two alternative forms of solution search. The challenge attracted 125 global solvers from various industries, and we used strategic prompt engineering to generate the human-AI solutions. We recruited 300 external human evaluators to judge a randomized selection of 13 out of 234 solutions, totaling 3,900 evaluator-solution pairs. Our results indicate that while human crowd solutions exhibited higher novelty—both on average and for highly novel outcomes—human-AI solutions demonstrated superior strategic viability, financial and environmental value, and overall quality. Notably, human-AI solutions cocreated through differentiated search, where human-guided prompts instructed the large language model to sequentially generate outputs distinct from previous iterations, outperformed solutions generated through independent search. By incorporating “AI in the loop” into human-centered creative problem-solving, our study demonstrates a scalable, cost-effective approach to augment the early innovation phases and lays the groundwork for investigating how integrating human-AI solution search processes can drive more impactful innovations.
Competence development in digital technologies, analytics, and artificial intelligence is increasingly important to all types of organizations and their workforce. Universities and corporations are investing heavily in developing training programs, at all tenure levels, to meet these new skill needs. However, there is a risk that the new set of lucrative opportunities for employees in these tech-heavy fields will be biased against diverse demographic groups like women. Although much research has examined the experiences of women in science, technology, engineering, and mathematics (STEM) fields and occupations, less understood is the extent to which gender stereotypes influence recruiters’ perceptions and evaluations of individuals who are deciding whether to apply to STEM training programs. These behaviors are typically unobserved because they occur prior to the application interface. We address this question by investigating recruiters’ initial outreach decisions to more than 166,000 prospective students who had expressed interest in applying to a midcareer-level online tech training program in business analytics. Using data on the recruiters’ communications, we find that recruiters are less likely to initiate contact with female than male prospects and search for additional signals of quality from female prospects before contacting them. We also find evidence that recruiters are more likely to base initial outreach activities on prospect gender when they have higher workloads and limited attention. We conclude with a discussion of the implications of this research for our understanding of how screening and selection decisions prior to the application interface may undermine organizational efforts to achieve gender equality and diversity as well as the potential for demand-side interventions to mitigate these gender disparities.
In their Discussion Paper, Franzoni and Stephan (F&S, 2023) discuss the shortcomings of existing peer review models in shaping the funding of risky science. Their discussion offers a conceptual framework for incorporating risk into peer review models of research proposals by leveraging the Subjective Expected Utility (SEU) approach to decouple reviewers' assessments of a project's potential value from its risk. In my Response, I build on F&S's discussion and attempt to shed light on three additional yet core considerations of risk in science: 1) how risk and reward in science are related to assessments of a project's novelty and feasibility; 2) how the sunk cost literature can help articulate why reviewers tend to perceive new research areas as riskier than continued investigation of existing lines of research; and 3) how drawing on different types of expert reviewers (i.e., based on domain and technical expertise) can result in alternative evaluation assessments to better inform resource allocation decisions. The spirit of my Response is to sharpen our understanding of risk in science and to offer insights on how future theoretical and empirical work—leveraging experiments—can test and validate the SEU approach for the purposes of funding more risky science that advances the knowledge frontier.
The evaluation and selection of novel projects lies at the heart of scientific and technological innovation, and yet there are persistent concerns about bias, such as conservatism. This paper investigates the role that the format of evaluation, specifically information sharing among expert evaluators, plays in generating conservative decisions. We executed two field experiments in two separate grant-funding opportunities at a leading research university, mobilizing 369 evaluators from seven universities to evaluate 97 projects, resulting in 761 proposal-evaluation pairs and more than $250,000 in awards. We exogenously varied the relative valence (positive and negative) of others’ scores and measured how exposures to higher and lower scores affect the focal evaluator’s propensity to change their initial score. We found causal evidence of a negativity bias, where evaluators lower their scores by more points after seeing scores more critical than their own rather than raise them after seeing more favorable scores. Qualitative coding of the evaluators’ justifications for score changes reveals that exposures to lower scores were associated with greater attention to uncovering weaknesses, whereas exposures to neutral or higher scores were associated with increased emphasis on nonevaluation criteria, such as confidence in one’s judgment. The greater power of negative information suggests that information sharing among expert evaluators can lead to more conservative allocation decisions that favor protecting against failure rather than maximizing success.
We investigate how knowledge similarity between two individuals is systematically related to the likelihood that a serendipitous encounter results in knowledge production. We conduct a field experiment at a medical research symposium, where we exogenously varied opportunities for face-to-face encounters among 15,817 scientist-pairs. Our data include direct observations of interaction patterns collected using sociometric badges, and detailed, longitudinal data of the scientists' postsymposium publication records over 6 years. We find that interacting scientists acquire more knowledge and coauthor 1.2 more papers when they share some overlapping interests, but cite each other's work between three and seven times less when they are from the same field. Our findings reveal both collaborative and competitive effects of knowledge similarity on knowledge production outcomes.