All seminars are on Mondays at 8:30 am PT / 11:30 am ET / 4:30 pm London / 5:30 pm Berlin.
Date: January 26, 2026
Speaker: Madeleine Ransom (joint work with Nicole Menard)
Title: A Dilemma for Skeptics of Trustworthy AI
Abstract: Can AI ever be (un)trustworthy? A growing number of philosophers argue that it cannot, because AI lacks some human feature deemed essential to the trust relation, such as moral agency or responsiveness to reasons. Here we propose a dilemma for these skeptics. Such theorists must hold either that there is only one kind of trust (monism) or that there are multiple varieties of trust (pluralism). The first horn of the dilemma is that a monistic view of trust is implausible: no one analysis can capture all kinds of trust relationships. The second horn is that if such theorists adopt a pluralistic account of trust, they have little reason to deny that AI is the sort of thing that can be trustworthy: while AI may fail to possess characteristics required for some kinds of trust relations, these are not necessary conditions for trustworthiness.
Date: February 9, 2026
Speaker: Daniel J Singer and Luca Garzino Demo
Title: The Future of AI is Many, Not One
Abstract: Generative AI is currently being developed and used in a way that is distinctly singular. We see this not just in how users interact with models but also in how models are built, how they're benchmarked, and how commercial and research strategies using AI are defined. We argue that this singular approach is a flawed way to engage with AI if we're hoping for it to support groundbreaking innovation and scientific discovery. Drawing on research in complex systems, organizational behavior, and philosophy of science, we show why we should only expect deep intellectual breakthroughs to come from epistemically diverse teams of AI models, not singular superintelligent models. Having a diverse team broadens the search for solutions, delays premature consensus, and allows for the pursuit of unconventional approaches. Developing AI teams like these directly addresses critics' concerns that current models are constrained by past data and lack the creative insight required for innovation. In the paper, we explain what constitutes genuinely diverse teams of AI models, distinguishing them from current multi-agent systems, and outline how to implement meaningful diversity in AI collectives. The upshot, we argue, is that the future of transformative transformer-based AI is fundamentally many, not one.
Date: February 23, 2026
Speaker: Huzeyfe Demirtas
Title: (How) Does Accountability Require Explainable AI?
Abstract: Autonomous systems powered by artificial intelligence (AI) are said to generate responsibility gaps (RGs)—cases in which AI causes harm, yet no one is blameworthy. This paper has three aims. First, I argue that we should stop worrying about RGs. This is because, on the most popular contemporary theories, blameworthiness is determined at the development or deployment stage, making post-deployment outcomes irrelevant to blameworthiness. Another upshot of this argument is that questions about blameworthiness do not motivate the demand for explainable AI (XAI). Second, I distinguish blameworthiness from liability and show that blameworthiness is not necessary—nor is it sufficient—for liability. Third, I explore how AI opacity complicates identifying who caused harm—an essential step in assigning liability. However, I argue that identifying who caused the harm—even if we use opaque AI models—is within our reach and not too costly. But liability in the context of AI requires further inquiry, which again suggests that we should stop worrying about RGs and focus on liability. Two further results emerge. One, my discussion presents a framework for analyzing how accountability might require XAI. Two, if my arguments based on this framework are on the right track, XAI is of little significance for accountability. Hence, we should worry about transparency around the AI—its training, deployment, and broader sociopolitical context—not inside the AI.
Date: March 9, 2026
Speaker: Mike Barnes
Title: When Porn Doesn’t Speak: Implications of AI-Assisted Sex Work
Abstract: Adult chatbots take many forms: explicit AI companions, repurposed general-purpose models, or hidden engines powering what users believe are human interactions on platforms like OnlyFans. Beyond their immediate ethical stakes, these technologies reopen a decades-old debate in feminist philosophy about pornography's subordinating function. Drawing on speech act theory, radical feminists have argued that porn speaks—and that its speech acts subordinate women. Pornographic chatbots, which address users through speech, might seem like the perfect application of these arguments. We argue, however, that it is precisely here that the argument breaks down. Because AI chatbots rely on user-generated inputs, pornographic content functions more like a dialogue than a monologue, undermining the idea that porn speaks with a unified voice. Moreover, we argue this problem is not new: AI pornography makes explicit what has always been the case—that producers are responsive to consumers in ways the speech act model of subordination struggles to explain. Our aim is not to abandon radical feminism, but to chart its next phase in light of a challenge that new technologies make impossible to ignore.
Date: March 23, 2026
Speaker: Iwan Williams
Title: Intention-like representations in Large Language Models?
Abstract: A growing chorus of AI researchers and philosophers posit internal representations in large language models (LLMs). But how do these representations relate to the kinds of mental states we routinely ascribe to our fellow humans? While some research has focused on belief-like or knowledge-like states in LLMs, there has been comparatively little focus on the question of whether LLMs have intentions. I survey five properties that have been associated with intentions in the philosophical literature, and assess two candidate classes of LLM representations against this set of features. The result is mixed: LLMs have representations that are intention-like in many—perhaps surprising—respects, but they differ from human intentions in important ways.
Date: April 6, 2026
Speaker: Jessie Hall
Title: Informant or Information? Competence as a condition for AI testimony
Abstract: TBD
Date: April 20, 2026
Speaker: Parisa Moosavi
Title: Machine Ethics and the Challenge from Moral Particularism
Abstract: Machine Ethics is an area of research at the intersection of philosophy and artificial intelligence that aims to design and develop intelligent machines capable of complying with moral standards. One of the central challenges facing Machine Ethics concerns the difficulty of capturing morally relevant considerations in a form that a machine can reliably follow. Critics such as Purves, Jenkins, and Strawser (2015) appeal to Moral Particularism—the view that morality is a fundamentally particularized or non-principled domain—to argue against the possibility of developing morally compliant AIs. They argue that moral truths cannot be captured in the form of exceptionless general principles and thus cannot be encoded for an algorithmic machine. In this talk, I will examine the prospects of Machine Ethics in addressing this challenge while differentiating between symbolic and connectionist approaches to representing moral standards. I argue that the force of the objection depends on how Moral Particularism is understood. On the most radical version, there is no pattern in the way moral truths connect to the descriptive aspects of the world, which would make it impossible for either symbolic or connectionist AI to learn to comply with morality. In contrast, on more moderate versions of the view, there is such a pattern, but one that is either difficult to articulate via exceptionless general principles or impossible to articulate using our current moral concepts. I argue that moderate versions of Moral Particularism are compatible with the possibility of modeling morality using both symbolic and connectionist AI, and thus that there is no in-principle difference between symbolic and connectionist AI in whether they can learn to comply with moral norms. However, I also argue that on the more plausible versions of Moral Particularism, modeling morality is very difficult to do using either symbolic or connectionist AI alone. Accordingly, to develop AIs that can comply with moral norms, our best bet would be to combine the methods and techniques of both symbolic and connectionist approaches to Machine Ethics.
Date: March 3, 2025
Speaker: Jacqueline Harding
Title: Goal-Directedness in AI Systems
Abstract: Read a recent press release from any AI lab, and you’ll get the same message: 2025 is the year of the AI agent. In the coming months, we’re told, frontier AI systems will not merely generate single steps of dialogue, images or video. Instead, they will act autonomously within complex environments, in the sense that they’ll produce whole sequences of outputs without direct supervision. Crucially, their behaviour will be goal-directed: they will produce outputs in order to achieve some goal. But what does this mean, and why does it matter? In this talk, I’ll first identify some desiderata on an account of goal-directedness. Next, I’ll develop a behavioural account of goal-directedness, drawing on recent work within the computer science literature. The basic idea is that a system’s policy in an environment is goal-directed when it can be compressed by a goal. By applying it to current AI systems, I’ll argue that this behavioural account is surprisingly useful and flexible; in particular, it avoids many of the issues which plagued cybernetic accounts of agency. Nevertheless, it cannot do everything we want an account of goal-directedness to do; amongst other things, it attributes the wrong goals to the system in environments in which the system fails systematically. To deal with these cases, we need to look inside the system, identifying mechanisms whose function is to achieve the goal in question. I’ll conclude by sketching some ways to apply this idea to current and future AI systems.
Date: March 17, 2025
Speaker: Catherine Stinson
Title: Moving Goalposts or Degenerating Research?: AI Benchmarks and their Critics
Abstract: Artificial Intelligence tends to have a dismissive attitude toward its critics. This is true in particular of critique of benchmarks. While critics claim that benchmark datasets are often poorly constructed, that an overemphasis on benchmark leaderboards corrupts research incentives, and that results on benchmark tasks are overgeneralized to broader capacities than they test, the response from some big names in AI is that critics are illegitimately denying AI its successes by ‘moving the goalposts’. However, critique is widely recognized to be essential to progress by historians, sociologists and philosophers of science, and failure to engage with critique is seen as a path to a degenerating research program (or worse, pseudoscience). If that is correct, then AI would do well to heed its critics as helpful voices rather than try to shut down dissent. As an example of how embracing critique can be fruitful, I highlight the case of adversarial examples research, where critique of embarrassing gaffes by image recognition tools (that had better than human performance on benchmark tasks) inspired a research method in which mistakes are explicitly sought out as a way of improving models. This approach, where critique is treated as useful input, has been immensely successful, not only in improving image recognition tools, but also by adding to our knowledge of how primate brains process images. The lesson this case suggests is that AI would benefit from taking a less adversarial stance to its critics.
Date: March 31, 2025
Speaker: Cameron Buckner & Raphaël Milliere
Title: Interventionist methods for interpreting deep neural networks
Abstract: Recent breakthroughs in artificial intelligence have primarily resulted from training deep neural networks (DNNs) with vast numbers of adjustable parameters on enormous datasets. Due to their complex internal structure, DNNs are frequently characterized as inscrutable "black boxes," making it challenging to interpret the mechanisms underlying their impressive performance. This opacity creates difficulties for explanation, safety assurance, trustworthiness, and comparisons to human cognition, leading to divergent perspectives on these systems. This chapter examines recent developments in interpretability methods for DNNs, with a focus on interventionist approaches inspired by causal explanation in philosophy of science. We argue that these methods offer a promising avenue for understanding how DNNs process information compared to merely behavioral benchmarking and correlational probing. We review key interventionist methods and illustrate their application through practical case studies. These methods allow researchers to identify and manipulate specific computational components within DNNs, providing insights into their causal structure and internal representations. We situate these approaches within the broader framework of causal abstraction, which aims to align low-level neural computations with high-level interpretable models. While acknowledging current limitations, we contend that interventionist methods offer a path towards more rigorous and theoretically grounded interpretability research, potentially informing both AI development and computational cognitive neuroscience.
Date: April 14, 2025
Speaker: Alexandra Oprea (paper co-authored with Ryan Muldoon and Justin Bruner)
Title: Pluralism and AI Alignment
Abstract: AI researchers and policymakers agree that powerful new AI technologies ought to be aligned with human values and ought to serve the public good. Increasingly, they also agree that such alignment ought to be pluralistic. Our analysis of existing methods of AI alignment such as reinforcement learning from human feedback (RLHF) and reinforcement learning from AI feedback (RLAIF) identifies three key challenges for developing a pluralistic model of AI alignment. The first is the selection challenge of recruiting a diverse group of participants to provide the relevant feedback and/or generating a sufficiently diverse range of answers underpinned by pluralistic moral and political views. The second is the incentive challenge of structuring the incentive system so that participants providing feedback aim for reasonable diversity instead of mirroring the preferences of AI programmers. The final challenge is the aggregation challenge of preserving pluralism while turning feedback into reward functions. In particular, the goal is to avoid treating diverse answers as noise and to instead treat them as the pluralistic signal they represent. Drawing on existing work in political philosophy, game theory, and business ethics, we attempt to sketch an integrated solution to these challenges that can advance the goal of pluralistic AI alignment.
Date: April 28, 2025
Speaker: Emily Sullivan
Title: Idealization Failure in ML
Abstract: Idealizations, deliberate distortions introduced into scientific theories and models, are commonplace in science. This has led to a puzzle in epistemology and philosophy of science: How could a deliberately false claim or representation lead to the epistemic successes of science? In answering this question, philosophers have been singularly focused on explaining how and why idealizations are successful. But surely some idealizations fail. I propose that if we ask a slightly different question—whether a particular idealization is successful—then that not only gives insight into idealization failure, but will make us realize that our theories of idealization need revision. In this talk I consider idealizations in computation and machine learning.