Schedule

Day 1, 29 July

08:45-09:25   Walk in, Registration and Coffee
09:30-09:45   Opening Remarks
09:45-10:30   Talk 1: Iryna Gurevych
10:30-11:00   Talk 2: Frank Hutter (virtual)
11:00-11:30   Coffee Break
12:00-12:45   Talk 3: Sasha Rush
13:00-14:00   Lunch (served at Keble College)
14:00-14:45   Talk 5: Serena Booth
14:45-15:30   Talk 6: Subbarao Kambhampati
15:30-16:00   Talk 7: Sharon Li (virtual)
16:00-17:00   Poster Session & Coffee break
17:00-17:45   Talk 4: Jonas Geiping
17:45-18:45   Walk in or around Oxford / Free time 


19:00-19:30   Drinks Reception
19:30-21:00   Dinner at Keble College  


Day 2, 30 July

08:45-09:00   Walk in, Registration
09:00-09:45   Talk 8: Beyza Ermiş
09:45-10:30   Talk 9: Roberta Raileanu
10:30-11:00   Coffee Break
11:00-11:30   Talk 10: Tim G. J. Rudner (virtual)
11:30-12:15   Talk 11: Daniel Johnson
12:15-13:00   Talk 12: Yarin Gal
13:00-14:00   Lunch (served at Keble College)
14:15-15:15   Q&A session
15:15-15:30   Close

Talks

Alexander (Sasha) Rush: LMs Inside-Out

This talk will discuss how the learned representations of language models (LMs) can be used today both to expose essential information and to improve their robustness. In the first part I will discuss a series of works on inversion methods for LMs. In the second part I will discuss research on improving the robustness of code generation using internal representations.

Bio: Alexander "Sasha" Rush is an Associate Professor at Cornell Tech and a researcher at Hugging Face. His research focuses on the study of language models, with applications in controllable text generation, efficient inference, summarization, and information extraction. In addition to research, he has written several popular open-source software projects supporting NLP research, programming for deep learning, and virtual academic conferences.

Beyza Ermiş: Aligning AI Safety Measures Across Multilingual Contexts

Ensuring the safety and ethical alignment of AI systems is a critical challenge, particularly when considering the diverse global landscape. Traditional safety measures often rely on monolingual, Western-centric frameworks, which may not effectively address the nuances of different cultural and linguistic contexts. In this talk, we introduce a novel approach to AI alignment that integrates both global and local preferences, reducing potential harms across various languages and cultures. We present a new dataset of red-teaming prompts in multiple languages, demonstrating the effectiveness of our multilingual alignment strategy. Our findings highlight the importance of considering diverse perspectives in AI safety and offer a robust framework for developing more inclusive and universally applicable AI systems.

Daniel Johnson: Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs

Identifying how much a model p_θ(Y|X) knows about the stochastic real-world process p(Y|X) it was trained on is important to ensure it avoids producing incorrect or "hallucinated" answers or taking unsafe actions. But this is difficult for generative models because probabilistic predictions do not distinguish between per-response noise (aleatoric uncertainty) and lack of knowledge about the process (epistemic uncertainty), and existing epistemic uncertainty quantification techniques tend to be overconfident when the model underfits. In this talk, I'll discuss a general strategy for teaching a model to both approximate p(Y|X) and also estimate the remaining gaps between p_θ(Y|X) and p(Y|X): train it to predict pairs of independent responses drawn from the true conditional distribution, allow it to "cheat" by observing one response while predicting the other, then measure how much it cheats. Remarkably, in recent work we prove that being good at cheating (i.e. cheating whenever it improves your prediction) is equivalent to being second-order calibrated, a principled extension of ordinary calibration that allows us to construct provably-correct frequentist confidence intervals for p(Y|X) and detect incorrect responses with high probability. We also demonstrate empirically that our approach accurately estimates how much models don't know across ambiguous image classification, (synthetic) language modeling, and partially-observable navigation tasks, outperforming existing techniques.
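
As a rough illustration of the pair-prediction idea (a minimal sketch, not the authors' implementation; the pair_model interface and method names below are hypothetical):

    def cheat_gap(pair_model, x, y1, y2):
        """Estimate how much the model benefits from 'cheating', i.e. seeing one
        independent response y1 while predicting the other response y2.

        pair_model is a hypothetical interface exposing:
          logp(y, x)            -> log p_theta(y | x)
          logp_given(y, x, y1)  -> log p_theta(y | x, y1)
        A near-zero gap suggests the model already captures p(Y|X) (remaining
        spread is aleatoric noise); a large gap signals missing knowledge
        (epistemic uncertainty)."""
        solo = pair_model.logp(y2, x)             # predict y2 without peeking
        cheat = pair_model.logp_given(y2, x, y1)  # predict y2 after observing y1
        return cheat - solo

    # Averaging the gap over many (x, y1, y2) pairs drawn from the true process
    # gives a rough estimate of how much the model does not know about p(Y|X).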


Bio: Daniel Johnson is a PhD student at the University of Toronto working with David Duvenaud and Chris Maddison, and a research scientist at Google DeepMind. He is interested in understanding what neural networks know, and ensuring that they act in predictable, interpretable, and safe ways in the presence of uncertainty. He is also interested in exploring the connections between computation, intelligent behavior, and probabilistic reasoning, especially under capacity or memory constraints.

Frank Hutter: In-context-learning (ICL) in the tabular foundation model TabPFN -- a drosophila for understanding ICL in LLMs?

I will first present TabPFN, a trained Transformer that uses in-context learning (ICL) to solve small tabular supervised ML problems. TabPFN needs no hyperparameter tuning and, on small datasets, outperforms SOTA tabular ML methods such as gradient boosting. TabPFN is fully contained in the weights of our network, which accepts training and test samples as a set-valued input and yields predictions for the entire test set in a single forward pass. TabPFN is a Prior-Data Fitted Network (PFN), trained to approximate Bayesian inference on synthetic datasets drawn from our prior. Changing TabPFN's prior leads to a different trained algorithm. Studying how TabPFN learns the corresponding approximate posterior for different priors might thus let it serve as a drosophila for understanding, in a highly controlled environment, the same in-context-learning mechanisms at work in LLMs.
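
For reference, a minimal usage sketch, assuming the open-source tabpfn Python package and its scikit-learn-style interface (dataset choice and arguments are illustrative):

    # pip install tabpfn scikit-learn
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from tabpfn import TabPFNClassifier

    X, y = load_breast_cancer(return_X_y=True)            # a small tabular task
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clf = TabPFNClassifier()        # no hyperparameter tuning required
    clf.fit(X_tr, y_tr)             # 'fit' essentially stores the training set as in-context examples
    print(clf.predict_proba(X_te))  # predictions for the whole test set in one forward pass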

Iryna Gurevych: Towards Real-World Fact-Checking with Large Language Models

Misinformation poses a growing threat to our society. It has a severe impact on public health by promoting fake cures or vaccine hesitancy, and it is used as a weapon during military conflicts to spread fear and distrust. Current research on natural language processing (NLP) for fact-checking focuses on identifying evidence and predicting the veracity of a claim. People’s beliefs, however, often depend not on the claim and rational reasoning but on credible-seeming content that makes the claim appear more reliable, such as scientific publications or visual content that was manipulated or stems from unrelated contexts. To combat misinformation, we need to answer three questions: (1) “Why was the claim believed to be true?”, (2) “Why is the claim false?”, and (3) “Why is the alternative explanation correct?”. In this talk, I will zoom in on two critical aspects of such misinformation supported by credible though misleading content. First, I will present our efforts to dismantle misleading narratives based on fallacious interpretations of scientific publications. Second, I will show how we can use multimodal large language models to (1) detect misinformation based on visual content and (2) provide strong alternative explanations for the visual content.


Bio: Iryna Gurevych is a Full Professor at the Computer Science Department of the Technical University of Darmstadt, Germany, and head of the UKP Lab. She has a strong background in information extraction, semantic text processing, machine learning, and innovative applications of NLP to the social sciences and humanities. She has received an ERC Advanced Grant for the project “InterText – Modeling Text as a Living Object in a Cross-Document Context”. Iryna has served as President of the Association for Computational Linguistics (ACL) and has been an ACL Fellow since 2020. She is also an ELLIS Fellow and a co-director of the ELLIS NLP Program.

Jonas Geiping: When Do Adversarial Attacks against Language Models Matter?

Adversarial attacks can be optimized to attack large language model applications, such as conversational chatbots. They are often used to "jailbreak" these models, that is, to circumvent the post-training modifications intended to make their answers safer. In practice, however, this is not much of a threat with current-generation language models. Yet as soon as these models are used for any task that goes beyond chatting and simulating text, practical security problems arise.


Bio: Jonas Geiping received his M.Sc. in Mathematics from the University of Münster in 2016 and his PhD in Computer Science from the University of Siegen in 2021. After a postdoctoral stay at the University of Maryland, College Park, he recently started a research group at the ELLIS Institute Tübingen and the Max Planck Institute for Intelligent Systems, focusing on the interplay of safety and efficiency in modern machine learning.

Roberta Raileanu: Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to user inputs is of paramount importance. Existing methods for identifying adversarial prompts tend to focus on specific domains, lack diversity, or require extensive human annotations. To address these limitations, we present Rainbow Teaming, a novel approach for producing a diverse collection of adversarial prompts. Rainbow Teaming casts adversarial prompt generation as a quality-diversity problem and uses open-ended search to generate prompts that are both effective and diverse. It can uncover a model's vulnerabilities across a broad range of domains, including safety, question answering, and cybersecurity. We also demonstrate that fine-tuning on synthetic data generated by Rainbow Teaming improves the safety of state-of-the-art LLMs without hurting their general capabilities and helpfulness, paving the way to open-ended self-improvement.
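
To make the quality-diversity framing concrete, here is a simplified MAP-Elites-style sketch (an illustration of the general idea, not the exact procedure from the paper; mutate_prompt, describe, and attack_success are hypothetical stand-ins for the LLM-based mutation, descriptor, and judge components):

    import random

    # Archive cells are keyed by feature descriptors (e.g. risk category x attack
    # style); each cell keeps the most effective adversarial prompt found so far.
    archive = {}  # (category, style) -> (prompt, score)

    def rainbow_style_step(mutate_prompt, describe, attack_success, seed_prompts):
        """One quality-diversity step: mutate an existing elite (or a seed
        prompt), assign the candidate to its feature cell, and keep it only if
        it is more effective than the current occupant of that cell."""
        parent = (random.choice(list(archive.values()))[0]
                  if archive else random.choice(seed_prompts))
        candidate = mutate_prompt(parent)      # LLM-driven mutation
        cell = describe(candidate)             # e.g. (risk category, attack style)
        score = attack_success(candidate)      # judged effectiveness in [0, 1]
        if cell not in archive or score > archive[cell][1]:
            archive[cell] = (candidate, score)

    # Repeating this step fills the archive with prompts that are both effective
    # (quality) and spread across categories and styles (diversity).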

Bio: Roberta Raileanu is a Research Scientist at Meta and an Honorary Lecturer at UCL. She earned her PhD in Computer Science from NYU, where she worked on generalization in deep reinforcement learning. Roberta also holds a degree in Astrophysics from Princeton University. Her research focuses on designing machine learning algorithms that can make robust sequential decisions in complex environments while acquiring new skills and knowledge. Her work draws from multiple fields such as reinforcement learning, open-ended learning, and self-supervised learning. Currently, she works on augmenting foundation models with planning, reasoning, and decision-making abilities by training them from feedback and interaction with external tools, environments, humans, and other AI agents.

Serena Booth: Building Human-AI Alignment: Specifying, Inspecting, and Modeling AI Behaviors

The learned behaviors of AI and robot agents should align with the intentions of their human designers. Toward this goal, people must be able to easily specify, inspect, and model agent behaviors. For specifications, we will consider expert-written reward functions for reinforcement learning (RL) and non-expert preferences for reinforcement learning from human feedback (RLHF). I will show evidence that experts are bad at writing reward functions: even in a trivial setting, experts write specifications that are overfit to a particular RL algorithm, and they often write erroneous specifications for agents that fail to encode their true intent. Next, I will show that the common approach to learning a reward function from non-experts in RLHF uses an inductive bias that fails to encode how humans express preferences, and that our proposed bias better encodes human preferences both theoretically and empirically. For inspection, humans must be able to assess the behaviors an agent learns from a given specification. I will discuss a method to find settings that exhibit particular behaviors, like out-of-distribution failures. Lastly, cognitive science theories attempt to show how people build conceptual models that explain agent behaviors. I will show evidence that some of these theories are used in research to support humans, but that we can still build better curricula for modeling. Collectively, my research provides evidence that, even with the best of intentions, current human-AI systems often fail to induce alignment, and it proposes promising directions for how to build better-aligned human-AI systems.
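
For context, the common RLHF baseline referred to above learns a reward model from pairwise preferences with a Bradley-Terry-style objective under the partial-return assumption. A minimal sketch of that baseline (not the improved inductive bias proposed in the talk; names are illustrative):

    import torch
    import torch.nn.functional as F

    def preference_loss(reward_model, seg_a, seg_b, prefer_a):
        """Bradley-Terry-style preference learning: the probability that a human
        prefers segment A over segment B is modeled as a sigmoid of the difference
        in summed predicted rewards (the 'partial return' inductive bias the talk
        questions). reward_model maps a segment to per-step reward predictions."""
        r_a = reward_model(seg_a).sum()
        r_b = reward_model(seg_b).sum()
        target = torch.tensor(1.0 if prefer_a else 0.0)
        return F.binary_cross_entropy_with_logits(r_a - r_b, target)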

Bio: Serena Booth is an incoming Assistant Professor in Computer Science at Brown University. Serena currently works in the U.S. Senate as an AAAS AI Policy Fellow, where she is working on AI policy questions for the Senate Banking, Housing, and Urban Affairs Committee. Serena received her PhD at MIT CSAIL in 2023. Serena studies how people write specifications for AI systems and how people assess whether AI systems are successful in learning from specifications. While at MIT, Serena served as an inaugural Social and Ethical Responsible Computing Scholar, teaching AI Ethics and developing MIT’s AI ethics curriculum that is also released on MIT OpenCourseWare. Serena is a graduate of Harvard College (2016), after which she worked as an Associate Product Manager at Google to help scale Google’s ARCore augmented reality product to 100 million devices. Her research has been supported by an MIT Presidential Fellowship and by an NSF GRFP. She is a Rising Star in EECS and an HRI Pioneer.

Sharon Li: Out-of-Distribution Detection in the Era of Foundation Models

When deploying machine learning models in the open and non-stationary world, their reliability is often challenged by the presence of out-of-distribution (OOD) samples. Since data shifts are prevalent in the real world, identifying OOD inputs has become an important problem in machine learning. In this talk, I will discuss challenges, research progress, and opportunities in OOD detection. Our work is motivated by the insufficiency of existing learning objectives such as ERM, which focus on minimizing error only on in-distribution (ID) data but do not explicitly account for the uncertainty that arises outside the ID data. To mitigate this fundamental limitation, I will introduce a new algorithmic framework, which jointly optimizes for both accurate classification of ID samples and reliable detection of OOD data. The learning framework integrates distributional uncertainty as a first-class construct in the learning process, thus enabling both accuracy and safety guarantees.
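
One common instantiation of such a joint objective (a generic sketch in the spirit of energy-based OOD regularization, with illustrative margins and weighting, not necessarily the exact framework presented in the talk) combines the standard ID cross-entropy with a term that separates ID and auxiliary-outlier energies:

    import torch
    import torch.nn.functional as F

    def joint_id_ood_loss(logits_id, labels_id, logits_ood,
                          m_in=-25.0, m_out=-7.0, lam=0.1):
        """Cross-entropy on in-distribution (ID) data plus an energy-margin
        regularizer that pushes ID energies below m_in and auxiliary outlier
        energies above m_out. Margins and weight lam are illustrative."""
        ce = F.cross_entropy(logits_id, labels_id)
        energy_id = -torch.logsumexp(logits_id, dim=1)
        energy_ood = -torch.logsumexp(logits_ood, dim=1)
        reg = (F.relu(energy_id - m_in) ** 2).mean() \
            + (F.relu(m_out - energy_ood) ** 2).mean()
        return ce + lam * reg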


Bio: Sharon Yixuan Li is an Assistant Professor in the Department of Computer Sciences at the University of Wisconsin-Madison. She received a Ph.D. from Cornell University in 2017, advised by John E. Hopcroft. Subsequently, she was a postdoctoral scholar in the Computer Science department at Stanford University. Her research focuses on the algorithmic and theoretical foundations of learning in open worlds. She has served as Area Chair for ICLR, NeurIPS, and ICML, and as Program Chair for the Workshop on Uncertainty and Robustness in Deep Learning. She is the recipient of the AFOSR Young Investigator Program (YIP) award, the NSF CAREER award, the MIT Technology Review TR-35 Award, Forbes 30 Under 30 in Science, and multiple faculty research awards from Google, Meta, and Amazon. Her work received a NeurIPS Outstanding Paper Award and an ICLR Outstanding Paper Award Honorable Mention in 2022.

Subbarao Kambhampati: Can LLMs Reason and Plan? 

Large Language Models (LLMs) are on track to reverse what seemed like an inexorable shift of AI from explicit to tacit knowledge tasks. Trained as they are on everything ever written on the web, LLMs exhibit "approximate omniscience"--they can provide answers to all sorts of queries, but with nary a guarantee. This could herald a new era for knowledge-based AI systems--with LLMs taking the role of (blowhard?) experts. But first, we have to stop confusing the impressive form of the generated knowledge for correct content, and resist the temptation to ascribe reasoning, planning, self-critiquing etc. powers to approximate retrieval by these n-gram models on steroids. We have to focus instead on LLM-Modulo techniques that complement the unfettered idea generation of LLMs with careful vetting by model-based AI systems. In this talk, I will reify this vision and attendant caveats in the context of the role of LLMs in planning tasks.
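
Schematically, the LLM-Modulo setup is a generate-test loop: the LLM proposes candidate plans, and sound model-based verifiers (plus other critics) check them and feed critiques back into the prompt. A minimal sketch (all callables are hypothetical stand-ins):

    def llm_modulo_plan(llm_propose, verifiers, max_iters=10):
        """Generate-test loop: the LLM proposes a candidate plan, model-based
        critics check it against the domain model, and their critiques are fed
        back until a plan passes all checks or the budget runs out."""
        feedback = []
        for _ in range(max_iters):
            plan = llm_propose(feedback)              # unfettered idea generation
            critiques = [v(plan) for v in verifiers]  # careful vetting
            failures = [c for c in critiques if c is not None]
            if not failures:                          # all verifiers accept
                return plan
            feedback.extend(failures)                 # critiques guide the next proposal
        return None                                   # no verified plan found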

Bio: Subbarao Kambhampati is a professor of computer science at Arizona State University. Kambhampati studies fundamental problems in planning and decision making, motivated in particular by the challenges of human-aware AI systems. He is a fellow of the Association for the Advancement of Artificial Intelligence, the American Association for the Advancement of Science, and the Association for Computing Machinery. He has served as the president of the Association for the Advancement of Artificial Intelligence, a trustee of the International Joint Conference on Artificial Intelligence, the chair of AAAS Section T (Information, Communication and Computation), and a founding board member of the Partnership on AI. Kambhampati’s research, as well as his views on the progress and societal impacts of AI, has been featured in multiple national and international media outlets. He can be followed on Twitter @rao2z.

Tim G. J. Rudner: Understanding Uncertainty Quantification in Large Language Models

As large language models (LLMs) are increasingly deployed across a wide range of application domains, assuring that they operate safely and reliably—especially in open-ended domains—is crucial to prevent potential harm. Well-calibrated uncertainty estimates that accompany the text generated by an LLM can indicate the likelihood of an incorrect response, and as such, can serve as an effective fail-safe mechanism against hallucinations. Unfortunately, despite a growing body of research into uncertainty quantification in LLMs, existing methods largely fail to provide reliable uncertainty estimates in practice, and the lack of comparability across methods makes measuring progress difficult. In this talk, I will discuss different notions of uncertainty quantification in LLMs, highlighting their benefits and pitfalls, and explore how LLMs represent their uncertainty.
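
As a point of reference, one of the simplest notions in this space is token-level predictive entropy computed from the model's own logits; a minimal illustrative sketch (a common baseline, not a method proposed in the talk):

    import torch
    import torch.nn.functional as F

    def mean_token_entropy(logits):
        """Average per-token predictive entropy of a generated sequence, a common
        (if limited) uncertainty baseline for LLMs. logits has shape
        [sequence_length, vocab_size]; higher values mean the model was less
        certain about its own next-token choices."""
        log_probs = F.log_softmax(logits, dim=-1)
        entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # per-token entropy
        return entropy.mean().item()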

Bio: Tim G. J. Rudner is a Data Science Assistant Professor and Faculty Fellow at New York University’s Center for Data Science and an AI Fellow at Georgetown University's Center for Security and Emerging Technology. He conducted PhD research on probabilistic machine learning at the University of Oxford, where he was advised by Yee Whye Teh and Yarin Gal. The goal of his research is to create trustworthy machine learning models by developing methods and theoretical insights that improve the reliability, safety, transparency, and fairness of machine learning systems deployed in safety-critical settings. Tim holds a master’s degree in statistics from the University of Oxford and an undergraduate degree in applied mathematics and economics from Yale University. He is also a Qualcomm Innovation Fellow and a Rhodes Scholar.

Yarin Gal: Foundation Models That Can Tell Us When They Don’t Know

Foundation models (FMs), including models such as ChatGPT and Gemini, have demonstrated spectacular abilities, taking a big leap forward compared to traditional deep learning approaches. Researchers and engineers are rapidly working to build these FMs into our daily lives, in applications ranging from medical diagnosis to autonomous driving. Creating tools to detect when such models are ‘guessing at random’ and producing unsubstantiated outputs is key to their safe and successful deployment. This talk will explore challenges in quantifying uncertainty in foundation models, with applications ranging from language to astronomy, and will cover our recent Nature publication, ‘Detecting hallucinations in large language models using semantic entropy’.
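
In outline, the semantic entropy method samples several answers to the same question, clusters them into meaning-equivalence classes (e.g. via bidirectional entailment), and computes entropy over the clusters rather than over surface strings. A simplified sketch (the are_equivalent judge is a hypothetical stand-in for the NLI-based check used in the paper):

    import math

    def semantic_entropy(answers, are_equivalent):
        """Group sampled answers into meaning-equivalence clusters and compute
        the entropy of the cluster frequencies. High entropy over meanings
        (rather than wordings) flags likely confabulation."""
        clusters = []                          # each cluster is a list of answers
        for ans in answers:
            for cluster in clusters:
                if are_equivalent(ans, cluster[0]):
                    cluster.append(ans)
                    break
            else:
                clusters.append([ans])
        probs = [len(c) / len(answers) for c in clusters]
        return -sum(p * math.log(p) for p in probs)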


Bio: Yarin Gal leads the Oxford Applied and Theoretical Machine Learning Group (OATML). He is an Associate Professor of Machine Learning in the Department of Computer Science at the University of Oxford and a Tutorial Fellow in Computer Science at Christ Church. He is also a Turing AI Fellow at the Alan Turing Institute and Director of Research at the UK Government’s AI Safety Institute (AISI).