AI agents have become an active area of research. But to be useful in the real world and at scale, agents need to be accurate, reliable, and cheap. This workshop explores how to build agents that meet that bar.
Hosted by Princeton Language and Intelligence, the event will feature conversations with experts who have:
Built infrastructure for developing AI agents (DSPy, LangChain)
Led startups that build agents (Sierra, Sybill)
Created tools and benchmarks to evaluate LLMs and agents (SWE-bench, SPADE, lm-eval-harness)
Developed solutions to ensure reliability and safety (Constitutional AI, Inspect)
Watch the YouTube video of the workshop here.
Speakers
Mehak Aggarwal
Co-founder and Head of AI, Sybill.ai
Harrison Chase
CEO and Co-founder, LangChain
Iason Gabriel
Research Scientist, Google DeepMind
Omar Khattab
Ph.D. Candidate, Stanford University
Jelena Luketina
UK AI Safety Institute
Azalia Mirhoseini
Assistant Professor, Stanford University
Karthik R. Narasimhan
Associate Professor, Princeton University; Head of Research, Sierra
Hailey Schoelkopf
Research Scientist, EleutherAI
Shreya Shankar
Ph.D. Student, UC Berkeley
Schedule
Opening remarks
11:00-11:30 AM ET
Why did we organize this workshop?
In our recent paper, AI Agents That Matter, we propose changes to agent evaluation so that agents are optimized for usefulness on real-world tasks rather than just benchmark performance.
Hear from the organizers about why they brought this group together.
Session 1: Developer tools for AI agents
11:30 AM-12:30 PM ET
What tools do developers need to develop agents?
The creators of LangChain, DSPy, and SPADE share how infrastructure for AI agents enables new applications and what goes into building robust developer tools.
Session 2: Evaluating agents for real-world use
12:45-1:45 PM ET
Agent evaluation is a minefield.
Hear the developers of SWE-bench, Inspect, lm-eval-harness, and many other state-of-the-art benchmarks share tips and tricks for reliable evaluation.
Session 3: Ensuring reliability
2:00-3:00 PM ET
How do real-world applications ensure reliability with stochastic LLMs?
Join the builders of Constitutional AI and Sybill, along with the lead author of "The Ethics of Advanced AI Assistants," as they discuss the role of reliability in AI agent applications.
The workshop will be held online on August 29, 2024, from 11 AM to 3 PM ET. RSVP here.
Each session consists of:
Invited talks from experts in the field
A panel discussion
An audience Q&A session
Organizers
Princeton University
Benedikt Ströbl
Princeton University
Nitya Nadgir
Princeton University
Zachary S. Siegel
Princeton University
Princeton University
Questions? Contact sayashk@princeton.edu, stroebl@princeton.edu