AI agents have become an active area of research. But to be useful in the real world and at scale, agents need to be accurate, reliable, and cheap. This workshop explores how to build agents that meet that bar.
Hosted by Princeton Language and Intelligence, the event will feature conversations with experts who have:
Built infrastructure for developing AI agents (DSPy, LangChain)
Led startups that build agents (Sierra, Sybill)
Created tools and benchmarks to evaluate LLMs and agents (SWE-bench, SPADE, lm-eval-harness)
Developed solutions to ensure reliability and safety (Constitutional AI, Inspect)
Watch the YouTube video of the workshop here.
Speakers
Mehak Aggarwal
Co-founder and Head of AI, Sybill.ai
Harrison Chase
CEO and Co-founder, LangChain
Iason Gabriel
Research Scientist, Google DeepMind
Omar Khattab
Ph.D. Candidate, Stanford University
Jelena Luketina
UK AI Safety Institute
Azalia Mirhoseini
Assistant Professor, Stanford University
Karthik R. Narasimhan
Associate Professor, Princeton University; Head of Research, Sierra
Hailey Schoelkopf
Research Scientist, EleutherAI
Shreya Shankar
Ph.D. Student, UC Berkeley
Schedule
Opening remarks
11:00-11:30 AM ET
Why did we organize this workshop?
In our recent paper, AI Agents That Matter, we propose changes to agent evaluation so that agents are optimized for usefulness on real-world tasks rather than just benchmark performance.
Hear from the organizers about why they brought this group together.
Session 1: Developer tools for AI agents
11:30 AM-12:30 PM ET
What tools do developers need to develop agents?
The creators of LangChain, DSPy, and SPADE share how infrastructure for AI agents enables new applications and what goes into building robust developer tools.
Session 2: Evaluating agents for real-world use
12:45-1:45 PM ET
Agent evaluation is a minefield.
Hear the developers of SWE-bench, Inspect, lm-eval-harness, and many other state-of-the-art benchmarks share tips and tricks for reliable evaluation.
Session 3: Ensuring reliability
2:00-3:00 PM ET
How do real-world applications ensure reliability with stochastic LLMs?
Join the builders of Constitutional AI and Sybill, along with the lead author of "The Ethics of Advanced AI Assistants," as they discuss the role of reliability in AI agent applications.
The workshop will be held online on August 29, 2024, from 11 AM to 3 PM ET. RSVP here.
Each session consists of:
Invited talks from experts in the field
A panel discussion
An audience Q&A session
Organizers
Princeton University
Benedikt Ströbl
Princeton University
Nitya Nadgir
Princeton University
Zachary S. Siegel
Princeton University
Princeton University
Questions? Contact sayashk@princeton.edu, stroebl@princeton.edu