Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks

Hanjiang Hu, Alex Robey, Changliu Liu

Carnegie Mellon University

Published in TMLR: paper, code

Warning: This website contains examples of harmful LLM responses.