Conversational AI: Enterprise AI agents increasingly require sophisticated integration between large language models and external tools such as document retrievers and API endpoints. My research develops methods to enhance LLM performance on Retrieval-Augmented Generation (RAG) and API tasks involving multi-hop reasoning across different tool types. A continuing emphasis is maintaining information grounding while enabling complex reasoning chains in both single-turn and multi-turn knowledge-grounded conversations.
LLM Instruction Following: Large language models exhibit surprising deficiencies in following straightforward instructions, undermining their reliability in practical applications. My research addresses this through both training-time methodologies that improve models' inherent instruction-following capabilities and inference-time interventions for existing models.
Responsible AI Licensing: I co-developed the concept of Behavioral-use Licensing and have been actively working with the AI community to drive its adoption. These licenses now represent the second-largest class of licensing frameworks for AI models after open source licenses, and have been adopted by model providers including DeepSeek, Meta, and Google, among others. I chaired the IEEE-SA P2840 Standard working group for Responsible AI Licensing and the BigScience Model Governance Working Group, which led to the first release of a large language model under a Behavioral-use License. I also worked with AAAI to enable the option for conference authors to publish code and models associated with papers under RAIL Licenses.
mtRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems
End-to-end, human-generated multi-turn RAG benchmark (Transactions of the ACL; presented at ACL 2025)