Conversational AI: Enterprise AI agents increasingly require sophisticated integration between large language models and external tools such as document retrievers and API endpoints. My research develops methods to improve LLM performance on agentic Retrieval-Augmented Generation (RAG) and API-calling tasks that involve multi-hop reasoning across different tool types, with a continued emphasis on maintaining information grounding while enabling complex reasoning chains in both single-turn and multi-turn knowledge-grounded conversations.
LLM Instruction Following: Large language models exhibit surprising deficiencies in following even straightforward instructions, undermining their reliability in practical applications. My research addresses this through both training-time methodologies that improve models' inherent instruction-following capabilities and inference-time interventions that improve adherence in existing models without retraining.
Responsible AI Licensing: Outside of my primary employment and in a personal capacity, I co-developed the concept of behavioral-use licensing and have actively worked with the AI community to drive its adoption. These licenses now represent the second-largest class of licensing frameworks for AI models after open-source licenses, and have been adopted by model providers including DeepSeek, Meta, and Google, among others. I chaired the IEEE-SA P2840 Standard Working Group for Responsible AI Licensing and the BigScience Model Governance Working Group, the latter of which led to the first release of a large LLM under a behavioral-use license. I also worked with AAAI to enable conference authors to publish code and models associated with their papers under Responsible AI License (RAIL) terms.
mtRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems
End-to-end, human-generated multi-turn RAG benchmark (Transactions of the ACL; presented at ACL 2025)