Conversational AI: Enterprise AI agents increasingly require sophisticated integration between large language models and external tools such as document retrievers and API endpoints. My research develops methods to enhance LLM performance on agentic Retrieval-Augmented Generation (RAG) and API tasks that involve multi-hop reasoning across different tool types. Throughout, I emphasize maintaining information grounding while enabling complex reasoning chains in both single- and multi-turn knowledge-grounded conversations.
LLM Instruction Following: Large language models exhibit surprising deficiencies in following straightforward instructions, undermining their reliability in practical applications. My research addresses this through both training-time methodologies that improve models' inherent instruction-following capabilities and inference-time interventions for existing models.
Responsible AI Licensing: I co-developed the concept of behavioral-use licensing and have been actively working with the AI community to drive its adoption. These licenses now represent the second-largest class of licensing frameworks for AI models after open-source licenses, and have been adopted by model providers including DeepSeek, Meta, and Google, among others. I chaired the IEEE-SA P2840 Standard working group for Responsible AI Licensing and the BigScience Model Governance Working Group, which led to the first release of a large LLM under a behavioral-use license. I also worked with AAAI to enable the option for conference authors to publish code and models associated with their papers under RAIL licenses.
mtRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems
An end-to-end, human-generated multi-turn RAG benchmark (Transactions of the ACL; presented at ACL 2025)