We are exploring how aligning LLMs' graph-reasoning capabilities with human feedback can enhance the automation of scientific discovery. I am co-mentoring Rosni Vasu on this project along with Bhavana Dalvi.
Drawing on recent work on retrieval-augmented generation and LLM personalization, we propose a reading assistant that serves simplified versions of medical texts tailored to the reader's medical literacy.
(paper under review)
Any model performing simplification must have three capabilities: 1. predicting the spans of the expert text that must be altered, 2. predicting the operation to apply to each span, and 3. generating the additional (for elaboration) or alternative (for replacement) content. Our T5-based models with multi-angle training let users selectively edit the contents of a complex medical text through elaboration, replacement, deletion, or insertion while producing SOTA simplifications.
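A minimal sketch of this three-step decomposition, using a single text-to-text model queried from three "angles" via input prefixes, in the spirit of multi-angle training (the prompts, slot format, and `t5-base` checkpoint are illustrative placeholders, not the trained models from the paper):

```python
# Sketch: one text-to-text model queried from three angles via input prefixes.
# Prompts, slot format, and checkpoint below are illustrative placeholders.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")            # placeholder checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def query(angle: str, **slots) -> str:
    """Format one angle of the multi-angle input and decode the model's prediction."""
    prompt = angle + ": " + " ; ".join(f"{k} = {v}" for k, v in slots.items())
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

expert_text = "Patients presenting with exertional dyspnea should be evaluated for cardiomyopathy."

# 1. Predict the span of the expert text that should be altered.
span = query("predict_span", text=expert_text)
# 2. Predict the operation on that span: elaborate, replace, delete, or insert.
operation = query("predict_operation", text=expert_text, span=span)
# 3. Generate the additional (elaboration) or alternative (replacement) content.
content = query("generate_content", text=expert_text, span=span, operation=operation)
```

Sharing one set of weights across the three angles is what lets the same model serve span prediction, operation prediction, and content generation.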
Human preferences change in response to the emergent behaviors of other agents in the environment. In this work, we propose learning reward dynamics that adapt in non-stationary environments with several interacting agents.
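One simple way to make reward learning track such drift is to update reward weights online from pairwise preferences while letting the influence of old comparisons decay. The sketch below is a toy illustration under that assumption (a Bradley-Terry preference likelihood with an exponential forgetting factor; the class and parameter names are hypothetical, not the method in this work):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DriftingRewardLearner:
    """Linear reward r(s) = w . phi(s) learned from pairwise preferences,
    with exponential forgetting so w can track non-stationary preferences."""

    def __init__(self, dim, lr=0.5, forget=0.9):
        self.w = np.zeros(dim)   # reward weights over features phi(s)
        self.lr = lr             # step size for the preference update
        self.forget = forget     # per-round decay that discounts old preferences

    def update(self, phi_preferred, phi_rejected):
        # Bradley-Terry: P(preferred > rejected) = sigmoid(w . (phi_a - phi_b)).
        diff = phi_preferred - phi_rejected
        p = sigmoid(self.w @ diff)
        # Gradient ascent on the log-likelihood of the observed preference.
        self.w += self.lr * (1.0 - p) * diff

    def step_time(self):
        # Shrink the weights each round so stale evidence fades and the
        # estimate can follow drifting human preferences.
        self.w *= self.forget

# Toy usage: the human's true preference weights flip halfway through.
rng = np.random.default_rng(0)
learner = DriftingRewardLearner(dim=3)
for t in range(200):
    true_w = np.array([1.0, -1.0, 0.5]) if t < 100 else np.array([-1.0, 1.0, 0.5])
    a, b = rng.normal(size=3), rng.normal(size=3)
    preferred, rejected = (a, b) if true_w @ a > true_w @ b else (b, a)
    learner.update(preferred, rejected)
    learner.step_time()
```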
Drawing from preference learning, hierarchical reward learning, rich queries, and Bayesian models of human guidance, we are designing interactive AI agents that learn faster by asking the right questions, autonomously posing queries informed by rich models of human guidance.
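To make "asking the right questions" concrete, here is a toy sketch (hypothetical setup, not the project's actual query model) of an agent that keeps a discrete posterior over candidate reward functions and autonomously picks the pairwise query with the highest expected information gain, assuming a Boltzmann model of the human's answer:

```python
import numpy as np

rng = np.random.default_rng(1)

# Discrete hypothesis space over linear reward weights, with a uniform prior.
hypotheses = rng.normal(size=(50, 4))                 # 50 candidate weight vectors
posterior = np.full(len(hypotheses), 1 / len(hypotheses))

# Candidate queries: "do you prefer option a or b?", each option a feature vector.
queries = [(rng.normal(size=4), rng.normal(size=4)) for _ in range(20)]

def answer_probs(phi_a, phi_b):
    """P(human answers 'a') under each hypothesis, via a Boltzmann choice model."""
    util_diff = hypotheses @ (phi_a - phi_b)
    return 1.0 / (1.0 + np.exp(-util_diff))

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_info_gain(phi_a, phi_b):
    """Expected reduction in posterior entropy if this query is asked."""
    p_a = answer_probs(phi_a, phi_b)                  # per-hypothesis P(answer = a)
    marg_a = np.sum(posterior * p_a)                  # marginal P(answer = a)
    post_if_a = posterior * p_a / marg_a              # Bayes update for answer 'a'
    post_if_b = posterior * (1 - p_a) / (1 - marg_a)  # Bayes update for answer 'b'
    expected_posterior_entropy = marg_a * entropy(post_if_a) + (1 - marg_a) * entropy(post_if_b)
    return entropy(posterior) - expected_posterior_entropy

# The agent autonomously asks the single most informative question.
best_query = max(queries, key=lambda q: expected_info_gain(*q))
```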