Large Language Models with Multi-modal and Semi-structured Knowledge
Professor Liang Zhao
Abstract: This talk presents a unified agenda for augmenting large language models with multi-modal and semi-structured knowledge. He will show how graph-grounded RAG enables principled retrieval and reasoning over text-attributed and citation graphs, supporting scholarly Q&A and automated taxonomy construction via subgraph/path retrieval and multi-hop evidence composition. He will introduce resources and representation learning methods for textual-edge and text-attributed graphs that enable reproducible evaluation at scale. The talk then extends beyond graphs to spatial reasoning, where geospatial indexes and planning-informed prompting allow LLMs to answer real-world, map-based questions. To increase transparency and trust, he will cover techniques that generate faithful natural-language rationales for graph models and methods for explaining latent generative factors with multimodal LLMs. Finally, he will discuss cross-modal augmentation—using sub-dimensional retrieval to steer text-to-image generation—and knowledge-graph inference with zero-shot link prediction, together with lessons from healthcare knowledge-graph practice.
Bio: Dr. Liang Zhao is a Winship Distinguished Research Professor and Associate Professor in the Department of Computer Science at Emory University. He obtained his Ph.D. in Computer Science from Virginia Tech in 2017, where he was recognized as an Outstanding Doctoral Student. His research spans machine learning and artificial intelligence for complex data, including network, spatial, temporal, and textual data. He has authored over two hundred papers in top-tier venues such as KDD, NeurIPS, ICLR, AAAI, and IJCAI. His work has been recognized with numerous awards, including the Test of Time Award at KDD'25, Best Paper Awards at ICDM'22 and ICDM'19, the Best Poster Runner-Up at ACM SIGSPATIAL'22, and a Best Paper Candidate at WWW'21. His career achievements have also been recognized by federal agencies, academic organizations, and leading technology companies through awards such as the NSF CAREER Award, the Mid-Career Award from the IEEE Computer Society on Smart Computing, the Amazon Research Award, the Meta (formerly Facebook) Research Award, and the Cisco Faculty Research Award. He is a Computing Innovation Fellows Mentor.
Towards Generally Intelligent Multimodal Systems for Enterprises
Professor Sai Rajeswar
Abstract: While scaling laws and large language models (LLMs) have unlocked impressive automation, enabling higher-order decision-making from pixels alone remains a challenge. Such systems must excel at complex multimodal perception: interpreting dashboards, analyzing infographics, reasoning over documents, navigating enterprise UIs, and synthesizing heterogeneous inputs into actions. Building this capability requires advances in multimodal representation learning, grounding, reasoning, and interactive control. In this talk, I present our coordinated effort spanning permissively licensed dataset creation, benchmark design, architectural innovation, and reinforcement learning for adaptive deliberation. Together, these and future contributions would lay the foundation for multimodal enterprise systems capable of automating internal operational tasks.
Bio: Sai Rajeswar is a Staff Research Scientist at ServiceNow and an Adjunct Professor and core industry member at Mila, Montréal. His work over the last eight years spans reinforcement learning and multimodal AI. Lately, he has been focusing on building multimodal systems that serve as the foundation for generalist AI agents: systems that integrate perception and action while incorporating feedback from the environment. Broadly, his work aims to couple perception with action to improve real-world applicability, always with an eye towards responsible impact on society at large.