Brief Bio
I am a Research Scientist at the NVIDIA Autonomous Vehicle Research Group and a Postdoctoral Scholar at UC Berkeley, advised by Prof. Jitendra Malik and Prof. Trevor Darrell.
I received my Ph.D. from Cornell University, advised by Prof. Serge Belongie and Prof. Kilian Q. Weinberger.
My research focuses on advancing embodied intelligence through multimodal data, developing generalizable algorithms, and creating interactive intelligent systems. Central to this work are reasoning, large language models, generative models, and robotics. A key aspect of this work is aligning representations from diverse multimodal data, including 2D pixels, 3D geometry, language, and audio.
News
📖 "Large Multimodal Foundation Models" tutorial (Sep 29) at ECCV 2024
📖 "Emergent Visual Abilities and Limits of Foundation Models" workshop (Sep 30) at ECCV 2024
📖 "Vision-Centric Autonomous Driving" workshop (Sep 30) at ECCV 2024
Selected Publications
Boyi Li, Jathushan Rajasegaran, Yossi Gandelsman, Alexei A. Efros, Jitendra Malik
Synthesizing Moving People with 3D Control
arXiv, 2024
Paper · Project Webpage · Code
Boyi Li, Yue Wang, Jiageng Mao, Boris Ivanovic, Sushant Veer, Karen Leung, Marco Pavone
LLaDA: Driving Everywhere with Large Language Model Policy Adaptation
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Paper · Project Webpage · Video · Featured in NVIDIA GTC · NVIDIA Official Video · Bilibili
Tsung-Han Wu*, Long Lian*, Joseph E. Gonzalez, Boyi Li†, Trevor Darrell†
Self-correcting LLM-controlled Diffusion Models
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Paper · Project Webpage · Video · Code
Long Lian*, Baifeng Shi*, Adam Yala†, Trevor Darrell†, Boyi Li†
LLM-grounded Video Diffusion Models
International Conference on Learning Representations (ICLR), 2024
Paper · Project Webpage · Code
Sihyun Yu, Weili Nie, De-An Huang, Boyi Li, Jinwoo Shin, Anima Anandkumar
CMD: Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
International Conference on Learning Representations (ICLR), 2024
Paper · Project Webpage · Code
Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li,
Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, Yue Wang
EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision
International Conference on Learning Representations (ICLR), 2024
Long Lian, Boyi Li, Adam Yala, Trevor Darrell
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
Transactions on Machine Learning Research (TMLR), Featured Certification, 2024
Workshop on Knowledge and Logical Reasoning in the Era of Data-driven Learning at ICML, 2023
Paper · Project Webpage · Code · BAIR Blog · Hugging Face Demo
Boyi Li*, Rodolfo Corona*, Karttikeya Mangalam*, Catherine Chen*, Daniel Flaherty,
Serge Belongie, Kilian Q. Weinberger, Jitendra Malik, Trevor Darrell, Dan Klein
Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL Findings), 2024
Boyi Li*, Philipp Wu*, Pieter Abbeel, Jitendra Malik
Interactive Task Planning with Language Models
Workshop on Language and Robot Learning: Language as Grounding at CoRL, 2023
Paper · Project Webpage · Code · Video
Jiaxin Ge, Sanjay Subramanian, Trevor Darrell†, Boyi Li†
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Boyi Li, Kilian Q. Weinberger, Serge Belongie, Vladlen Koltun, René Ranftl
Language-driven Semantic Segmentation
International Conference on Learning Representations (ICLR), 2022
Paper · Project Webpage · Code · Demo
Boyi Li, Serge Belongie, Ser-nam Lim, Abe Davis
Neural Image Recolorization for Creative Domains
5th Workshop on Computer Vision for Fashion, Art, and Design at CVPR, Oral, 2022
Boyi Li*, Felix Wu*, Ser-nam Lim, Serge Belongie, Kilian Q. Weinberger
On Feature Normalization and Data Augmentation
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
Paper · Project Webpage · Code · Video
Boyi Li*, Felix Wu*, Kilian Q. Weinberger, Serge Belongie
Positional Normalization
Neural Information Processing Systems (NeurIPS), Spotlight, 2019
Paper · Project Webpage · Code · Video
Miscellaneous
Classical music (violin/piano), painting, interior design, singing, and raising cute animals.