Boyi Li 

Email: boyilics [at] gmail [dot] com

Google Scholar / Github / Twitter   

Brief Bio

I am a Research Scientist at NVIDIA Autonomous Vehicle Research Group and a Postdoctoral Scholar at UC Berkeley, advised by Prof. Jitendra Malik and Prof. Trevor Darrell.

I received my Ph.D. from Cornell University, advised by Prof. Serge Belongie and Prof. Kilian Q. Weinberger.

My research focuses on advancing embodied intelligence (agents) through multimodal data, developing generalizable algorithms, and creating interactive intelligent systems. Central to this work are reasoning, large language models, generative models, and robotics. A key aspect involves aligning representations from diverse multimodal data, including 2D pixels, 3D geometry, language, and audio.

News

📖 Presenting 🐺Wolf  (Oral) at the "Workshop on Video-Language Models" at NeurIPS 2024 (Dec 14). See you in Vancouver 🇨🇦!

📖 "Large Multimodal Foundation Models" tutorial at ECCV 2024 (Sep 29). See you in Milan 🇮🇹!

📖 "Emergent Visual Abilities and Limits of Foundation Models" workshop at ECCV 2024 (Sep 30) 

📖 "Vision-Centric Autonomous Driving" workshop at ECCV 2024 (Sep 30) 

Selected Publications

Boyi Li*, Philipp Wu*, Pieter Abbeel, Jitendra Malik

Interactive Task Planning with Language Models 

Transactions on Machine Learning Research (TMLR), 2025

Workshop on Language and Robot Learning: Language as Grounding at CoRL, 2023

Paper · Project Webpage · Code · Video

Boyi Li*, Leo Chen*, Jathushan Rajasegaran*, Yossi Gandelsman, Alexei A. Efros, Jitendra Malik

Synthesizing Moving People with 3D Control

arXiv, 2024

Paper · Project Webpage · Code

Boyi Li, Ligeng Zhu, Ran Tian, Shuhan Tan, Yuxiao Chen, Yao Lu, Yin Cui, Sushant Veer, Max Ehrlich, Jonah Philion, Xinshuo Weng, Fuzhao Xue, Jim Fan, Yuke Zhu, Jan Kautz, Andrew Tao, Ming-Yu Liu, Sanja Fidler, Boris Ivanovic, Trevor Darrell, Jitendra Malik, Song Han, Marco Pavone

Wolf: Captioning Everything with a World Summarization Framework

Workshop on Video-Language Models at NeurIPS, Oral Presentation and Highest Scores, 2024

Paper · Project Webpage

Jiageng Mao, Boyi Li, Boris Ivanovic, Yuxiao Chen, Yan Wang, Yurong You, Chaowei Xiao, Danfei Xu, Marco Pavone, Yue Wang

DreamDrive: Generative 4D Scene Modeling from Street View Images

arXiv, 2024

Paper · Media Coverage

Ran Tian, Boyi Li, Xinshuo Weng, Yuxiao Chen, Edward Schmerling, Yue Wang, Boris Ivanovic, Marco Pavone

Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving

Conference on Robot Learning (CoRL), 2024

Paper 

Shuhan Tan, Boris Ivanovic, Yuxiao Chen, Boyi Li, Xinshuo Weng, Yulong Cao, Philipp Kraehenbuehl, Marco Pavone

Promptable Closed-loop Traffic Simulation

Conference on Robot Learning (CoRL), 2024

Paper 

Boyi Li, Yue Wang, Jiageng Mao, Boris Ivanovic, Sushant Veer, Karen Leung, Marco Pavone

LLaDA: Driving Everywhere with Large Language Model Policy Adaptation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Paper · Project Webpage · Video · Featured in NVIDIA GTC · NVIDIA Official Video · Bilibili 

Tsung-Han Wu*, Long Lian*, Joseph E. Gonzalez, Boyi Li, Trevor Darrell

Self-correcting LLM-controlled Diffusion Models 

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Paper · Project Webpage · Video · Code

Long Lian*, Baifeng Shi*, Adam Yala, Trevor Darrell, Boyi Li

LLM-grounded Video Diffusion Models 

International Conference on Learning Representations (ICLR), 2024

Paper · Project Webpage · Code

Sihyun Yu, Weili Nie, De-An Huang, Boyi Li, Jinwoo Shin, Anima Anandkumar

CMD: Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition

International Conference on Learning Representations (ICLR), 2024

Paper · Project Webpage · Code

Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, Yue Wang

EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision

International Conference on Learning Representations (ICLR), 2024

Paper · Project Webpage · Code · NVIDIA Official Video  

Long Lian, Boyi Li, Adam Yala, Trevor Darrell

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models 

Transactions on Machine Learning Research (TMLR), Featured Certification, 2024

Workshop on Knowledge and Logical Reasoning in the Era of Data-driven Learning at ICML, 2023

Paper · Project Webpage · Code · BAIR Blog · Hugging Face Demo

Boyi Li*, Rodolfo Corona*, Karttikeya Mangalam*, Catherine Chen*, Daniel Flaherty, Serge Belongie, Kilian Q. Weinberger, Jitendra Malik, Trevor Darrell, Dan Klein

Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction

Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL Findings), 2024

Paper

Jiaxin Ge, Sanjay Subramanian, Trevor Darrell, Boyi Li

From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Paper 

Boyi Li, Yin Cui, Tsung-Yi Lin, Serge Belongie

SITTA: Single Image Texture Translation for Data Augmentation

European Conference on Computer Vision (ECCV) Workshops, 2022

Paper · Code

Boyi Li, Kilian Q. Weinberger, Serge Belongie, Vladlen Koltun, René Ranftl

Language-driven Semantic Segmentation 

International Conference on Learning Representations (ICLR), 2022

Paper · Project Webpage · Code · Demo

🏆 Ranked 15th in ICLR 2022 Most Influential Papers

Boyi Li, Serge Belongie, Ser-nam Lim, Abe Davis

Neural Image Recolorization for Creative Domains

5th Workshop on Computer Vision for Fashion, Art, and Design at CVPR, Oral, 2022

Paper · Project Webpage

Boyi Li*, Felix Wu*, Ser-nam Lim, Serge Belongie, Kilian Q. Weinberger

On Feature Normalization and Data Augmentation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Paper · Project Webpage · Code · Video

Boyi Li*, Felix Wu*, Kilian Q. Weinberger, Serge Belongie

Positional Normalization

Neural Information Processing Systems (NeurIPS), Spotlight, 2019

Paper · Project Webpage · Code · Video

Miscellaneous

Classical music (violin/piano), painting, interior design, singing, and raising cute animals.