Boyi Li

Email: boyilics [at] gmail [dot] com

Brief Bio

I am a Research Scientist at NVIDIA Research and a Postdoctoral Scholar at UC Berkeley, advised by Prof. Jitendra Malik and Prof. Trevor Darrell.

I received my Ph.D. from Cornell University, advised by Prof. Serge Belongie and Prof. Kilian Q. Weinberger.

My research focuses on advancing embodied intelligence (agents) through multimodal data, developing generalizable algorithms, and creating interactive intelligent systems. Central to this work are reasoning, large language models, generative models, and robotics. A key aspect involves aligning representations from diverse multimodal data, including 2D pixels, 3D geometry, language, and audio.

News

📖 Presenting 🐺Wolf (Oral) at the "Workshop on Video-Language Models" at NeurIPS 2024 (Dec 14). See you in Vancouver 🇨🇦!

📖 "Large Multimodal Foundation Models" tutorial at ECCV 2024 (Sep 29). See you in Milan 🇮🇹!

📖 "Emergent Visual Abilities and Limits of Foundation Models" workshop at ECCV 2024 (Sep 30)

📖 "Vision-Centric Autonomous Driving" workshop at ECCV 2024 (Sep 30)

Selected Publications

Baifeng Shi, Boyi Li, Han Cai, Yao Lu, Sifei Liu, Marco Pavone, Jan Kautz, Song Han, Trevor Darrell, Pavlo Molchanov, Hongxu Yin

Scaling Vision Pre-Training to 4K Resolution

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Highlight, 2025

Paper · Project Webpage

Boyi Li*, Philipp Wu*, Pieter Abbeel, Jitendra Malik

Interactive Task Planning with Language Models

Transactions on Machine Learning Research (TMLR), 2025

Paper · Project Webpage · Code · Video

Boyi Li, Ligeng Zhu, Ran Tian, Shuhan Tan, Yuxiao Chen, Yao Lu, Yin Cui, Sushant Veer, Max Ehrlich, Jonah Philion, Xinshuo Weng, Fuzhao Xue, Jim Fan, Yuke Zhu, Jan Kautz, Andrew Tao, Ming-Yu Liu, Sanja Fidler, Boris Ivanovic, Trevor Darrell, Jitendra Malik, Song Han, Marco Pavone

Wolf: Dense Video Captioning with a World Summarization Framework

Workshop on Video-Language Models at NeurIPS, Oral Presentation and Highest Scores, 2024

Paper · Project Webpage · Featured in NVIDIA GTC 2025 (To evaluate Safety and Comfort)

Boyi Li*, Leo Chen*, Jathushan Rajasegaran*, Yossi Gandelsman, Alexei A. Efros, Jitendra Malik

Synthesizing Moving People with 3D Control

arXiv, 2024

Paper · Project Webpage · Code

Wei Chow*, Jiageng Mao*, Boyi Li, Daniel Seita, Vitor Guizilini, Yue Wang

PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding

International Conference on Learning Representations (ICLR), Oral, 2025

Paper · Project Webpage · Leaderboard

Ziqi Lu, Heng Yang, Danfei Xu, Boyi Li, Boris Ivanovic, Marco Pavone, Yue Wang

LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models

International Conference on Learning Representations (ICLR), Spotlight, 2025

Paper

Jiageng Mao, Boyi Li, Boris Ivanovic, Yuxiao Chen, Yan Wang, Yurong You, Chaowei Xiao, Danfei Xu, Marco Pavone, Yue Wang

DreamDrive: Generative 4D Scene Modeling from Street View Images

IEEE International Conference on Robotics and Automation (ICRA), 2025

Paper · Media Coverage

Ran Tian, Boyi Li, Xinshuo Weng, Yuxiao Chen, Edward Schmerling, Yue Wang, Boris Ivanovic, Marco Pavone

Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving

Conference on Robot Learning (CoRL), 2024

Paper · Project Webpage · Code

Shuhan Tan, Boris Ivanovic, Yuxiao Chen, Boyi Li, Xinshuo Weng, Yulong Cao, Philipp Kraehenbuehl, Marco Pavone

Promptable Closed-loop Traffic Simulation

Conference on Robot Learning (CoRL), 2024

Paper

Boyi Li, Yue Wang, Jiageng Mao, Boris Ivanovic, Sushant Veer, Karen Leung, Marco Pavone

LLaDA: Driving Everywhere with Large Language Model Policy Adaptation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Paper · Project Webpage · Video · Featured in NVIDIA GTC 2024 · NVIDIA Official Video · Bilibili

Tsung-Han Wu*, Long Lian*, Joseph E. Gonzalez, Boyi Li†, Trevor Darrell†

Self-correcting LLM-controlled Diffusion Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Paper · Project Webpage · Video · Code

Long Lian*, Baifeng Shi*, Adam Yala†, Trevor Darrell†, Boyi Li†

LLM-grounded Video Diffusion Models

International Conference on Learning Representations (ICLR), 2024

Paper · Project Webpage · Code

Sihyun Yu, Weili Nie, De-An Huang, Boyi Li, Jinwoo Shin, Anima Anandkumar

CMD: Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition

International Conference on Learning Representations (ICLR), 2024

Paper · Project Webpage · Code

Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, Yue Wang

EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision

International Conference on Learning Representations (ICLR), 2024

Paper · Project Webpage · Code · NVIDIA Official Video

Long Lian, Boyi Li, Adam Yala, Trevor Darrell

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models

Transactions on Machine Learning Research (TMLR), Featured Certification, 2024

Workshop on Knowledge and Logical Reasoning in the Era of Data-driven Learning at ICML, 2023

Paper · Project Webpage · Code · BAIR Blog · Hugging Face Demo

Boyi Li*, Rodolfo Corona*, Karttikeya Mangalam*, Catherine Chen*, Daniel Flaherty, Serge Belongie, Kilian Q. Weinberger, Jitendra Malik, Trevor Darrell, Dan Klein

Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction

Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL Findings), 2024

Paper

Jiaxin Ge, Sanjay Subramanian, Trevor Darrell†, Boyi Li†

From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Paper

Boyi Li, Yin Cui, Tsung-Yi Lin, Serge Belongie

SITTA: Single Image Texture Translation for Data Augmentation

European Conference on Computer Vision (ECCV) Workshops, 2022

Paper · Code

Boyi Li, Kilian Q. Weinberger, Serge Belongie, Vladlen Koltun, René Ranftl

Language-driven Semantic Segmentation

International Conference on Learning Representations (ICLR), 2022

Paper · Project Webpage · Code · Demo

🏆 ICLR 2022 Most Influential Papers

Boyi Li, Serge Belongie, Ser-nam Lim, Abe Davis

Neural Image Recolorization for Creative Domains

5th Workshop on Computer Vision for Fashion, Art, and Design at CVPR, Oral, 2022

Paper · Project Webpage

Boyi Li*, Felix Wu*, Ser-nam Lim, Serge Belongie, Kilian Q. Weinberger

On Feature Normalization and Data Augmentation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Paper · Project Webpage · Code · Video

Boyi Li*, Felix Wu*, Kilian Q. Weinberger, Serge Belongie

Positional Normalization

Neural Information Processing Systems (NeurIPS), Spotlight, 2019

Paper · Project Webpage · Code · Video

Professional Services

Area Chair: CVPR, ICCV, COLM, WACV

Conference / Journal Reviewer:

[Machine Learning] NeurIPS, ICML, ICLR, AAAI

[Computer Vision] CVPR, ICCV, ECCV, SIGGRAPH Asia, TIP, TMM, IJCV, TCSVT, CVIU, ICIP, VCIP

[Robotics] RSS, IROS, RA-L

Miscellaneous

Classical music (violin/piano), painting, interior design, singing, and raising cute animals.