Boyi Li 

Email: boyilics [at] gmail [dot] com

Google Scholar / GitHub / Twitter

Brief Bio

I am a Research Scientist at the NVIDIA Autonomous Vehicle Research Group and a Postdoctoral Scholar at UC Berkeley, advised by Prof. Jitendra Malik and Prof. Trevor Darrell.

I received my Ph.D. from Cornell University, advised by Prof. Serge Belongie and Prof. Kilian Q. Weinberger.

My research focuses on advancing embodied intelligence through multimodal data, developing generalizable algorithms, and creating interactive intelligent systems. Central to this work are reasoning, large language models, generative models, and robotics. A key aspect involves aligning representations from diverse multimodal data, including 2D pixels, 3D geometry, language, and audio.

News

📖 "Large Multimodal Foundation Models" tutorial (Sep 29) at ECCV 2024

📖 "Emergent Visual Abilities and Limits of Foundation Models" workshop (Sep 30) at ECCV 2024

📖 "Vision-Centric Autonomous Driving" workshop (Sep 30) at ECCV 2024

Selected Publications

Boyi Li, Jathushan Rajasegaran, Yossi Gandelsman, Alexei A. Efros, Jitendra Malik

Synthesizing Moving People with 3D Control

arXiv, 2024

Paper · Project Webpage · Code

Boyi Li, Yue Wang, Jiageng Mao, Boris Ivanovic, Sushant Veer, Karen Leung, Marco Pavone

LLaDA: Driving Everywhere with Large Language Model Policy Adaptation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Paper · Project Webpage · Video · Featured in NVIDIA GTC · NVIDIA Official Video · Bilibili 

Tsung-Han Wu*, Long Lian*, Joseph E. Gonzalez, Boyi Li, Trevor Darrell

Self-correcting LLM-controlled Diffusion Models 

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Paper · Project Webpage · Video · Code

Long Lian*, Baifeng Shi*, Adam Yala, Trevor Darrell, Boyi Li

LLM-grounded Video Diffusion Models 

International Conference on Learning Representations (ICLR), 2024

Paper · Project Webpage · Code

Sihyun Yu, Weili Nie, De-An Huang, Boyi Li, Jinwoo Shin, Anima Anandkumar

CMD: Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition

International Conference on Learning Representations (ICLR), 2024

Paper · Project Webpage · Code

Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, Yue Wang

EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision

International Conference on Learning Representations (ICLR), 2024

Paper · Project Webpage · Code · NVIDIA Official Video  

Long Lian, Boyi Li, Adam Yala, Trevor Darrell

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models 

Transactions on Machine Learning Research (TMLR), Featured Certification, 2024

Workshop on Knowledge and Logical Reasoning in the Era of Data-driven Learning at ICML, 2023

Paper · Project Webpage · Code · BAIR Blog · Hugging Face Demo

Boyi Li*, Rodolfo Corona*, Karttikeya Mangalam*, Catherine Chen*, Daniel Flaherty, Serge Belongie, Kilian Q. Weinberger, Jitendra Malik, Trevor Darrell, Dan Klein

Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction

Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL Findings), 2024

Paper

Boyi Li*, Philipp Wu*, Pieter Abbeel, Jitendra Malik

Interactive Task Planning with Language Models 

Workshop on Language and Robot Learning: Language as Grounding at CoRL, 2023

Paper · Project Webpage · Code · Video

Jiaxin Ge, Sanjay Subramanian, Trevor Darrell, Boyi Li

From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Paper 

Boyi Li, Yin Cui, Tsung-Yi Lin, Serge Belongie

SITTA: Single Image Texture Translation for Data Augmentation

European Conference on Computer Vision (ECCV) Workshops, 2022

Paper · Code

Boyi Li, Kilian Q. Weinberger, Serge Belongie, Vladlen Koltun, René Ranftl

Language-driven Semantic Segmentation 

International Conference on Learning Representations (ICLR), 2022

Paper · Project Webpage · Code · Demo

🏆 Ranked 15th in ICLR 2022 Most Influential Papers

Boyi Li, Serge Belongie, Ser-nam Lim, Abe Davis

Neural Image Recolorization for Creative Domains

5th Workshop on Computer Vision for Fashion, Art, and Design at CVPR, Oral, 2022

Paper · Project Webpage

Boyi Li*, Felix Wu*, Ser-nam Lim, Serge Belongie, Kilian Q. Weinberger

On Feature Normalization and Data Augmentation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Paper · Project Webpage · Code · Video

Boyi Li*, Felix Wu*, Kilian Q. Weinberger, Serge Belongie

Positional Normalization

Neural Information Processing Systems (NeurIPS), Spotlight, 2019

Paper · Project Webpage · Code · Video

Miscellaneous

Classical music (violin/piano), painting, interior design, singing, and raising cute animals.