Siyu Zhu

Professor, Fudan University and Shanghai Innovation Institute

Director, Fudan Generative Vision Lab (Fusion Lab)

siyuzhu@fudan.edu.cn

Short Biography

Siyu Zhu is a tenured full professor at the College of Computer Science and Artificial Intelligence at Fudan University, where he leads the Generative Vision Lab (Fusion Lab). He is also a full-time mentor at the Shanghai Innovation Institute. Prior to this, he served as a director at Alibaba Cloud AI Lab, leading a team of 100 employees, and contributed to a range of computer vision and machine learning products. He also co-founded the 3D vision company Altizure, which was later acquired by Apple Inc. and contributed core technology to Apple Augmented Reality.

He obtained his Ph.D. from HKUST under the supervision of Long Quan, and his bachelor's degree from Zhejiang University. His research interests include: (1) generative models for video and 3D; (2) diffusion-based multi-modality generative models; and (3) image-based 3D reconstruction.

He has served as an area chair (senior program committee member) for ICCV 2025, CVPR 2026, ECCV 2026, NeurIPS 2026, AAAI 2025/26/27, and WACV 2025/27, as publicity chair of ICCV 2025, and as an associate editor of IEEE Transactions on Visualization and Computer Graphics (IEEE TVCG). He was awarded the CCF Outstanding Engineer Award.

Selected Works (Lab GitHub)

Image, Video and 3D Generative Models

Hallo Series (Hallo, Hallo2, Hallo3, Hallo-Live), Champ, Tora Series (Tora, Tora3), AnimateAnything, VideoMV, Altizure (Apple Augmented Reality)

World Models and Multimodal Models

WAM-Diff, WAM-Flow, Bard Series (Bard, Bard-VL)

Datasets for Generative Models

DynamicPDB, OpenHumanVid

Team Members

PhD Students

Class 2026: Wenhao Zhang, Yuxuan Yao, Weijia Dou, Chunyu Li, Zhenghao Sun, Cailin Zhuang
Class 2025: Jiahao Cui, Baoyou Chen, Mingwang Xu, Yifang Xu, Quanhui Tang
Class 2024: Hui Li, Kaihui Cheng

Master Students

Class 2026: Guancong Lin, Junhao Zhang, Li Zeng
Class 2025: Zhihao Zhu, Jimin Chen, Ruiqiao Mei, Shan Luan
Class 2024: Hanlin Shang, Jiaye Li, Yuxuan Chen

Researchers (Shanghai Innovation Institute & Shanghai Academy of AI for Science)

Tzuhsiung (Nick) Yang, Ziran Zhang, Yuwei Sun, Hongqing Han, Hanchen Xia, Peng Tu, Wenkai Xiang, Xiao Hu, Zhiqiang Cai, Liwei Zhang, Fulian Xiao, Haoyuan Xia

Recent Publications (Full List)

SlotMemory: Object-Centric KV Memory for Streaming Long-Video Generation

W Dou, H Li, J Cui, L Zhou, J Wang, S Zhu
arXiv preprint arXiv:2605.31033 2026

GroundShot: Visually Consistent Multi-Shot Long Video Generation via Entity-Grounded Shot Scheduling

Y Lai, T Shao, K Zhou, W Dou, S Zhu, J Wang
arXiv preprint arXiv:2606.20799 2026

The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents

Y Sun, Y Yao, H Li, W Yuan, S Zhu
arXiv preprint arXiv:2604.25299 2026

UniTacVLA: Unified Tactile Understanding and Prediction in Vision Language Action Models

X Zhang, Y Zhang, J Shi, F Zhu, S Zhu, M Y Wang, X Wu, W Yuan
arXiv preprint arXiv:2606.31723 2026

Glob3R: Global Structure-from-Motion with 3D Foundation Models

J Deng, H Li, K Qiu, L Qiu, R Peng, W Shen, W Yuan, S Zhu, Z Dong, P Tan
arXiv preprint arXiv:2607.09225 2026

Touch-R1: Reinforcing Touch Reasoning in MLLMs

Y Lai, Y Zhou, F Zhu, S Zhu, W Yuan
arXiv preprint arXiv:2605.27154 2026

3DThinkVLA: Endowing Vision-Language-Action Models with Latent 3D Priors via 3D-Thinking-Guided Co-training

J Shi, X Zhang, F Zhu, Z Li, S Zhu, W Yuan
arXiv preprint arXiv:2606.04436 2026

Towards Consistent Video Geometry Estimation

Z Yu, J Gao, R Zhang, L Qiu, Z Zhao, R Peng, Y Yan, K Qiu, S Zhu, S Cao, H Shen
arXiv preprint arXiv:2605.30060 2026

Hallo-Live: Real-Time Streaming Joint Audio-Video Avatar Generation with Asynchronous Dual-Stream and Human-Centric Preference Distillation

C Li, J Li, R Mei, H Xia, H Zhu, J Wang, S Zhu
ACM International Conference on Multimedia (ACM MM) 2026

BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation

B Chen, H Xia, P Tu, H Shi, S Mu, W Yuan, S Zhu
ACM International Conference on Multimedia (ACM MM) 2026

Tora3: Trajectory-Guided Audio-Video Generation with Physical Coherence

J Liao, Z Zhang, X Meng, L Li, Z Zhang, S Zhu, L Qin, W Wang
ACM International Conference on Multimedia (ACM MM) 2026

Forge4d: Feed-Forward 4D Human Reconstruction and Interpolation from Uncalibrated Sparse-View Videos

Y Hu, Y He, J Chen, W Yuan, K Qiu, Z Lin, S Zhu, Z Dong, J Zhang
European Conference on Computer Vision (ECCV) 2026

Prompt Reinjection: Alleviating Prompt Forgetting in Multimodal Diffusion Transformers

Y Yao, Y Chen, H Li, K Cheng, Q Guo, Y Sun, Z Dong, J Wang, S Zhu
International Conference on Machine Learning (ICML) 2026

Towards Native Generative Model for 3D Head Avatar

Y Zhuang, H Zhu, J Zhang, Y He, Y Wang, J Zhu, Y Yao, S Zhu, X Cao
Fundamental Research 2026

T*: Progressive Block Scaling for Masked Diffusion Language Models Through Trajectory Aware Reinforcement Learning

H Xia, B Chen, Y Ge, G Zhao, S Zhu
Annual Meeting of the Association for Computational Linguistics (ACL) 2026

Learning Implicit Bias in Generative Spaces for Accelerating Protein Dynamics Emulation

K Cheng, Z Cai, W Xiang, Z Hu, S Zhu, T Yang, Y Qi
arXiv preprint arXiv:2606.01833

LaPha: Latent Poincaré Shaping for Agentic Reinforcement Learning

H Xia, B Chen, Z Zang, Y Ge, G Zhao, S Zhu
arXiv preprint arXiv:2602.09375 2026

WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving

M Xu, J Cui, F Cai, H Shang, Z Zhu, S Luan, Y Xu, N Zhang, Y Li, J Cai, S Zhu
arXiv preprint arXiv:2512.11872 2026

WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving

Y Xu, J Cui, F Cai, Z Zhu, H Shang, S Luan, M Xu, N Zhang, Y Li, J Cai, S Zhu
Computer Vision and Pattern Recognition (CVPR) 2026

MixFlow Training: Alleviating Exposure Bias with Slowed Interpolation Mixture

H Li, J Lyu, FY Wang, K Cheng, S Zhu, J Wang
Computer Vision and Pattern Recognition (CVPR) 2026

CrowdGaussian: Reconstructing High-Fidelity 3D Gaussians for Human Crowd from a Single Image

Y Song, Y Zhuang, Q Xu, H Wang, J Zhu, J Tian, S Zhu, H Zhu
Computer Vision and Pattern Recognition (CVPR) 2026

Head-wise Adaptive Rotary Positional Encoding for Fine-Grained Image Generation

J Li, B Chen, H Li, Z Dong, J Wang, S Zhu
Computer Vision and Pattern Recognition (CVPR) 2026

Large Depth Completion Model from Sparse Observations

Z Yu, Z Zhao, R Zhang, L Qiu, SY Cao, K Qiu, Y He, S Zhu, Z Dong, H Shen
International Conference on Learning Representations (ICLR) 2026

LaTo: Landmark-tokenized Diffusion Transformer for Fine-grained Human Face Editing

Y Xu, J Cui, F Cai, Z Zhu, H Shang, S Luan, M Xu, N Zhang, Y Li, J Cai, S Zhu
International Conference on Learning Representations (ICLR) 2026

Pyramidal Patchification Flow for Visual Generation

H Li, B Chen, L Zhang, J Li, J Wang, S Zhu
International Conference on Learning Representations (ICLR) 2026
