Siyu Zhu - Fudan University

Siyu Zhu


Professor, Fudan University and Shanghai Innovation Institute

Director, Fudan Generative Vision Lab (Fusion Lab)

siyuzhu@fudan.edu.cn

Github      Google Scholar

Short Biography

Siyu Zhu is a tenured full professor at the College of Computer Science and Artificial Intelligence at Fudan University, where he leads the Generative Vision Lab (Fusion Lab). He is also a full-time mentor at the Shanghai Innovation Institute. Prior to this, he served as a director at Alibaba Cloud AI Lab, where he led a team of 100 employees and contributed to a range of computer vision and machine learning products. He also co-founded the 3D vision company Altizure, which was later acquired by Apple Inc. and contributed core technology to Apple Augmented Reality.

He obtained his Ph.D. from HKUST under the supervision of Long Quan, and his bachelor's degree from Zhejiang University. He has authored or co-authored over 70 papers in top journals and conferences such as TPAMI, ICLR, CVPR, ICCV, ECCV and NeurIPS. His research interests include: (1) generative models for video and 3D; (2) diffusion-based multi-modality generative models; and (3) image-based 3D reconstruction.

He has served as an area chair (senior program committee member) of ICCV 2025, CVPR 2026, ECCV 2026, NeurIPS 2026, AAAI 2025/2026, and WACV 2025, as publicity chair of ICCV 2025, and as an associate editor of IEEE Transactions on Visualization and Computer Graphics (IEEE TVCG). He was awarded the CCF Outstanding Engineer Award.

Selected Works (Lab Github)

  • 2D and 3D Generative Model

Hallo Series (Hallo,  Hallo2,  Hallo3, Hallo-Live),  Champ,  Tora Series (Tora, Tora3),  AnimateAnything, VideoMV, Altizure (Apple Augmented Reality) 
  • World Model and LVM

WAM-Flow,  WAM-Diff, Bard-VL
  • Datasets of Generative Model

DynamicPDB,  OpenHumanVid

Recent Publications (Full List)

  • Hallo-Live: Real-Time Streaming Joint Audio-Video Avatar Generation with Asynchronous Dual-Stream and Human-Centric Preference Distillation
C Li, J Li, R Mei, H Xia, H Zhu, J Wang, S Zhu. arXiv preprint:2604.23632, 2026
  • BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation
B Chen, H Xia, P Tu, H Shi, S Mu, W Yuan, S Zhu. arXiv preprint:2604.16514, 2026
  • LaPha: Latent Poincaré Shaping for Agentic Reinforcement Learning
H Xia, B Chen, Z Zang, Y Ge, G Zhao, S Zhu. arXiv preprint:2602.09375, 2026
  • Tora3: Trajectory-Guided Audio-Video Generation with Physical Coherence
J Liao, Z Zhang, X Meng, L Li, Z Zhang, S Zhu, L Qin, W Wang. arXiv preprint:2604.09057, 2026
  • The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents
Y Sun, Y Yao, H Li, W Yuan, S Zhu. arXiv preprint:2604.25299, 2026
  • WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving
M Xu, J Cui, F Cai, H Shang, Z Zhu, S Luan, Y Xu, N Zhang, Y Li, J Cai, S Zhu. arXiv preprint:2512.11872, 2026
  • Prompt Reinjection: Alleviating Prompt Forgetting in Multimodal Diffusion Transformers
Y Yao, Y Chen, H Li, K Cheng, Q Guo, Y Sun, Z Dong, J Wang, S Zhu. International Conference on Machine Learning (ICML), 2026
  • Towards Native Generative Model for 3D Head Avatar
Y Zhuang, H Zhu, J Zhang, Y He, Y Wang, J Zhu, Y Yao, S Zhu, X Cao. Fundamental Research, 2026
  • T^\star: Progressive Block Scaling for Masked Diffusion Language Models Through Trajectory Aware Reinforcement Learning
H Xia, B Chen, Y Ge, G Zhao, S Zhu. Annual Meeting of the Association for Computational Linguistics (ACL), 2026
  • WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving
Y Xu, J Cui, F Cai, Z Zhu, H Shang, S Luan, M Xu, N Zhang, Y Li, J Cai, S Zhu. Computer Vision and Pattern Recognition (CVPR), 2026
  • MixFlow Training: Alleviating Exposure Bias with Slowed Interpolation Mixture
H Li, J Lyu, FY Wang, K Cheng, S Zhu, J Wang. Computer Vision and Pattern Recognition (CVPR), 2026
  • CrowdGaussian: Reconstructing High-Fidelity 3D Gaussians for Human Crowd from a Single Image
Y Song, Y Zhuang, Q Xu, H Wang, J Zhu, J Tian, S Zhu, H Zhu. Computer Vision and Pattern Recognition (CVPR), 2026
  • Head-wise Adaptive Rotary Positional Encoding for Fine-Grained Image Generation
J Li, B Chen, H Li, Z Dong, J Wang, S Zhu. Computer Vision and Pattern Recognition (CVPR), 2026
  • Large Depth Completion Model from Sparse Observations
Z Yu, Z Zhao, R Zhang, L Qiu, SY Cao, K Qiu, Y He, S Zhu, Z Dong, H Shen. International Conference on Learning Representations (ICLR), 2026
  • LaTo: Landmark-tokenized Diffusion Transformer for Fine-grained Human Face Editing
Y Xu, J Cui, F Cai, Z Zhu, H Shang, S Luan, M Xu, N Zhang, Y Li, J Cai, S Zhu. International Conference on Learning Representations (ICLR), 2026
  • Pyramidal Patchification Flow for Visual Generation
Y Xu, J Cui, F Cai, Z Zhu, H Shang, S Luan, M Xu, N Zhang, Y Li, J Cai, S Zhu. International Conference on Learning Representations (ICLR), 2026