Siyu Zhu - Fudan University

Siyu Zhu


Professor, Fudan University and Shanghai Innovation Institute

Director, Fudan Generative Vision Lab (Fusion Lab)

siyuzhu@fudan.edu.cn

GitHub | Google Scholar

Short Biography

Siyu Zhu is a tenured full professor at the AI3 Institute of Fudan University, where he leads the Fudan Generative Vision Lab (Fusion Lab). He is also a full-time mentor at the Shanghai Innovation Institute. Previously, he served as a director at Alibaba Cloud AI Lab, where he led a team of 100 people and contributed to a range of computer vision and machine learning products. He also co-founded the 3D vision company Altizure, which was later acquired by Apple Inc. and contributed core technology to Apple's augmented reality platform.

He obtained his Ph.D. from HKUST under the supervision of Long Quan, and his bachelor's degree from Zhejiang University. He has authored or co-authored over 70 papers in top journals and conferences such as TPAMI, ICLR, CVPR, ICCV, and ECCV. His research interests include (1) generative models for video and 3D, specifically 3D- and physics-guided video generation, and (2) image-based 3D reconstruction.

His professional service includes area chair (or SPC member) for ICCV 2025, CVPR 2026, AAAI 2024/2025, and WACV 2025, publicity chair of ICCV 2025, and associate editor of IEEE TVCG. He received the CCF Outstanding Engineer Award.

Selected Works (Lab GitHub)

  • 2D Video Generative Models

Hallo, Hallo2, Hallo3, Champ, Tora, AnimateAnything
  • 3D Generative/Reconstruction Models

Altizure (Apple Augmented Reality), VideoMV, Stag4D
  • Datasets for Generative Models

DynamicPDB, OpenHumanVid

Recent Publications (Full List)

  • Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation
J Cui, Y Chen, M Xu, H Shang, Y Chen, Y Zhan, Z Dong, Y Yao, J Wang, S Zhu. SIGGRAPH Asia 2025
  • DicFace: Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration
Y Chen, H Shang, C Liu, Y Chen, H Li, W Yuan, H Zhu, Z Dong, S Zhu. International Conference on Computer Vision (ICCV) 2025
  • Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks
J Cui, H Li, Y Zhan, H Shang, K Cheng, Y Ma, S Mu, H Zhou, J Wang, S Zhu. Computer Vision and Pattern Recognition (CVPR) 2025
  • OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation
H Li, M Xu, Y Zhan, S Mu, J Li, K Cheng, Y Chen, T Chen, M Ye, J Wang, S Zhu. Computer Vision and Pattern Recognition (CVPR) 2025
  • Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation
Y Feng, C Wen, Z Peng, J Li, S Zhu. Computer Vision and Pattern Recognition (CVPR) 2025
  • Tora: Trajectory-oriented Diffusion Transformer for Video Generation
Z Zhang, J Liao, M Li, S Zhu, L Qin, W Wang. Computer Vision and Pattern Recognition (CVPR) 2025
  • Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
J Cui, H Li, Y Yao, H Zhu, H Shang, K Cheng, H Zhou, S Zhu, J Wang. International Conference on Learning Representations (ICLR) 2025
  • Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors
L Chen, K Liu, Y Lin, Z Li, S Zhu, X Cao, Y Yao. International Conference on Learning Representations (ICLR) 2025
  • AlphaFolding: 4D Diffusion for Dynamic Protein Structure Prediction with Reference Guided Motion Alignment
K Cheng, C Liu, Q Su, J Wang, L Zhang, Y Tang, Y Yao, S Zhu, Y Qi. Annual AAAI Conference on Artificial Intelligence (AAAI) 2025
  • Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
M Xu, H Li, Q Su, H Shang, L Zhang, C Liu, J Wang, Y Yao, S Zhu. arXiv:2406.08801, 2024
  • Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
SH Zhu, J Chen, Z Dai, Y Xu, X Cao, Y Yao, H Zhu, S Zhu. European Conference on Computer Vision (ECCV) 2024
  • STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
Y Zeng, Y Jiang, S Zhu, Y Lu, Y Lin, H Zhu, W Hu, X Cao, Y Yao. European Conference on Computer Vision (ECCV) 2024
  • Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
Y He, W Yuan, S Zhu, Z Dong, L Bo, Q Huang. European Conference on Computer Vision (ECCV) 2024
  • Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°
Y He, Y Zhuang, Y Wang, Y Yao, S Zhu, X Li, Q Zhang, X Cao, H Zhu. European Conference on Computer Vision (ECCV) 2024
  • High-Fidelity Free-View Synthesis of Emotional 3D Talking Head
Q He, X Ji, Y Gong, Y Lu, Z Diao, L Huang, Y Yao, S Zhu, Z Ma, S Xu, X Wu, Z Zhang, X Cao, H Zhu. European Conference on Computer Vision (ECCV) 2024
  • Animate Anything: Fine-Grained Open Domain Image Animation with Motion Guidance
Z Dai, Z Zhang, Y Yao, B Qiu, S Zhu, L Qin, W Wang. arXiv:2311.12886
  • VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model
Q Zuo, X Gu, L Qiu, Y Dong, Z Zhao, W Yuan, R Peng, S Zhu, Z Dong, L Bo, Q Huang. arXiv:2403.12010
  • OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation
J Cai, Y He, W Yuan, S Zhu, Z Dong, L Bo, Q Chen. IEEE Robotics and Automation Letters (RA-L) 2024
  • Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
Y Lin, Z Dai, S Zhu, Y Yao. Computer Vision and Pattern Recognition (CVPR) 2024