LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
H. Wang, CY Ma, YC Liu, J. Hou, T. Xu, J. Wang, F. Juefei-Xu, Y. Luo, P. Zhang, T. Hou, P. Vajda, N. Jha, X. Dai
CVPR, 2025, [Extended journal version]
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
P. Hansen-Estruch, D. Yan, CY Chuang, O. Zohar, J. Wang, T. Hou, T. Xu, S. Vishwanath, P. Vajda, X. Chen
ICML, 2025
DirectorLLM for Human-Centric Video Generation
K. Song, T. Hou, Z. He, H. Ma, J. Wang, A. Sinha, S. Tsai, Y. Luo, X. Dai, L. Chen, X. Xia, P. Zhang, P. Vajda, A. Elgammal, F. Juefei-Xu
BMVC, 2025
Pixel-Space Post-Training of Latent Diffusion Models
C. Zhang, S. Motwani, M. Yu, J. Hou, F. Juefei-Xu, S. Tsai, P. Vajda, Z. He, J. Wang
ACM-MM RichMedia Workshop, 2025
Transfer between Modalities with MetaQueries
X. Pan, S. N. Shukla, A. Singh, Z. Zhao, S. K. Mishra, J. Wang, Z. Xu, J. Chen, K. Li, F. Juefei-Xu, J. Hou, S. Xie
arXiv, 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
X. Ma, P. Sun, H. Ma, H. Tang, CY Ma, J. Wang, K. Li, X. Dai, Y. Shi, X. Ju, Y. Hu, A. Sanakoyeu, F. Juefei-Xu, J. Hou, J. Tian, T. Xu, T. Hou, YC Liu, Z. He, Z. He, M. Feiszli, P. Zhang, P. Vajda, S. Tsai, Y. Fu
arXiv, 2025
Movie Gen: A Cast of Media Foundation Models
The Movie Gen Team (Core contributor)
Meta AI Tech Report, 2024
Cache Me if You Can: Accelerating Diffusion Models through Block Caching
F. Wimbauer, B. Wu, E. Schoenfeld, X. Dai, J. Hou, Z. He, A. Sanakoyeu, P. Zhang, S. Tsai, J. Kohler, C. Rupprecht, D. Cremers, P. Vajda, J. Wang
CVPR, 2024
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
F. Liang, B. Wu, J. Wang, L. Yu, K. Li, Y. Zhao, I. Misra, JB Huang, P. Zhang, P. Vajda, D. Marculescu
CVPR, 2024
ControlRoom3D: Room Generation using Semantic Proxy Rooms
J. Schult, S. Tsai, L. Höllein, B. Wu, J. Wang, CY Ma, K. Li, X. Wang, F. Wimbauer, Z. He, P. Zhang, B. Leibe, P. Vajda, J. Hou
CVPR, 2024
Efficient Quantization Strategies for Latent Diffusion Models
Y. Yang, X. Dai, J. Wang, P. Zhang, H. Zhang
CVPR Workshop on Efficient and On-Device Generation, 2024
An Analysis on Quantizing Diffusion Transformers
Y. Yang, J. Wang, X. Dai, P. Zhang, H. Zhang
CVPR Workshop on Transformers for Vision, 2024
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack
X. Dai∗, J. Hou∗, CY Ma∗, S. Tsai∗, J. Wang∗, R. Wang∗, P. Zhang∗, S. Vandenhende, X. Wang, A. Dubey, M. Yu, A. Kadian, F. Radenovic, D. Mahajan, K. Li, Y. Zhao, V. Petrovic, M. K. Singh, S. Motwani, Y. Wen, Y. Song, R. Sumbaly†, V. Ramanathan†, Z. He†, P. Vajda†, D. Parikh†
Meta AI Tech Report, 2023
∗: Equal contribution, listed in alphabetical order
†: Joint last authors
NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View Indoor 3D Object Detection
C. Xu, B. Wu, J. Hou, S. Tsai, R. Li, J. Wang, W. Zhan, Z. He, P. Vajda, K. Keutzer, M. Tomizuka
ICCV, 2023
A Practical Stereo Depth System for Smart Glasses
J. Wang, D. Scharstein, A. Bapat, K. Blackburn-Matzen, M. Yu, J. Lehman, S. Alsisan, Y. Wang, S. Tsai, JM Frahm, Z. He, P. Vajda, M. F. Cohen, M. Uyttendaele
CVPR, 2023
Consistent Direct Time-of-Flight Video Depth Super-Resolution
Z. Sun, W. Ye, J. Xiong, G. Choe, J. Wang, S. Su, R. Ranjan
CVPR, 2023