I am a research scientist at Meta GenAI. I now mainly work on media foundation models (Emu, Movie Gen) and multimodal foundation models (Llama). I previously worked on depth estimation, on-device computer vision, and human-vision-inspired computer vision. Prior to Meta, I obtained my Ph.D. from Harvard University, advised by Prof. Todd Zickler, and my B.A.Sc. from the University of Toronto, advised by Prof. Sven Dickinson and Prof. Sanja Fidler.
💻 Publications
2025
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
H. Wang, CY Ma, YC Liu, J. Hou, T. Xu, J. Wang, F. Juefei-Xu, Y. Luo, P. Zhang, T. Hou, P. Vajda, N. Jha, X. Dai
CVPR, 2025
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
P. Hansen-Estruch, D. Yan, CY Chung, O. Zohar, J. Wang, T. Hou, T. Xu, S. Vishwanath, P. Vajda, X. Chen
arXiv, 2025
2024
DirectorLLM for Human-Centric Video Generation
K. Song, T. Hou, Z. He, H. Ma, J. Wang, A. Sinha, S. Tsai, Y. Luo, X. Dai, L. Chen, X. Xia, P. Zhang, P. Vajda, A. Elgammal, F. Juefei-Xu
arXiv, 2024
Movie Gen: A Cast of Media Foundation Models
The Movie Gen Team (Core contributor)
Meta AI Tech Report, 2024
Pixel-Space Post-Training of Latent Diffusion Models
C. Zhang, S. Motwani, M. Yu, J. Hou, F. Juefei-Xu, S. Tsai, P. Vajda, Z. He, J. Wang
arXiv, 2024
Cache Me if You Can: Accelerating Diffusion Models through Block Caching
F. Wimbauer, B. Wu, E. Schoenfeld, X. Dai, J. Hou, Z. He, A. Sanakoyeu, P. Zhang, S. Tsai, J. Kohler, C. Rupprecht, D. Cremers, P. Vajda, J. Wang
CVPR, 2024
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
F. Liang, B. Wu, J. Wang, L. Yu, K. Li, Y. Zhao, I. Misra, JB Huang, P. Zhang, P. Vajda, D. Marculescu
CVPR, 2024
ControlRoom3D: Room Generation using Semantic Proxy Rooms
J. Schult, S. Tsai, L. Höllein, B. Wu, J. Wang, CY Ma, K. Li, X. Wang, F. Wimbauer, Z. He, P. Zhang, B. Leibe, P. Vajda, J. Hou
CVPR, 2024
Efficient Quantization Strategies for Latent Diffusion Models
Y. Yang, X. Dai, J. Wang, P. Zhang, H. Zhang
CVPR workshop on Efficient and On-Device Generation, 2024
An Analysis on Quantizing Diffusion Transformers
Y. Yang, J. Wang, X. Dai, P. Zhang, H. Zhang
CVPR workshop on Transformers for Vision, 2024
2023
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack
X. Dai∗, J. Hou∗, CY Ma∗, S. Tsai∗, J. Wang∗, R. Wang∗, P. Zhang∗, S. Vandenhende, X. Wang, A. Dubey, M. Yu, A. Kadian, F. Radenovic, D. Mahajan, K. Li, Y. Zhao, V. Petrovic, M. K. Singh, S. Motwani, Y. Wen, Y. Song, R. Sumbaly†, V. Ramanathan†, Z. He†, P. Vajda†, D. Parikh†
Meta AI Tech Report, 2023
∗: Equal contribution: alphabetical order
†: joint last authors
NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View Indoor 3D Object Detection
C. Xu, B. Wu, J. Hou, S. Tsai, R. Li, J. Wang, W. Zhan, Z. He, P. Vajda, K. Keutzer, M. Tomizuka
ICCV, 2023
A Practical Stereo Depth System for Smart Glasses
J. Wang, D. Scharstein, A. Bapat, K. Blackburn-Matzen, M. Yu, J. Lehman, S. Alsisan, Y. Wang, S. Tsai, JM Frahm, Z. He, P. Vajda, M. F. Cohen, M. Uyttendaele
CVPR, 2023
Consistent Direct Time-of-Flight Video Depth Super-Resolution
Z. Sun, W. Ye, J. Xiong, G. Choe, J. Wang, S. Su, R. Ranjan
CVPR, 2023
2022
Toward practical monocular indoor depth estimation
CY Wu, J. Wang, M. Hall, U. Neumann, S. Su
CVPR, 2022
2021
FBNetV5: Neural architecture search for multiple tasks in one run
B. Wu, C. Li, H. Zhang, X. Dai, P. Zhang, M. Yu, J. Wang, Y. Lin, P. Vajda
arXiv, 2021
Level set binocular stereo with occlusions
J. Wang, T. Zickler
arXiv, 2021
Level set stereo for cooperative grouping with occlusion
J. Wang, T. Zickler
ICIP, 2021
2020 and earlier
A lighting-invariant point processing for shading
K. Heal, J. Wang, S. J. Gortler, T. Zickler
CVPR, 2020
Interpreting robust optimization via adversarial influence functions
Z. Deng, C. Dwork, J. Wang, L. Zhang
alphabetical order
ICML, 2020
Improving deep stereo network generalization with geometric priors
J. Wang, V. Jampani, D. Sun, C. Loop, S. Birchfield, J. Kautz
arXiv, 2020
Local detection of stereo occlusion boundaries
J. Wang, T. Zickler
CVPR, 2019
A computational model for local stereo occlusion boundary detection
J. Wang, T. Zickler
Journal of Vision, VSS Abstract, 2019 [poster][project][stereoscope viewer]
Half-occlusion boundary detectors in computational stereo vision
J. Wang, D. Glasner, T. Zickler
Journal of Vision, VSS Abstract, 2018 [slides][project][stereoscope viewer]
Toward perceptually-consistent stereo: A scanline study
J. Wang, D. Glasner, T. Zickler
ICCV, 2017
📃 Patents
Wang, J., et al. "Distance determinations using one or more neural networks."
U.S. Patent Application No. 16/852,944
⌨️ Service
Reviewer: CVPR'20-24, NeurIPS'20-24, ICML'21-23, ICCV'21, '23, ICLR'21-22, BMVC'20, ACCV'20, WACV'21-22, ECCV'22, '24