I am actively looking for Postdocs to join my Lab.
I am casually looking for Research Interns.
I select Ph.D. students from my research intern pool.
Contact me at zhaohao@air.tsinghua.edu.cn
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors, SIGGRAPH 2026
Relit-LiVE: Relight Video by Jointly Learning Environment Video, SIGGRAPH 2026
Scalable Training of 3D Gaussian Splatting via Out-of-Core Optimization, ICML 2026
Automated Synthesis of Facial Mechanisms for Conversational Animatronic Robots, RSS 2026
Self-Improving Robot Policy with Compositional World Model, RSS 2026
ORV: 4D Occupancy-centric Robot Video Generation, CVPR 2026
Benchmarking PhD-Level Coding in 3D Geometric Computer Vision, CVPR 2026
Native and Compact Structured Latents for 3D Generation, CVPR 2026
PAM: A Pose–Appearance–Motion Engine for Sim-to-Real HOI Video Generation, CVPR 2026
DGGT: Feedforward 4D Reconstruction of Dynamic Driving Scenes using Unposed Images, CVPR 2026
Tokenizing Vector Animation for Autoregressive Generation, CVPR 2026
NeAR: Coupled Neural Asset–Renderer Stack, CVPR 2026
UniDex: A Robot Foundation Suite for Universal Dexterous Hand Control from Egocentric Human Videos, CVPR 2026
Agility Meets Stability: Versatile Humanoid Control with Heterogeneous Data, ICRA 2026
UniUncer: Unified Dynamic–Static Uncertainty for End-to-End Driving, ICRA 2026
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation, ICRA 2026
Unified Map Prior Encoder for Mapping and Planning, ICRA 2026
Dexora: Open-source VLA for High-DoF Bimanual Dexterity, ICRA 2026
Light of Normals: Unified Feature Representation for Universal Photometric Stereo, ICLR 2026
CubeBench: Diagnosing Interactive, Long-Horizon Physical Intelligence under Partial Observations, ICLR 2026
Light-X: Generative 4D Video Rendering with Camera and Illumination Control, ICLR 2026
DanceTogether: Generating Interactive Multi-Person Video without Identity Drifting, ICLR 2026
Stability Under Scrutiny: Benchmarking Representation Paradigms for Online HD Mapping, ICLR 2026
ThinkMatter: Panoramic-Aware Instructional Semantics for Monocular Vision-and-Language Navigation, TIP 2026
Ultraman: Ultra-Fast and High-Resolution Texture Generation for 3D Human Reconstruction from a Single Image, MVA 2026
Gen-NCAP: A Generative Simulator for Corner Case Benchmarking in End-to-End Autonomous Driving, IASEAI 2026
Challenger: Affordable Adversarial Driving Video Generation for Safety Testing, IASEAI 2026
Hoodie: Hierarchical point cloud and latent code diffusion for joint and conditional generation, Neurocomputing 2025
3DSceneEditor: Controllable 3D Scene Editing with Gaussian Splatting, WACV 2026
FB-4D: Spatial-Temporal Coherent Dynamic 3D Content Generation with Feature Banks, WACV 2026
GRADRobot: Geometry-Aware Rendering with Articulation and Diffusion for Robot Modeling, 3DV 2026
GaussianArt: Unified Modeling of Geometry and Motion for Articulated Objects, 3DV 2026
SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting, T-PAMI 2025
Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting, NeurIPS 2025
SURDS: Benchmarking Spatial Understanding and Reasoning in Driving Scenarios with Vision Language Models, NeurIPS 2025
Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models, NeurIPS 2025
One View, Many Worlds: Single-Image to 3D object Meets Generative Domain Randomization for One-Shot 6D Pose Estimation, CoRL 2025
Elucidating the Design Space of Torque-aware Vision-Language-Action Models, CoRL 2025
RoboChemist: Long-Horizon and Safety-Compliant Robotic Chemical Experimentation, CoRL 2025
Butter: Frequency-Adaptive Feature Consistency and Progressive Hierarchical Fusion for Efficient Object Detection in Autonomous Driving, MM 2025
GS-Occ3D: Scaling Vision-only Occupancy Reconstruction for Autonomous Driving with Gaussian Splatting, ICCV 2025
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging, ICCV 2025
DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation, ICCV 2025
InvRGB+L: Inverse Rendering of Complex Scenes with Unified Color and LiDAR Reflectance Modeling, ICCV 2025
Detect Anything 3D in the Wild, ICCV 2025
TeX-NeRF: Neural Radiance Fields for Novel HADAR View Synthesis, IROS 2025
PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth, IROS 2025
CRUISE: Cooperative Reconstruction and Editing in V2X Scenarios using Gaussian Splatting, IROS 2025
SparseMeXt: Unlocking the Potential of Sparse Representations for HD Map Construction, IROS 2025
Reusing Attention for One-stage Lane Topology Understanding, IROS 2025
Delving into Mapping Uncertainty for Mapless Trajectory Prediction, IROS 2025
In Context Meta LoRA Generation, IJCAI 2025
Masked PaCONet: Self-supervised Part-aware Implicit Shape Reconstruction, TCSVT 2025
Morpheus: A Neural-driven Animatronic Face with Hybrid Actuation and Diverse Emotion Control, RSS 2025
PartRM: Modeling Part-Level Dynamics with Large 4D Reconstruction Model, CVPR 2025
Crafting a Miniature Interactive World from a Single Image, CVPR 2025
UniScene: Unified Occupancy-centric Driving Scene Generation, CVPR 2025
Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling, ICLR 2025
AVD2: Accident Video Diffusion for Accident Video Description, ICRA 2025
PUGS: Zero-shot Physical Understanding with Gaussian Splatting, ICRA 2025
Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction, ICRA 2025
SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing, CVM 2025
Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs, COLING 2025
LiON: Learning Point-wise Abstaining Penalty for LiDAR Outlier DetectioN Using Diverse Synthetic Data, AAAI 2025
Self-Aligning Depth-regularized Radiance Fields for Asynchronous RGB-D Sequences, WACV 2025
Diffusion-based Visual Anagram as Multi-task Learning, WACV 2025
Dual-frame Fluid Motion Estimation with Test-time Optimization and Zero-divergence Loss, NeurIPS 2024
Locate n' Rotate: Two-stage Openable Part Detection with Geometric Foundation Model Priors, ACCV 2024
Hint-AD: Holistically Aligned Interpretability for End-to-End Autonomous Driving, CoRL 2024
P-MapNet: Far-seeing map generator enhanced by both SDMap and HDMap priors, RA-L 2024
Inverse Rendering of Outdoor Scenes under Time-variant Illumination, BMVC 2024
Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty, BMVC 2024
Model Merging for Multi-target Domain Adaptation, ECCV 2024
Structured-NeRF: Hierarchical Scene Graph with Neural Representation, ECCV 2024
SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior, ECCV 2024
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes, ECCV 2024
Active Neural Mapping at Scale, IROS 2024
Large Language Models Powered Context-aware Motion Prediction, IROS 2024
PreAfford: An Affordance-based Pre-grasping Framework with high Adaptability, IROS 2024
Blending Distributed NeRFs with Tri-stage Robust Pose Optimization, IROS 2024
FairDiff: Fair Segmentation with Point-Image Diffusion, MICCAI 2024
Rip-NeRF: Anti-aliasing Radiance Fields with Ripmap-Encoded Platonic Solids, SIGGRAPH 2024
Encoding biological metaverse: Advancements and challenges in neural fields from macroscopic to microscopic, The Innovation 2024 (IF: 33.1)
Adaptive Surface Normal Constraint for Geometric Estimation From Monocular Images, T-PAMI 2024 (In-the-wild depth and normal, https://www.xxlong.site/ASNDepth/)
Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration, CVPR 2024
SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis, CVPR 2024
FastMAC: Stochastic Spectral Sampling of Correspondence Graph, CVPR 2024
ECT: Fine-grained Edge Detection with Learned Cause Tokens, IVC 2024
Camera Relocalization in Shadow-free Neural Radiance Fields, ICRA 2024
MonoOcc: Digging into Monocular Semantic Occupancy Prediction, ICRA 2024
Block-Map-Based Localization in Large-Scale Environment, ICRA 2024
Car-Studio: Learning Car Radiance Fields from Single-View and Unlimited In-the-wild Images, RA-L 2024
SlimmeRF: Slimmable Radiance Fields, 3DV 2024 (Best Paper, https://github.com/Shiran-Yuan/SlimmeRF)
PAD: A Dataset and Benchmark for Pose-agnostic Anomaly Detection, NeurIPS 2023
MARS: An Instance-aware, Modular and Realistic Simulator for Autonomous Driving, CICAI 2023 (Best Paper Runner-up, https://open-air-sun.github.io/mars/)
DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection, ICCV 2023
City-scale continual neural semantic mapping with three-layer sampling and panoptic representation, KBS 2023
INT2: Interactive Trajectory Prediction at Intersections, ICCV 2023
3D Implicit Transporter for Temporally Consistent Keypoint Discovery, ICCV 2023
Understanding Embodied Reference with Touch-Line Transformer, ICLR 2023
From Semi-supervised to Omni-supervised Room Layout Estimation Using Point Clouds, ICRA 2023
ADAPT: Action-aware Driving Caption Transformer, ICRA 2023
LODE: Locally Conditioned Eikonal Implicit Scene Completion from Sparse LiDAR, ICRA 2023
STEPS: Joint Self-supervised Nighttime Image Enhancement and Depth Estimation, ICRA 2023
Delving into Shape-aware Zero-shot Semantic Segmentation, CVPR 2023
DPF: Learning Dense Prediction Fields with Weak Supervision, CVPR 2023
Planning assembly sequence with graph transformer, ICRA 2023
LATITUDE: Robotic Global Localization with Truncated Dynamic Low-pass Filter in City-scale NeRF, ICRA 2023
Unsupervised Road Anomaly Detection with Language Anchors, ICRA 2023
Toist: Task oriented instance segmentation transformer with noun-pronoun distillation, NeurIPS 2022
SNAKE: Shape-aware Neural 3D Keypoint Field, NeurIPS 2022
A boundary-guided transformer for measuring distance from rectal tumor to anal verge on magnetic resonance images, Cell Patterns 2023
Language-guided Semantic Style Transfer of 3D Indoor Scenes, PIES-ME 2022
Measuring distance from lowest boundary of rectal tumor to anal verge on CT images using pyramid attention pooling transformer, CIBM 2023
VIBUS: Data-efficient 3D scene parsing with VIewpoint Bottleneck and Uncertainty-Spectrum modeling, ISPRS 2022
Sc-wls: Towards interpretable feed-forward camera re-localization, ECCV 2022
Distance-Aware Occlusion Detection With Focused Attention, T-IP 2022
Brick Yourself within 3 Minutes, ICRA 2022
Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing, CVPR 2022
Pq-transformer: Jointly parsing 3d objects and layouts from point clouds, RA-L&ICRA 2022
Pointly-supervised scene parsing with uncertainty mixture, CVIU 2020
3d room layout estimation from a single rgb image, T-MM 2020
Seeing Through the Occluders: Robust Monocular 6-DOF Object Pose Tracking via Model-Guided Video Object Segmentation, RA-L&IROS 2020
Learning to Draw Sight Lines, IJCV 2019
Deeply-supervised knowledge synergy, CVPR 2019
A closed-form solution to universal style transfer, ICCV 2019
Efficient semantic scene completion network with spatial group convolution, ECCV 2018
Decoder network over lightweight reconstructed feature for fast semantic style transfer, ICCV 2017
Network sketching: Exploiting binary structure in deep cnns, CVPR 2017
Physics inspired optimization on semantic transfer features: An alternative method for room layout estimation, CVPR 2017
Talk Show @ ASPARA 2020
Young Scientist Representative for Intel 2021
BUCEA-Pinlan AI+Design Seminar
Implicit Representation Seminar
Welcome to our ICRA 2022 Sim2Real Challenge
Intel China TikTok Campaign 2022
Seeing old and new friends @ ByteDance AI
Proud to have several (many?) papers accepted to ICRA 2023, on 3D scene understanding and its applications.
CCL 2023 Tutorial on LLM for robotics
Seeing old and new friends @ THUEE
Presenting ADAPT at FISITA Forum
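[#Before the use of embodied intelligence... - Weibo post by @Global People magazine - Weibo (weibo.com)
"Dialogue with Scientists": How Embodied Intelligence Shapes the Future - Toutiao (toutiao.com)
"Dialogue with Scientists": How Embodied Intelligence Shapes the Future | Zhao Hao | Large Models | Artificial Intelligence | Bionic Robots - NetEase Subscription (163.com)
[Yidian Zixun] "Dialogue with Scientists": How Embodied Intelligence Shapes the Future www.yidianzixun.com
"Dialogue with Scientists": How Embodied Intelligence Shapes the Future (peopleapp.com)
"Dialogue with Scientists": How Embodied Intelligence Shapes the Future - China.com (china.com)
"Dialogue with Scientists": How Embodied Intelligence Shapes the Future - Domestic - Global People Online (globalpeople.com.cn)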
Thrilled and proud.
(Honored to Give This) Guest Lecture
Talk @ CARIAD China
A gift from life (from fate?)
CCDM 2024
ChinaGraph 2024 Tutorial
CIRAC 2024 Talk
This is ARTS
AI4E is so cool
Visiting Shanghai AI Lab
Visiting teleAI
Detailed Agenda | China Unmanned Driving Equipment Application Innovation Ecosystem Conference (qq.com) WAIC-AD Forum
Embodied AI Forum
PreAfford !!!
Visiting Ant Research
Wiley AI+X Seminar
VIVO-VCAN Talk
Check this cool stuff
Talking about DeepSeek (although I do computer vision...)
CEAI 2025 Talk
https://mp.weixin.qq.com/s/xK-E2AiJD_IDPR8lLJB78g
VENUE@ECCV 2024
Visiting BUAA
ROSCon China 2024
The most legendary talk in my career ....
Visiting Beihang Hangzhou (for the first time)
https://www.orangenews.hk/hkviews/1256517/%E8%A7%A3%E5%B1%80-%E5%85%A8%E7%90%83AI%E6%B2%BB%E7%90%86%E9%9C%80%E6%9B%B4%E5%A4%9A%E5%85%B1%E5%90%8C%E8%A1%8C%E5%8B%95.shtml
Tencent AD-WM talk
https://mp.weixin.qq.com/s/Auif5aY7KkIZsa33SnasAw
ChinaLLM 2025
Sim2Real Panel
Spatial Intelligence Panel
https://mp.weixin.qq.com/s/i3wSc5Zq5jCiuesXom2xKg
World Model Workshop
ADL 166 World Model Talk
GAMES 2025 talk
A16Z Visit
GAIR 2025
China Unicom Talk
Winter3DV 2026
CEAI 2026 Talk
AI4E Forum 2026
Humanoid Marathon Talk
VALSE 2025
https://mp.weixin.qq.com/s/UcBWxuRvzoNmR7My-aWZQQ
MEIS 2025
https://coop-intelligence.github.io/#schedule
WAIC 2025
Visiting SH-CZ
IROS Party
ConTech 2025
JD EAI Salon
BAAI Scholar Retreat 2026
FAIR Plus Talk
ICLR livecast
http://www.jjckb.cn/20250519/1714675a32114ccebd8c0a871a22693d/c.html
ChinaMAS 2025
https://mp.weixin.qq.com/s/ZhW6yTIKr5iueSZPjlW06g
GAIR World Model Panel
CSIG 2025
https://mp.weixin.qq.com/s/UW_AGfr6z6kJSIVxW0_DyA
Mini3DV
World Models Talk
Visiting SJTU
EAIRCON World Model Talk
OpenDay
https://mp.weixin.qq.com/s/GCOJHkpniePi3Qq0jWZVpA
Jade River Talk 10 years
See Annual Party
China3DV 2026