Hao Zhao (赵昊)

I am an assistant professor at Tsinghua University.

I was a research scientist at Intel Labs China.

I was a joint postdoc affiliated with Peking University.

I received my Ph.D. and bachelor's degrees from the Department of Electronic Engineering, Tsinghua University.

Proudly, I was once a leader of Skyworks (天空工场), the largest robotics club at THU and an amazing geek utopia.

I am broadly interested in computer vision for robotics, especially 3D scene understanding.

I have been a serial entrepreneur since 2009, co-founding 10+ startups in social networks, web development tools, unmanned aerial vehicles, intelligent delivery, smart grids, VR devices, virtual humans, cloud design, autonomous driving, smart manufacturing, and more.

Recruitment

I am actively looking for postdocs to join my lab.
I am also open to research interns.
I select Ph.D. students from my research intern pool.
Contact me at zhaohao@air.tsinghua.edu.cn

Publications

Feedforward 3D Editing Learns from Semantic-Part Transformation, SIGGRAPH Asia 2026
Monocular Avatar Reconstruction via Cascaded Diffusion Priors and UV-Space Differentiable Shading, ECCV 2026
LIBERO-Safety: A Comprehensive Benchmark for Physical and Semantic Safety in Vision-Language-Action Models, ECCV 2026
One Video, One World: Turning Monocular Video into Physical 4D Scenes, ECCV 2026
Plug-and-Play Traffic Element Awareness for End-to-End Autonomous Driving, ECCV 2026
OmniNWM: Unifying the State-Action-Reward Triad for Closed-Loop Panoramic Driving Navigation World Models, ECCV 2026
Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method, T-PAMI 2026
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors, SIGGRAPH 2026
Relit-LiVE: Relight Video by Jointly Learning Environment Video, SIGGRAPH 2026
Scalable Training of 3D Gaussian Splatting via Out-of-Core Optimization, ICML 2026
Automated Synthesis of Facial Mechanisms for Conversational Animatronic Robots, RSS 2026 (Best Paper Finalist)
Self-Improving Robot Policy with Compositional World Model, RSS 2026
ORV: 4D Occupancy-centric Robot Video Generation, CVPR 2026
Benchmarking PhD-Level Coding in 3D Geometric Computer Vision, CVPR 2026
Native and Compact Structured Latents for 3D Generation, CVPR 2026 (Best Student Paper)
PAM: A Pose–Appearance–Motion Engine for Sim-to-Real HOI Video Generation, CVPR 2026
DGGT: Feedforward 4D Reconstruction of Dynamic Driving Scenes using Unposed Images, CVPR 2026
Tokenizing Vector Animation for Autoregressive Generation, CVPR 2026
NeAR: Coupled Neural Asset–Renderer Stack, CVPR 2026
UniDex: A Robot Foundation Suite for Universal Dexterous Hand Control from Egocentric Human Videos, CVPR 2026
Agility Meets Stability: Versatile Humanoid Control with Heterogeneous Data, ICRA 2026
UniUncer: Unified Dynamic–Static Uncertainty for End-to-End Driving, ICRA 2026
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation, ICRA 2026
Unified Map Prior Encoder for Mapping and Planning, ICRA 2026
Dexora: Open-source VLA for High-DoF Bimanual Dexterity, ICRA 2026 (Best Paper Finalist)
Light of Normals: Unified Feature Representation for Universal Photometric Stereo, ICLR 2026
CubeBench: Diagnosing Interactive, Long-Horizon Physical Intelligence under Partial Observations, ICLR 2026
Light-X: Generative 4D Video Rendering with Camera and Illumination Control, ICLR 2026
DanceTogether: Generating Interactive Multi-Person Video without Identity Drifting, ICLR 2026
Stability Under Scrutiny: Benchmarking Representation Paradigms for Online HD Mapping, ICLR 2026
ThinkMatter: Panoramic-Aware Instructional Semantics for Monocular Vision-and-Language Navigation, T-IP 2026
Ultraman: Ultra-Fast and High-Resolution Texture Generation for 3D Human Reconstruction from a Single Image, MVA 2026
Gen-NCAP: A Generative Simulator for Corner Case Benchmarking in End-to-End Autonomous Driving, IASEAI 2026
Challenger: Affordable Adversarial Driving Video Generation for Safety Testing, IASEAI 2026
Hoodie: Hierarchical Point Cloud and Latent Code Diffusion for Joint and Conditional Generation, Neurocomputing 2025
3DSceneEditor: Controllable 3D Scene Editing with Gaussian Splatting, WACV 2026
FB-4D: Spatial-Temporal Coherent Dynamic 3D Content Generation with Feature Banks, WACV 2026
GRADRobot: Geometry-Aware Rendering with Articulation and Diffusion for Robot Modeling, 3DV 2026
GaussianArt: Unified Modeling of Geometry and Motion for Articulated Objects, 3DV 2026
SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting, T-PAMI 2025
Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting, NeurIPS 2025
SURDS: Benchmarking Spatial Understanding and Reasoning in Driving Scenarios with Vision Language Models, NeurIPS 2025
Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models, NeurIPS 2025
One View, Many Worlds: Single-Image to 3D Object Meets Generative Domain Randomization for One-Shot 6D Pose Estimation, CoRL 2025
Elucidating the Design Space of Torque-aware Vision-Language-Action Models, CoRL 2025
RoboChemist: Long-Horizon and Safety-Compliant Robotic Chemical Experimentation, CoRL 2025
Butter: Frequency-Adaptive Feature Consistency and Progressive Hierarchical Fusion for Efficient Object Detection in Autonomous Driving, MM 2025
GS-Occ3D: Scaling Vision-only Occupancy Reconstruction for Autonomous Driving with Gaussian Splatting, ICCV 2025
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging, ICCV 2025
DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation, ICCV 2025
InvRGB+L: Inverse Rendering of Complex Scenes with Unified Color and LiDAR Reflectance Modeling, ICCV 2025
Detect Anything 3D in the Wild, ICCV 2025
TeX-NeRF: Neural Radiance Fields for Novel HADAR View Synthesis, IROS 2025
PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth, IROS 2025
CRUISE: Cooperative Reconstruction and Editing in V2X Scenarios using Gaussian Splatting, IROS 2025
SparseMeXt: Unlocking the Potential of Sparse Representations for HD Map Construction, IROS 2025
Reusing Attention for One-stage Lane Topology Understanding, IROS 2025
Delving into Mapping Uncertainty for Mapless Trajectory Prediction, IROS 2025
In Context Meta LoRA Generation, IJCAI 2025
Masked PaCONet: Self-supervised Part-aware Implicit Shape Reconstruction, TCSVT 2025
Morpheus: A Neural-driven Animatronic Face with Hybrid Actuation and Diverse Emotion Control, RSS 2025
PartRM: Modeling Part-Level Dynamics with Large 4D Reconstruction Model, CVPR 2025
Crafting a Miniature Interactive World from a Single Image, CVPR 2025
UniScene: Unified Occupancy-centric Driving Scene Generation, CVPR 2025
Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling, ICLR 2025
AVD2: Accident Video Diffusion for Accident Video Description, ICRA 2025
PUGS: Zero-shot Physical Understanding with Gaussian Splatting, ICRA 2025
Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction, ICRA 2025
SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing, CVM 2025
Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs, COLING 2025
LiON: Learning Point-wise Abstaining Penalty for LiDAR Outlier DetectioN Using Diverse Synthetic Data, AAAI 2025
Self-Aligning Depth-regularized Radiance Fields for Asynchronous RGB-D Sequences, WACV 2025
Diffusion-based Visual Anagram as Multi-task Learning, WACV 2025
Dual-frame Fluid Motion Estimation with Test-time Optimization and Zero-divergence Loss, NeurIPS 2024
Locate n' Rotate: Two-stage Openable Part Detection with Geometric Foundation Model Priors, ACCV 2024
Hint-AD: Holistically Aligned Interpretability for End-to-End Autonomous Driving, CoRL 2024
P-MapNet: Far-seeing map generator enhanced by both SDMap and HDMap priors, RA-L 2024
Inverse Rendering of Outdoor Scenes under Time-variant Illumination, BMVC 2024
Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty, BMVC 2024
Model Merging for Multi-target Domain Adaptation, ECCV 2024
Structured-NeRF: Hierarchical Scene Graph with Neural Representation, ECCV 2024
SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior, ECCV 2024
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes, ECCV 2024
Active Neural Mapping at Scale, IROS 2024
Large Language Models Powered Context-aware Motion Prediction, IROS 2024
PreAfford: An Affordance-based Pre-grasping Framework with high Adaptability, IROS 2024
Blending Distributed NeRFs with Tri-stage Robust Pose Optimization, IROS 2024
FairDiff: Fair Segmentation with Point-Image Diffusion, MICCAI 2024
Rip-NeRF: Anti-aliasing Radiance Fields with Ripmap-Encoded Platonic Solids, SIGGRAPH 2024
Encoding biological metaverse: Advancements and challenges in neural fields from macroscopic to microscopic, The Innovation 2024 (IF: 33.1)
Adaptive Surface Normal Constraint for Geometric Estimation From Monocular Images, T-PAMI 2024 (In-the-wild depth and normal, https://www.xxlong.site/ASNDepth/)
Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration, CVPR 2024
SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis, CVPR 2024
FastMAC: Stochastic Spectral Sampling of Correspondence Graph, CVPR 2024
ECT: Fine-grained Edge Detection with Learned Cause Tokens, IVC 2024
Camera Relocalization in Shadow-free Neural Radiance Fields, ICRA 2024
MonoOcc: Digging into Monocular Semantic Occupancy Prediction, ICRA 2024
Block-Map-Based Localization in Large-Scale Environment, ICRA 2024
Car-Studio: Learning Car Radiance Fields from Single-View and Unlimited In-the-wild Images, RA-L 2024
SlimmeRF: Slimmable Radiance Fields, 3DV 2024 (Best Paper, https://github.com/Shiran-Yuan/SlimmeRF)
PAD: A Dataset and Benchmark for Pose-agnostic Anomaly Detection, NeurIPS 2023
MARS: An Instance-aware, Modular and Realistic Simulator for Autonomous Driving, CICAI 2023 (Best Paper Runner-up, https://open-air-sun.github.io/mars/)
DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection, ICCV 2023
City-scale Continual Neural Semantic Mapping with Three-layer Sampling and Panoptic Representation, KBS 2023
INT2: Interactive Trajectory Prediction at Intersections, ICCV 2023
3D Implicit Transporter for Temporally Consistent Keypoint Discovery, ICCV 2023
Understanding Embodied Reference with Touch-Line Transformer, ICLR 2023
From Semi-supervised to Omni-supervised Room Layout Estimation Using Point Clouds, ICRA 2023
ADAPT: Action-aware Driving Caption Transformer, ICRA 2023
LODE: Locally Conditioned Eikonal Implicit Scene Completion from Sparse LiDAR, ICRA 2023
STEPS: Joint Self-supervised Nighttime Image Enhancement and Depth Estimation, ICRA 2023
Delving into Shape-aware Zero-shot Semantic Segmentation, CVPR 2023
DPF: Learning Dense Prediction Fields with Weak Supervision, CVPR 2023
Planning Assembly Sequence with Graph Transformer, ICRA 2023
LATITUDE: Robotic Global Localization with Truncated Dynamic Low-pass Filter in City-scale NeRF, ICRA 2023
Unsupervised Road Anomaly Detection with Language Anchors, ICRA 2023
TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation, NeurIPS 2022
SNAKE: Shape-aware Neural 3D Keypoint Field, NeurIPS 2022
A Boundary-guided Transformer for Measuring Distance from Rectal Tumor to Anal Verge on Magnetic Resonance Images, Cell Patterns 2023
Language-guided Semantic Style Transfer of 3D Indoor Scenes, PIES-ME 2022
Measuring Distance from Lowest Boundary of Rectal Tumor to Anal Verge on CT Images Using Pyramid Attention Pooling Transformer, CIBM 2023
VIBUS: Data-efficient 3D Scene Parsing with VIewpoint Bottleneck and Uncertainty-Spectrum Modeling, ISPRS 2022
SC-wLS: Towards Interpretable Feed-forward Camera Re-localization, ECCV 2022
Distance-Aware Occlusion Detection With Focused Attention, T-IP 2022
Brick Yourself within 3 Minutes, ICRA 2022
Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing, CVPR 2022
PQ-Transformer: Jointly Parsing 3D Objects and Layouts from Point Clouds, RA-L&ICRA 2022
Pointly-supervised Scene Parsing with Uncertainty Mixture, CVIU 2020
3D Room Layout Estimation from a Single RGB Image, T-MM 2020
Seeing Through the Occluders: Robust Monocular 6-DOF Object Pose Tracking via Model-Guided Video Object Segmentation, RA-L&IROS 2020
Learning to Draw Sight Lines, IJCV 2019
Deeply-supervised Knowledge Synergy, CVPR 2019
A Closed-form Solution to Universal Style Transfer, ICCV 2019
Efficient Semantic Scene Completion Network with Spatial Group Convolution, ECCV 2018
Decoder Network over Lightweight Reconstructed Feature for Fast Semantic Style Transfer, ICCV 2017
Network Sketching: Exploiting Binary Structure in Deep CNNs, CVPR 2017
Physics Inspired Optimization on Semantic Transfer Features: An Alternative Method for Room Layout Estimation, CVPR 2017

Other Stuff

Talk Show @ ASPARA 2020

https://lnkd.in/dBG6N5e

Young Scientist Representative for Intel 2021

https://lnkd.in/dC7rkFG

BUCEA-Pinlan AI+Design Seminar

https://www.sohu.com/a/490008464_116897

Implicit Representation Seminar

Welcome to our ICRA 2022 Sim2Real Challenge

http://air.tsinghua.edu.cn/robomaster/sim2real_icra23.html

Intel China TikTok Campaign 2022

https://v.douyin.com/YEKDmCo/

Seeing old and new friends @ Bytedance AI

[CICAI 2023] Embodied AI Tutorial: Event Preview (qq.com)

Proud to have several (many?) papers on 3D scene understanding and its applications accepted to ICRA 2023.

CCL 2023 Tutorial on LLMs for Robotics

Tutorials - CCL 2023 (cips-cl.org)

Seeing old and new friends @ THUEE

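Alumni on the Road Ahead · Decade Series | Issue 34: The "Heartthrob" of the Class of 2009 Shows You How to Ride the Wave of Tech Innovation (qq.com)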

Presenting ADAPT at FISITA Forum

ISCC 2023 | International Young Scientists Forum: Trustworthy AI Final Program Released (qq.com)

Special Issue on Embodied AI

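“Dialogue with Scientists”: How Embodied AI Will Shape the Future (qq.com)

[#Embodied AI usage... - Weibo post by @环球人物杂志 (Global People magazine) (weibo.com)

“Dialogue with Scientists”: How Embodied AI Will Shape the Future - Toutiao (toutiao.com)

Tencent News Creator Platform (qq.com)

“Dialogue with Scientists”: How Embodied AI Will Shape the Future | Hao Zhao | Large Models | Artificial Intelligence | Bionic Robots - NetEase Subscription (163.com)

[Yidian Zixun] “Dialogue with Scientists”: How Embodied AI Will Shape the Future www.yidianzixun.com

“Dialogue with Scientists”: How Embodied AI Will Shape the Future (uc.cn)

“Dialogue with Scientists”: How Embodied AI Will Shape the Future (peopleapp.com)

“Dialogue with Scientists”: How Embodied AI Will Shape the Future - China.com (china.com)

“Dialogue with Scientists”: How Embodied AI Will Shape the Future - Domestic - Global People: A People Website with Warmth (globalpeople.com.cn)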

Thrilled and proud.

(Honored to Give This) Guest Lecture

Talk @ CARIAD China

"ZhenFund Craft Brew · Embodied AI Roundtable" Now Recruiting: Toward a New Era of Intelligence | Z Events (qq.com)

A gift from life (from fate?)

Interview with Hao Zhao of Tsinghua AIR: Generative Simulation Unleashes Boundless Inspiration for Embodied AI (qq.com)

Economy Net Online Reading (jingji.com.cn)

CCDM 2024

ChinaGraph 2024 Tutorial

ChinaGraph 2024 Pre-conference Tutorial: Graphics Computing for Embodied AI (qq.com)

CIRAC 2024 Talk

Intelligent Robotics Annual Academic Conference (CIRAC 2024) Program Released (qq.com)

This is ARTS

Final Agenda: The First Yanzhi Embodied AI Robotics Annual Conference 2024

AI4E is so cool

Visiting Shanghai AI Lab

mp.weixin.qq.com/s/k3OMSH4Q0ySjvJHgM8pabA

Visiting TeleAI

Frontline Experts Gather at TeleAI to Discuss "Embodied AI" (qq.com)

WAIC-AD Forum

Detailed Agenda | China Unmanned Equipment Application Innovation Ecosystem Conference (qq.com)

Embodied AI Forum

mp.weixin.qq.com/s/iTIUCEpExcmGgx-MExfAVQ

PreAfford!!!

Project page (air-discover.github.io)

Visiting Ant Research

Wiley AI+X Seminar

Wiley + Tsinghua University "AI+X" Interdisciplinary AI Frontier Symposium: You Are Invited!

VIVO-VCAN Talk

Check this cool stuff

Talking about DeepSeek (although I do computer vision...)

CEAI 2025 Talk

https://mp.weixin.qq.com/s/xK-E2AiJD_IDPR8lLJB78g

CICV 2024

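[Event Preview] Program First Look! Highlights of the 11th International Intelligent Connected Vehicle Technology Annual Conference (CICV2024) Revealed (qq.com)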

VENUE@ECCV 2024

VENUE@ECCV 2024 (venue-tutorial.github.io)

Visiting BUAA

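Vehicle-City Integration Development Forum and CAERI Vehicle-Road-Cloud Integrated Product Partner Recruitment and Appreciation Meeting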

ROSCon China 2024

ROSCon China Is Here! Hardcore Embodied AI Talks at This Workshop!

The most legendary talk of my career...

Visiting Beihang Hangzhou (for the first time)

https://www.orangenews.hk/hkviews/1256517/%E8%A7%A3%E5%B1%80-%E5%85%A8%E7%90%83AI%E6%B2%BB%E7%90%86%E9%9C%80%E6%9B%B4%E5%A4%9A%E5%85%B1%E5%90%8C%E8%A1%8C%E5%8B%95.shtml