Email: gaozhongpai@gmail.com
Location: Cambridge, MA 02140, USA
I am currently an Expert Research Scientist at UII America. I completed my postdoctoral research with Prof. Xiaokang Yang in 2021 and received my PhD under the supervision of Prof. Guangtao Zhai in 2018, both at Shanghai Jiao Tong University. During my PhD, I was a joint PhD student at Schepens Eye Research Institute, Harvard Medical School, under the supervision of Prof. Eli Peli.
My research focuses on 3D computer vision and AR/VR, with particular interests in 3D reconstruction, NeRF/3DGS, and video LLMs. I collaborate with several professors, including Prof. Junchi Yan, Prof. Juyong Zhang, and Prof. Menghan Hu.
Previously, I worked on visually induced motion sickness (VIMS) in stereoscopic 3D and head-mounted displays, as well as psychovisual modulation technology with applications such as dual-view displays and invisible QR codes.
Openings
We have several openings (internship & full-time). More details and application here.
News
02-26-2025: Our paper "Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding" is accepted by CVPR 2025 @ Nashville
01-22-2025: Three of our papers are accepted by ICLR 2025 @ Singapore
10-28-2024: Our paper "Automated Patient Positioning with Learned 3D Hand Gestures" is accepted by WACV 2025 @ Tucson
09-25-2024: Our paper "DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering" is accepted by NeurIPS 2024 @ Vancouver
07-01-2024: Our paper "Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images" is accepted by ECCV 2024 @ Milan
06-28-2024: Our paper "Hidden Barcode in Sub-Images with Invisible Locating Marker" is accepted by TOMM
05-13-2024: Our paper "Few-Shot 3D Volumetric Segmentation with Multi-Surrogate Fusion" is early accepted by MICCAI 2024 @ Morocco
04-16-2024: Our paper "Towards Universal Training-Free Coverless Image Steganography with Diffusion Models" is accepted by IJCAI 2024 @ Jeju
02-26-2024: Our paper "DaReNeRF: Direction-aware Representation for Dynamic Scenes" is accepted by CVPR 2024 @ Seattle
01-16-2024: Our paper "PBADet: A One-Stage Anchor-Free Approach for Part-Body Association" is accepted by ICLR 2024 @ Vienna
12-09-2023: Our two papers are accepted by AAAI 2024 @ Vancouver
08-10-2023: Our paper "Synergetic Assessment of Quality and Aesthetic: Approach and Comprehensive Benchmark Dataset" is accepted by TCSVT
06-22-2023: Our demo "Real-time 3D Hand Gestures for Medical Image Visualization" is presented on CVPR 2023 @ Vancouver
11-14-2022: Our paper "RIVIE: Robust Inherent Video Information Embedding" is accepted by TMM
09-11-2022: Our paper "Learning Continuous Mesh Representation with Spherical Implicit Surface" is accepted by FG 2023 @ Waikoloa, Hawaii
03-01-2022: Our paper "Learning Invisible Markers for Hidden Codes in Offline-to-online Photography" is accepted by CVPR 2022 @ New Orleans
02-09-2022: Our paper "Robust mesh representation learning via efficient local structure-aware anisotropic convolution" is accepted by TNNLS
Honor & Awards
2020 Best Paper Award on CVPR Workshop (Dynavis)
2019 China Initiative Postdocs Fellowship ("博士后创新人才支持计划")
2019 Shanghai Super Postdocs Fellowship ("超级博士后激励计划")
2019 National Youth Fund by NSFC
DEMO
6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering (ICLR 2025)
6DGS significantly outperforms 3DGS and N-DG, achieving up to a 15.73 dB improvement in PSNR while using 66.5% fewer Gaussian points than 3DGS.
Project: https://gaozhongpai.github.io/6dgs/
Real-time 3D Hand Gestures for Medical Image Visualization (CVPR 2023 Demo)
This innovative approach can transform medical imaging, enhancing diagnostic accuracy, treatment planning, and medical education.
Real-time 3D hand reconstruction from RGB camera on mobile phones in AR environments.
Support for left/right and multiple hands. Here, only the 3D landmarks are visualized.
Dynamic Facial Avatar with Implicit Neural Fields
High fidelity, including eye movements
Semi-supervised 3D face representation learning from unconstrained photo collections (CVPRW 2020, Best paper award, PDF)
The identity shapes are consistent across frames under different lighting, pose, and expression conditions.
PUBLICATIONS
Zhongpai Gao, "Learning Continuous Mesh Representation with Spherical Implicit Surface", FG 2023, Code, PDF, IEEE
For meshes with fixed topology, we learn a spherical implicit surface (SIS), which takes a spherical coordinate, together with either the local vertex features around that coordinate or the global feature of the 3D shape, as input and predicts the 3D position at the given coordinate as output. Since the spherical coordinates are continuous, SIS can represent a mesh at arbitrary resolution.
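The idea can be illustrated with a minimal NumPy sketch: a tiny decoder with random weights stands in for the learned SIS network (the function name `sis_decode` and the 8-dimensional shape code are hypothetical, chosen only for illustration). The point is that the decoder is queried at continuous spherical coordinates, so the sampling resolution is a free choice at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny MLP standing in for the learned SIS decoder.
W1 = rng.standard_normal((2 + 8, 32)) * 0.1   # input: spherical coord + global shape code
W2 = rng.standard_normal((32, 3)) * 0.1       # output: 3D vertex position

def sis_decode(theta, phi, shape_code):
    """Map a continuous spherical coordinate (theta, phi) and a global
    shape code to a 3D surface point."""
    x = np.concatenate([[theta, phi], shape_code])
    h = np.tanh(x @ W1)
    return h @ W2

shape_code = rng.standard_normal(8)  # global feature of one 3D shape

# Because coordinates are continuous, the surface can be sampled at any
# resolution without retraining: try n = 16, 64, 256, ...
n = 16
thetas = np.linspace(0, np.pi, n)
phis = np.linspace(0, 2 * np.pi, n)
verts = np.array([sis_decode(t, p, shape_code) for t in thetas for p in phis])
print(verts.shape)  # (256, 3)
```

Increasing `n` yields a denser mesh from the same fixed-size network, which is what makes the representation resolution-independent.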
Jun Jia*, Zhongpai Gao*, Dandan Zhu, Xiongkuo Min, Guangtao Zhai, and Xiaokang Yang, "Learning Invisible Markers for Hidden Codes in Offline-to-online Photography", in IEEE Computer Vision and Pattern Recognition, CVPR 2022, PDF
This paper proposes a novel invisible information hiding architecture for display/print-camera scenarios, consisting of hiding, locating, correcting, and recovery, where invisible markers are learned to make hidden codes truly invisible.
Yunhao Li, Wei Shen, Zhongpai Gao, Yucheng Zhu, Guangtao Zhai, Guodong Guo, "Looking Here or There? Gaze Following in 360-Degree Images", Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021), PDF
We propose a 3D sight-line-guided dual-pathway framework that detects the gaze target within a local region (here) and from a distant region (there) in parallel. Specifically, the local region is obtained as a 2D cone-shaped field along the 2D projection of the sight line starting at the human subject's head position, and the distant region is obtained by searching along the sight line in 3D sphere space.
Zhongpai Gao, Junchi Yan, Guangtao Zhai, and Xiaokang Yang. "Learning Spectral Dictionary for Local Representation of Mesh", Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI 2021), PDF
We learn a spectral dictionary (i.e., bases) for the weighting matrices such that the parameter size is independent of the resolution of 3D shapes. The coefficients of the weighting-matrix bases for each vertex are learned from the spectral features of the template's vertex and its neighbors in a weight-sharing manner.
Zhongpai Gao, Junchi Yan, Guangtao Zhai, Juyong Zhang, Yiyan Yang, and Xiaokang Yang. "Learning local neighboring structure for robust 3D shape representation", Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2021), PDF
We propose a local structure-aware anisotropic convolutional operation (LSA-Conv) that learns adaptive weighting matrices for each node according to the local neighboring structure and performs shared anisotropic filters. In fact, the learnable weighting matrix is similar to the attention matrix in the random synthesizer.
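A minimal NumPy sketch of the LSA-Conv idea, under the assumption (for illustration only) that the per-node weighting matrices `A` have already been predicted from the local neighboring structure: each node's gathered neighbor features are re-weighted by its own matrix, and a single shared anisotropic filter `W` is then applied everywhere.

```python
import numpy as np

rng = np.random.default_rng(1)

def lsa_conv(x, neighbors, A, W):
    """x: (N, C) node features; neighbors: (N, K) neighbor indices;
    A: (N, K, K) per-node learnable weighting matrices (structure-adaptive);
    W: (K*C, C_out) shared anisotropic filter applied to all nodes."""
    N, K = neighbors.shape
    gathered = x[neighbors]              # (N, K, C) neighbor features
    aligned = A @ gathered               # (N, K, C) re-weighted by local structure
    return aligned.reshape(N, -1) @ W    # (N, C_out) shared filter output

# Toy sizes; in practice these come from the mesh and the trained model.
N, K, C, C_out = 10, 4, 8, 16
x = rng.standard_normal((N, C))
neighbors = rng.integers(0, N, size=(N, K))
A = rng.standard_normal((N, K, K))
W = rng.standard_normal((K * C, C_out))
out = lsa_conv(x, neighbors, A, W)
print(out.shape)  # (10, 16)
```

The adaptivity lives entirely in `A`: the filter `W` is shared across nodes, which is what keeps the operation efficient while still being anisotropic per node.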
Zhongpai Gao, Juyong Zhang, Yudong Guo, Chao Ma, Guangtao Zhai, Xiaokang Yang. "Semi-supervised 3d face representation learning from unconstrained photo collections", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2020, Best paper award), PDF
We train our model in a semi-supervised manner with an adversarial loss to exploit large amounts of unconstrained facial images. A novel center loss is introduced to ensure that facial images from the same subject have the same identity shape and albedo. Besides, our proposed model disentangles identity, expression, pose, and lighting representations, which improves the overall reconstruction performance and facilitates facial editing applications, e.g., expression transfer.
Jun Jia, Zhongpai Gao, Kang Chen, Menghan Hu, Xiongkuo Min, Guangtao Zhai, Xiaokang Yang. "RIHOOP: Robust Invisible Hyperlinks in Offline and Online Photographs", IEEE Transactions on Cybernetics (IEEE TCyb 2020), PDF
Our approach is an end-to-end neural network with an encoder to hide messages and a decoder to extract messages. To maintain the hidden message resilient to cameras, we build a distortion network between the encoder and the decoder to augment the encoded images. The distortion network uses differentiable 3-D rendering operations, which can simulate the distortion introduced by camera imaging in both printing and display scenarios.
PRE-PRINT
Zhongpai Gao, Junchi Yan, Guangtao Zhai, and Xiaokang Yang. "Permutation Matters: Anisotropic Convolutional Layer for Learning on Point Clouds", arXiv preprint arXiv:2005.13135
We propose a permutable anisotropic convolutional operation (PAI-Conv) that calculates soft-permutation matrices for each point using dot-product attention against a set of evenly distributed kernel points on a sphere's surface, and then performs shared anisotropic filters. The dot product with kernel points is analogous to the dot product with keys in the Transformer.
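The mechanism can be sketched in a few lines of NumPy (the function name `pai_conv` and all sizes are illustrative, and one plausible reading of the attention direction is assumed): relative neighbor positions act as queries, the fixed sphere kernel points act as keys, and the resulting soft-permutation matrix aligns each point's unordered neighbors to the canonical kernel order before a shared filter is applied.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pai_conv(pts, neighbors, kernel_pts, W):
    """Soft-permute each point's K neighbors to align with M fixed kernel
    points on the unit sphere (the 'keys'), then apply a shared filter W."""
    rel = pts[neighbors] - pts[:, None, :]           # (N, K, 3) relative positions (queries)
    P = softmax(rel @ kernel_pts.T)                  # (N, K, M) dot-product attention
    aligned = P.transpose(0, 2, 1) @ pts[neighbors]  # (N, M, 3) neighbors in kernel order
    return aligned.reshape(len(pts), -1) @ W         # (N, C_out) shared anisotropic filter

# Toy sizes; in practice neighbors come from kNN on the point cloud.
N, K, M, C_out = 20, 6, 8, 16
pts = rng.standard_normal((N, 3))
neighbors = rng.integers(0, N, size=(N, K))
kernel_pts = rng.standard_normal((M, 3))
kernel_pts /= np.linalg.norm(kernel_pts, axis=1, keepdims=True)  # on the unit sphere
W = rng.standard_normal((M * 3, C_out))
out = pai_conv(pts, neighbors, kernel_pts, W)
print(out.shape)  # (20, 16)
```

Because the kernel points are fixed and the filter is shared, the soft permutation is what gives the convolution a consistent "orientation" over an unordered neighborhood, mirroring how keys give Transformer attention a fixed reference frame.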