Han Zhang

Research Scientist at Google DeepMind

Email:  zhanghan [at] google [dot] com (work)

    hanzhang.ai [at] gmail [dot] com (other)

About

I am currently a Research Scientist at Google DeepMind. I obtained my Ph.D. in Computer Science at Rutgers University in 2018, supervised by Dimitris Metaxas.

My research interests are computer vision, deep learning, and medical image analysis. My current research is focused on generative modeling, semi-supervised learning and vision-language interaction.

Most recently, I am working on the Veo Project!

Work Experience

Selected Publications

(* indicates equal contributions)

[ICLR'24] Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency [Spotlight]

Tianhong Li, Sangnie Bhardwaj, Yonglong Tian, Han Zhang, Jarred Barber, Dina Katabi, Guillaume Lajoie, Huiwen Chang, Dilip Krishnan [pdf]

[ICCV'23] VQ3D: Learning a 3D-Aware Generative Model on ImageNet. [Oral, Best paper finalist]

Kyle Sargent, Jing Yu Koh, Han Zhang, Huiwen Chang, Charles Herrmann, Pratul Srinivasan, Jiajun Wu, Deqing Sun. [pdf][project]

[ICCV'23] SVDiff: Compact Parameter Space for Diffusion Fine-Tuning.

Ligong Han, Yinxiao Li, Han Zhang, Peyman Milanfar, Dimitris Metaxas, Feng Yang. [pdf][website]

[ICML'23] Muse: Text-To-Image Generation via Masked Generative Transformers [Muse]

Huiwen Chang*, Han Zhang*, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan

[CVPR'23] Visual prompt tuning for generative transfer learning

Kihyuk Sohn, Yuan Hao, José Lezama, Luisa Polania, Huiwen Chang, Han Zhang, Irfan Essa, Lu Jiang [pdf]

[CVPR'23] MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis

Tianhong Li, Huiwen Chang, Shlok Kumar Mishra, Han Zhang, Dina Katabi, Dilip Krishnan. [pdf]

[CVPR'23] MAGVIT: Masked Generative Video Transformer

Lijun Yu, Yong Cheng, Kihyuk Sohn, José Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang [pdf][website]

[ICLR'23] Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions [Phenaki]

Ruben Villegas*, Mohammad Babaeizadeh*, Pieter-Jan Kindermans*, Hernan Moraldo, Han Zhang, Mohammad Taghi Saffar, Santiago Castro, Julius Kunze, Dumitru Erhan [pdf]

[TMLR'22] Scaling Autoregressive Models for Content-Rich Text-to-Image Generation [Parti]

Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu [pdf]

[ECCV'22] BLT: Bidirectional Layout Transformer for Controllable Layout Generation

Xiang Kong, Lu Jiang, Huiwen Chang, Han Zhang, Yuan Hao, Haifeng Gong, Irfan Essa [pdf]

[ECCV'22] DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning

Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister [pdf]

[ECCV'22] Learning Instance-Specific Adaptation for Cross-Domain Segmentation 

Yuliang Zou, Zizhao Zhang, Chun-Liang Li, Han Zhang, Tomas Pfister, Jia-Bin Huang [pdf]

[ECCV'22] MaxViT: Multi-Axis Vision Transformer

Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, Yinxiao Li [pdf][code]

[CVPR'22] MaskGIT: Masked Generative Image Transformer

Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman [pdf][code]

[CVPR'22] MAXIM: Multi-Axis MLP for Image Processing [Oral; Best paper finalist]

Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, Yinxiao Li [pdf][code]

[CVPR'22] Learning to Prompt for Continual Learning

Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister [pdf][code]

[ICLR'22] ViTGAN: Training GANs with Vision Transformers [Spotlight]

Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu. [pdf]

[ICLR'22] Vector-quantized Image Modeling with Improved VQGAN.

Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alex Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu. [pdf]

[AAAI'22] Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding [Oral]

Zizhao Zhang, Han Zhang, Long Zhao, Ting Chen, Sercan O Arik, Tomas Pfister. [pdf][code]

[NeurIPS'21] Improved Transformer for High-Resolution GANs

Long Zhao, Zizhao Zhang, Ting Chen, Dimitris N. Metaxas, Han Zhang. [pdf][code]

[CVPR'21] Cross-Modal Contrastive Learning for Text-to-Image Generation.

Han Zhang*, Jing Yu Koh*, Jason Baldridge, Honglak Lee, Yinfei Yang. [pdf][code]

[ICLR'21] Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction.

Wonkwang Lee, Whie Jung, Han Zhang, Ting Chen, Jing Yu Koh, Thomas Huang, Hyungsuk Yoon, Honglak Lee, Seunghoon Hong [project][pdf]

[ICLR'21] PseudoSeg: Designing Pseudo Labels for Semantic Segmentation.

Yuliang Zou, Zizhao Zhang, Han Zhang, Chun-Liang Li, Xiao Bian, Jia-Bin Huang, Tomas Pfister [code][pdf]

[AAAI'21] Improved Consistency Regularization for GANs.

Zhengli Zhao, Sameer Singh, Honglak Lee, Zizhao Zhang, Augustus Odena, Han Zhang [pdf]

[NeurIPS'20] FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence.

Kihyuk Sohn*, David Berthelot*, Chun-Liang Li, Zizhao Zhang, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Han Zhang, Colin Raffel [code][pdf]

[ICML'20] Small-GAN: Speeding up GAN Training using Core-Sets. [pdf]

Samarth Sinha, Han Zhang, Anirudh Goyal, Yoshua Bengio, Hugo Larochelle, Augustus Odena, ICML 2020.

[CVPR'20] Distilling Effective Supervision from Severe Label Noise. [pdf]

Zizhao Zhang, Han Zhang, Sercan O Arik, Honglak Lee, Tomas Pfister, CVPR 2020.

[CVPR'20] Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models. [pdf]

Giannis Daras, Augustus Odena, Han Zhang, Alexandros G. Dimakis, CVPR 2020.

[ICLR'20] Consistency Regularization for Generative Adversarial Networks. [pdf]

Han Zhang, Zizhao Zhang, Augustus Odena, Honglak Lee. ICLR, 2020.

[ICLR'20] ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring. [pdf]

David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, Colin Raffel. ICLR, 2020

[CVPR'19] Co-occurrent Features in Semantic Segmentation [pdf]

Hang Zhang, Han Zhang, Chenguang Wang, Junyuan Xie. CVPR, 2019. 

[ICML'19] Self-Attention Generative Adversarial Networks. [Oral (Long Talk)]

Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena. ICML, 2019. [pdf][code]

[ICLR'18] Improving GANs Using Optimal Transport. [pdf][code]

Tim Salimans*, Han Zhang*, Alec Radford, Dimitris Metaxas. ICLR, 2018. 

[CVPR'18] AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks. [pdf][code]

Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, and Xiaodong He. CVPR, 2018. 

[TPAMI'18] StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks. [pdf] [code]

Han Zhang*, Tao Xu*, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris Metaxas. To appear in TPAMI, 2018. 

[Neuroinformatics'18] SegAN: Adversarial Network with Multi-scale L1 Loss for Medical Image Segmentation. [pdf][code]

Yuan Xue*, Tao Xu*, Han Zhang, L. Rodney Long and Xiaolei Huang. Neuroinformatics, 2018.

[ICCV'17] StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks. [Oral]

Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris Metaxas. ICCV, 2017. [arxiv] [iccv] [code]

[CVPR'17] Link the head to the "beak": Zero Shot Learning from Noisy Text descriptions at Part Precision. [pdf][code]

Mohamed Elhoseiny*, Yizhe Zhu*, Han Zhang, Ahmed Elgammal. CVPR 2017. 

[CVPR'16] SPDA-CNN: Unifying Semantic Part Detection and Abstraction for Fine-grained Recognition. [pdf]

Han Zhang*, Tao Xu*, Mohamed Elhoseiny, Xiaolei Huang, Shaoting Zhang, Ahmed Elgammal, and Dimitris Metaxas. CVPR, 2016.

[MICCAI'16] Multimodal Deep Learning for Cervical Dysplasia Diagnosis. [pdf]

Tao Xu*, Han Zhang*, Xiaolei Huang, Shaoting Zhang, and Dimitris Metaxas. MICCAI, 2016 (Early acceptance rate, ~10%). 

[PR'16] Multi-feature based Benchmark for Cervical Dysplasia Classification Evaluation. [pdf]

Tao Xu, Han Zhang, Cheng Xin, Edward Kim, L Rodney Long, Zhiyun Xue, Sameer Antani, and Xiaolei Huang. Pattern Recognition, 2016. 

[ISBI'14] Robust shape prior modeling based on Gaussian-Bernoulli Restricted Boltzmann Machine. [pdf]

Han Zhang, Shaoting Zhang, Kang Li and Dimitris Metaxas. IEEE International Symposium on Biomedical Imaging, 2014. Oral presentation.