Kihyuk Sohn

I am a Research Scientist at Google Research in Mountain View, CA. Prior to joining Google, I was a researcher in the Media Analytics group at NEC Laboratories America. I completed my Ph.D. at the University of Michigan under the supervision of Professor Honglak Lee. I have broad interests in machine learning and computer vision. Specifically, my research focuses on supervised and unsupervised deep representation learning with applications to computer vision, audio recognition, and text processing, using graphical models that are invariant to many factors of variation for robust perception from complex and multimodal data.

NEWS

[2/17/24] Two papers (MAGVIT-v2, DreamFlow) accepted at ICLR 2024.

[9/21/23] Two papers (StyleDrop, Collaborative Score Distillation) accepted at NeurIPS 2023.

[8/17/23] Hiring! We are looking to hire a student researcher to work on generative models. Please reach out to me if you are interested.

Experience

Curriculum Vitae [pdf (outdated)][google scholar]

November 2023 ~ : Staff Research Scientist, Google Research

March 2022 ~ : Research Scientist, Google Research

July 2019 ~ March 2022 : Research Scientist, Google Cloud AI

July 2015 ~ July 2019 : Researcher, NEC Laboratories America

Education

September 2008 ~ June 2015

    Ph.D. in Electrical Engineering: Systems, University of Michigan, Ann Arbor

    Thesis advisor : Professor Honglak Lee

March 2003 ~ February 2008

    Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea

    B.S. in Electrical Engineering and Computer Science, and in Mathematical Science

Contact information

Email:

    kihyuks [at] google [dot] com

    kihyuk.sohn [at] gmail [dot] com

Publications

Preprint

[5] Direct Consistency Optimization for Compositional Text-to-Image Personalization

Kyungmin Lee, Sangkyung Kwak, Kihyuk Sohn, Jinwoo Shin [arxiv][project page][code]


[4] Unsupervised LLM Adaptation for Question Answering

Kuniaki Saito, Kihyuk Sohn, Chen-Yu Lee, Yoshitaka Ushiku [arxiv]


[3] VideoPoet: A Large Language Model for Zero-shot Video Generation

Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Rachel Hornung, Hartwig Adam, Hassan Akbari, Yair Alon, Vighnesh Birodkar, Yong Cheng, Ming-Chang Chiu, Josh Dillon, Irfan Essa, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, David Ross, Grant Schindler, Mikhail Sirotenko, Kihyuk Sohn, Krishna Somandepalli, Huisheng Wang, Jimmy Yan, Ming-Hsuan Yang, Xuan Yang, Bryan Seybold, Lu Jiang [arxiv][project page][blog]


[2] Photorealistic Video Generation with Diffusion Models

Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, Jose Lezama [arxiv][project page]


[1] Improve Supervised Representation Learning with Masked Image Modeling

Kaifeng Chen, Daniel Salz, Huiwen Chang, Kihyuk Sohn, Dilip Krishnan, Mojtaba Seyedhosseini [arxiv]

2024

[57] Language Model Beats Diffusion - Tokenizer is Key to Visual Generation

Lijun Yu, Jose Lezama, Nitesh Bharadwaj Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Agrim Gupta, Xiuye Gu, Alexander G Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A Ross, Lu Jiang [openreview]

To appear at International Conference on Learning Representations (ICLR), 2024.


[56] DreamFlow: High-quality Text-to-3D Generation by Approximating Probability Flow

Kyungmin Lee, Kihyuk Sohn, Jinwoo Shin [openreview]

To appear at International Conference on Learning Representations (ICLR), 2024 (spotlight).

2023

[55] StyleDrop: Text-to-Image Generation in Any Style

Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein, Dilip Krishnan [arXiv][project page][blog post][video]

In Advances in Neural Information Processing Systems (NeurIPS), 2023.


[54] Collaborative Score Distillation for Consistent Visual Synthesis

Subin Kim*, Kyungmin Lee*, June Suk Choi, Jongheon Jeong, Kihyuk Sohn, Jinwoo Shin (* indicates equal contribution) [arXiv][project page]

In Advances in Neural Information Processing Systems (NeurIPS), 2023.

(A previous version was presented at ICML Workshop on Structured Probabilistic Inference and Generative Modeling, 2023.)


[53] FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction

Chen-Yu Lee, Chun-Liang Li, Hao Zhang, Timothy Dozat, Vincent Perot, Guolong Su, Xiang Zhang, Kihyuk Sohn, Nikolai Glushnev, Renshen Wang, Joshua Ainslie, Shangbang Long, Siyang Qin, Yasuhisa Fujii, Nan Hua, Tomas Pfister [arXiv]

In Association for Computational Linguistics (ACL), 2023


[52] Learning Disentangled Prompts for Compositional Image Synthesis

Kihyuk Sohn, Albert Shaw, Yuan Hao, Han Zhang, Luisa Polania, Huiwen Chang, Lu Jiang, Irfan Essa [arXiv]


[51] MAGVIT: Masked Generative Video Transformer

Lijun Yu, Yong Cheng, Kihyuk Sohn, Jose Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang [arXiv][project page][code]

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023 (selected as highlight)


[50] Video Probabilistic Diffusion Models in Projected Latent Space

Sihyun Yu, Kihyuk Sohn, Subin Kim, Jinwoo Shin [arXiv][project page][code]

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023


[49] MaskSketch: Unpaired Structure-guided Masked Image Generation

Dina Bashkirova, Jose Lezama, Kihyuk Sohn, Kate Saenko, Irfan Essa [arXiv][project page][code]

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023 (selected as highlight)


[48] Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval

Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister [arXiv][code]

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023


[47] Visual Prompt Tuning for Generative Transfer Learning 

Kihyuk Sohn, Huiwen Chang, Jose Lezama, Luisa Polania, Han Zhang, Yuan Hao, Irfan Essa, Lu Jiang [arXiv][code]

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023


[46] Prefix Conditioning Unifies Language and Label Supervision

Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister [arXiv][blog post]

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023


[45] Unifying Distribution Alignment as a Loss for Imbalanced Semi-Supervised Learning

Justin Lazarow, Kihyuk Sohn, Chen-Yu Lee, Chun-Liang Li, Zizhao Zhang, Tomas Pfister [pdf]

In IEEE Winter Conference on Applications of Computer Vision (WACV), 2023.


[44] Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly Types

Kihyuk Sohn, Jinsung Yoon, Chun-Liang Li, Chen-Yu Lee, Tomas Pfister [pdf]

In IEEE Winter Conference on Applications of Computer Vision (WACV), 2023.


[43] SPADE: Semi-supervised Anomaly Detection under Distribution Mismatch

Jinsung Yoon, Kihyuk Sohn, Chun-Liang Li, Sercan O. Arik, Tomas Pfister

In Transactions on Machine Learning Research (TMLR), 2023 [paper][blog post]

2022

[42] Self-supervised, Refine, Repeat: Improving Unsupervised Anomaly Detection

Jinsung Yoon, Kihyuk Sohn, Chun-Liang Li, Sercan O. Arik, Chen-Yu Lee, Tomas Pfister

In Transactions on Machine Learning Research (TMLR), 2022 [paper]


[41] AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation

David Berthelot*, Rebecca Roelofs*, Kihyuk Sohn, Nicholas Carlini, Alex Kurakin (* indicates equal contribution) 

In International Conference on Learning Representations (ICLR), 2022 [arXiv][code]

2021

[40] Object-aware Contrastive Learning for Debiased Scene Representation

Sangwoo Mo*, Hyunwoo Kang*, Kihyuk Sohn, Chun-Liang Li, Jinwoo Shin (* indicates equal contribution) [arXiv][code]

In Advances in Neural Information Processing Systems (NeurIPS), 2021.


[39] Controlling Neural Networks with Rule Representations

Sungyong Seo, Sercan O Arik, Jinsung Yoon, Xiang Zhang, Kihyuk Sohn, Tomas Pfister [arXiv]

In Advances in Neural Information Processing Systems (NeurIPS), 2021.


[38] CutPaste: Self-Supervised Learning for Anomaly Detection and Localization

Chun-Liang Li*, Kihyuk Sohn*, Jinsung Yoon, Tomas Pfister (* indicates equal contribution)

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021 [pdf][arXiv]


[37] CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

Chen Wei, Kihyuk Sohn, Clayton Mellina, Alan Yuille, Fan Yang

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021 [pdf][arXiv][code]


[36] Learning and Evaluating Representations for Deep One-class Classification

Kihyuk Sohn*, Chun-Liang Li*, Jinsung Yoon, Minho Jin, Tomas Pfister (* indicates equal contribution)

In International Conference on Learning Representations (ICLR), 2021 [OpenReview][arXiv][code]


[35] i-Mix: A Strategy for Regularizing Contrastive Representation Learning.

Kibok Lee, Yian Zhu, Kihyuk Sohn, Chun-Liang Li, Jinwoo Shin, Honglak Lee

In International Conference on Learning Representations (ICLR), 2021 [OpenReview][arXiv][code]

(A previous version was presented at Advances in Neural Information Processing Systems (NeurIPS) Self-Supervised Learning - Theory and Practice Workshop, 2020.)

2020

[34] A Simple Semi-Supervised Learning Framework for Object Detection

Kihyuk Sohn*, Zizhao Zhang*, Chun-Liang Li, Han Zhang, Chen-Yu Lee, Tomas Pfister (* indicates equal contribution) [arXiv][code]


[33] FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence.

Kihyuk Sohn*, David Berthelot*, Chun-Liang Li, Zizhao Zhang, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Han Zhang, Colin Raffel (* indicates equal contribution)

In Advances in Neural Information Processing Systems (NeurIPS), 2020. [arXiv][code]


[32] Assessing Post-Disaster Damage from Satellite Imagery using Semi-Supervised Learning Techniques.

Jihyeon Lee, Joseph Z. Xu, Kihyuk Sohn, Wenhan Lu, David Berthelot, Izzeddin Gur, Pranav Khaitan, Ke-Wei (Fiona) Huang, Kyriacos Koupparis, Bernhard Kowatsch

In Advances in Neural Information Processing Systems (NeurIPS) AI + Humanitarian Assistance and Disaster Response Workshop, 2020. [arXiv]


[31] Improving Face Recognition by Clustering Unlabeled Faces in the Wild.

Aruni RoyChowdhury, Xiang Yu, Kihyuk Sohn, Erik Learned-Miller, Manmohan Chandraker

In European Conference on Computer Vision (ECCV), 2020. [arXiv]


[30] Adaptation Across Extreme Variations using Unlabeled Bridges.

Shuyang Dai, Kihyuk Sohn, Yi-Hsuan Tsai, Lawrence Carin, Manmohan Chandraker

In British Machine Vision Conference (BMVC), 2020. [pdf]


[29] Towards Universal Representation Learning for Deep Face Recognition.

Yichun Shi, Xiang Yu, Kihyuk Sohn, Manmohan Chandraker, Anil K. Jain

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. [pdf]


[28] ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring.

David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, Colin Raffel 

In International Conference on Learning Representations (ICLR), 2020 [OpenReview][arXiv][code]


[27] Active Adversarial Domain Adaptation.

Jong-Chyi Su, Yi-Hsuan Tsai, Kihyuk Sohn, Buyu Liu, Subhransu Maji, Manmohan Chandraker

In IEEE Winter Conference on Applications of Computer Vision (WACV), 2020. [arXiv]


[26] Adversarial Learning of Privacy-Preserving and Task-Oriented Representations.

Taihong Xiao, Yi-Hsuan Tsai, Kihyuk Sohn, Manmohan Chandraker, and Ming-Hsuan Yang

In Association for the Advancement of Artificial Intelligence (AAAI), 2020. [pdf coming soon][arXiv]

2019

[25] Domain Adaptation for Structured Output via Discriminative Patch Representations.

Yi-Hsuan Tsai, Kihyuk Sohn, Samuel Schulter and Manmohan Chandraker

In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019 (oral presentation). [pdf][supp][arXiv]


[24] Gotta Adapt ’Em All: Joint Pixel and Feature-Level Domain Adaptation for Recognition in the Wild.

Luan Tran, Kihyuk Sohn, Xiang Yu, Xiaoming Liu and Manmohan Chandraker

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. [pdf][supp][arXiv]


[23] Feature Transfer Learning for Face Recognition with Under-Represented Data.

Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu and Manmohan Chandraker

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. [pdf][supp][arXiv]


[22] Unsupervised Domain Adaptation for Distance Metric Learning.

Kihyuk Sohn, Wenling Shang, Xiang Yu and Manmohan Chandraker

In International Conference on Learning Representations (ICLR), 2019. [pdf]


[21] Attentive Conditional Channel-Recurrent Autoencoding for Attribute-Conditioned Face Synthesis.

Wenling Shang and Kihyuk Sohn

In Winter Conference on Applications of Computer Vision (WACV), 2019. [pdf][code]

2018

[20] Learning to Adapt Structured Output Space for Semantic Segmentation.

Yi-Hsuan Tsai*, Wei-Chih Hung*, Samuel Schulter, Kihyuk Sohn, Ming-Hsuan Yang and Manmohan Chandraker 

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 (spotlight) (* indicates equal contribution). [pdf][project page]


[19] Channel-Recurrent Autoencoding for Image Modeling.

Wenling Shang, Kihyuk Sohn, Yuandong Tian

In Winter Conference on Applications of Computer Vision (WACV), 2018. [pdf][arXiv][code]

2017

[18] Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos.

Kihyuk Sohn, Sifei Liu, Guangyu Zhong, Xiang Yu, Ming-Hsuan Yang, Manmohan Chandraker

In International Conference on Computer Vision (ICCV), 2017. [pdf][arXiv]


[17] Towards Large-Pose Face Frontalization.

Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, Manmohan Chandraker

In International Conference on Computer Vision (ICCV), 2017. [pdf][arXiv]


[16] Feature Reconstruction Disentangling for Pose-invariant Face Recognition.

Xi Peng, Xiang Yu, Kihyuk Sohn, Dimitris N. Metaxas, Manmohan Chandraker

In International Conference on Computer Vision (ICCV), 2017. [pdf][arXiv]


[15] Exploring Normalization in Deep Residual Networks with Concatenated Rectified Linear Units.

Wenling Shang, Justin Chiu, Kihyuk Sohn

In Association for the Advancement of Artificial Intelligence (AAAI), 2017. [pdf]

2016

[14] Improved Deep Metric Learning with Multi-class N-pair Loss Objective.

Kihyuk Sohn

In Advances in Neural Information Processing Systems (NIPS), 2016. [pdf][bib]


[13] Attribute2Image: Conditional Image Generation from Visual Attributes.

Xinchen Yan, Jimei Yang, Kihyuk Sohn, Honglak Lee

In European Conference on Computer Vision (ECCV), 2016. [pdf][arXiv][code]


[12] Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units.

Wenling Shang, Kihyuk Sohn, Diogo Almeida, Honglak Lee

In Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016. [pdf][arXiv]


[11] Discriminative Training of Structured Dictionaries via Block Orthogonal Matching Pursuit.

Wenling Shang, Kihyuk Sohn, Honglak Lee, Anna Gilbert

In SIAM International Conference on Data Mining (SDM), 2016 [pdf]

2015 and earlier

[10] Learning Structured Output Representation using Deep Conditional Generative Models.

Kihyuk Sohn, Xinchen Yan and Honglak Lee.

In Advances in Neural Information Processing Systems (NIPS), 2015 [pdf][supp][bib][code]


[9] Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction.

Yuting Zhang, Kihyuk Sohn, Ruben Villegas, Gang Pan and Honglak Lee

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015 (oral presentation). [pdf][supp][tech report][code]

OpenCV People’s Vote Winning Paper [link]


[8] Improved Multimodal Deep Learning with Variation of Information.

Kihyuk Sohn, Wenling Shang and Honglak Lee

In Advances in Neural Information Processing Systems (NIPS), 2014 [pdf][pdf (full)][bib][github]


[7] Learning to Disentangle Factors of Variation with Manifold Interaction.

Scott Reed, Kihyuk Sohn, Yuting Zhang and Honglak Lee

In Proceedings of the 31st International Conference on Machine Learning (ICML), 2014. [pdf][bib][code]


[6] Augmenting CRFs with Boltzmann Machine Shape Priors for Image Labeling.

Kihyuk Sohn*, Andrew Kae*, Honglak Lee and Erik Learned-Miller.

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013. [pdf][bib][project page][code] (* indicates equal contribution.)


[5] Learning and Selecting Features Jointly with Point-wise Gated Boltzmann Machines.

Kihyuk Sohn, Guanyu Zhou, Chansoo Lee, and Honglak Lee.

In Proceedings of the 30th International Conference on Machine Learning (ICML), 2013. [pdf][bib][supp][project page][code]

(A previous version was presented at the NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2012.)


[4] Learning Invariant Representations with Local Transformations.

Kihyuk Sohn and Honglak Lee.

In Proceedings of the 29th International Conference on Machine Learning (ICML), 2012. [pdf][bib][github]


[3] Online Incremental Feature Learning with Denoising Autoencoders.

Guanyu Zhou, Kihyuk Sohn, and Honglak Lee.

In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR W&CP 22, 2012. [pdf][bib][supp] (oral presentation)

(A previous version was presented at the NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.)


[2] An interpretation of the Cover and Leung capacity region for the MAC with feedback through stochastic control.

Achilleas Anastasopoulos and Kihyuk Sohn.

In Proceedings of IEEE International Conference on Communications (ICC), 2012. [pdf][bib]


[1] Efficient Learning of Sparse, Distributed, Convolutional Feature Representations for Object Recognition.

Kihyuk Sohn, Dae Yon Jung, Honglak Lee, and Alfred Hero III.

In Proceedings of 13th International Conference on Computer Vision (ICCV), 2011. [pdf][bib]

Software