Papers

❤️ : First / co-first author   🧡 : Favorite papers I collaborated on!

❤️ [2024] "Harnessing Text-to-Image Models for Video Generation" 

[ECCV2024]


This is a paper on a general text-to-video model. We propose a method that keeps the image model completely frozen while training a motion module. The paper is packed with valuable ideas, including a regularization between self-attention maps across frames, a mapping network that reshapes the distribution of the input noise, a frame-wise token generator that lets each frame use its own tokens, and a video sampling method called MG sampling. Stay tuned, we will be uploading it to arXiv soon!
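If you're curious what the cross-frame attention regularization might look like, here is a minimal PyTorch sketch. The function names and the exact loss form are my own illustration, not the paper's code:

```python
import torch
import torch.nn.functional as F

def self_attention_maps(q, k):
    """Per-frame self-attention maps. q, k: (frames, tokens, dim)."""
    scale = q.shape[-1] ** -0.5
    return torch.softmax(q @ k.transpose(-1, -2) * scale, dim=-1)

def cross_frame_attention_reg(q, k):
    """Hypothetical regularizer: keep the self-attention maps of
    neighboring frames close, so spatial structure stays consistent
    while the motion module learns the temporal dynamics."""
    maps = self_attention_maps(q, k)        # (frames, tokens, tokens)
    return F.mse_loss(maps[:-1], maps[1:])  # penalize frame-to-frame drift
```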


🧡 [2024] "Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models"

project page / arxiv 


This paper successfully extracts motion information from a single reference video. It proposes a method where a Spatial LoRA is trained first to absorb the reference's appearance, followed by a Temporal LoRA that then captures only the motion. I had a great time working with Yixuan Ren on the initial idea construction for this paper!
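Here is a rough sketch of the two-stage recipe as I understand it; `LoRALinear` and the training skeleton are mine, assuming standard per-layer LoRA adapters injected into the spatial and temporal attention layers:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA adapter around a frozen pretrained linear layer."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base.requires_grad_(False)  # frozen pretrained weight
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)          # start as an identity change

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

def two_stage_training(spatial_loras, temporal_loras,
                       frame_loss, video_loss, steps=1000):
    """Stage 1: train spatial LoRAs on individual frames (appearance).
    Stage 2: freeze them, train temporal LoRAs on the clip (motion).
    `frame_loss` / `video_loss` are callables returning scalar losses."""
    opt = torch.optim.AdamW([p for m in spatial_loras for p in m.parameters()], lr=1e-4)
    for _ in range(steps):                      # stage 1: appearance
        opt.zero_grad(); frame_loss().backward(); opt.step()
    for m in spatial_loras:
        m.requires_grad_(False)                 # lock appearance in place
    opt = torch.optim.AdamW([p for m in temporal_loras for p in m.parameters()], lr=1e-4)
    for _ in range(steps):                      # stage 2: motion only
        opt.zero_grad(); video_loss().backward(); opt.step()
```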

❤️ [2024] "Attribute Based Interpretable Evaluation Metrics for Generative Models"

[ICML2024] project page / arxiv / code


This paper proposes a new method for evaluating image generation models. It measures a CLIP score for each attribute and evaluates generative models by the differences between the resulting score distributions. This approach lets us see which attributes a model handles well and which it struggles with. Moreover, it is the first evaluation metric to consider the relationship between two attributes. In fact, we propose HCS (Heterogeneous CLIPScore), which works even better than the plain CLIP score. Curious to know more? Read the paper!
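As a toy illustration of per-attribute scoring, here is a sketch using OpenAI's CLIP; the prompt template and the 1-D Wasserstein distance are my stand-ins, not the paper's HCS or its exact divergence:

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from scipy.stats import wasserstein_distance

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def attribute_scores(images, attribute: str):
    """CLIP similarity of every image to one attribute prompt.
    `images` is a list of PIL images; returns a 1-D tensor of scores."""
    tokens = clip.tokenize([f"a photo of something {attribute}"]).to(device)
    text = model.encode_text(tokens)
    text = text / text.norm(dim=-1, keepdim=True)
    batch = torch.stack([preprocess(im) for im in images]).to(device)
    feats = model.encode_image(batch)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return (feats @ text.T).squeeze(1).cpu()

def attribute_gap(real_images, fake_images, attribute: str) -> float:
    """Distribution difference for one attribute between the training
    set and the model's samples (smaller means more faithful)."""
    return wasserstein_distance(attribute_scores(real_images, attribute).numpy(),
                                attribute_scores(fake_images, attribute).numpy())
```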

🧡 [2024] "Plug-and-Play Diffusion Distillation" 

[CVPR2024] project page / arxiv / video


This paper demonstrates that diffusion distillation is possible with a fully frozen text-to-image model by training only additional external modules. With this training method, it can generate images of nearly the original quality in just 8 steps, without the need for classifier-free guidance. Moreover, it can be applied to models fine-tuned for different domains! This is one of my favorite papers! I had a great time working with Yi-Ting (Tiffany) Hsiao :)
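To give a flavor of "frozen teacher + small trainable external module", here is a very loose guidance-distillation-style sketch; the signatures and the loss itself are my assumptions, not the paper's actual objective:

```python
import torch
import torch.nn.functional as F

def distill_step(teacher_unet, adapter, x_t, t, text_emb, guidance_scale=7.5):
    """One hypothetical distillation step. The frozen teacher runs twice
    (conditional + unconditional) to build the classifier-free-guided
    target; the trainable adapter corrects a single conditional pass so
    the student can skip CFG at inference. All names here are mine."""
    with torch.no_grad():
        eps_cond = teacher_unet(x_t, t, text_emb)
        eps_uncond = teacher_unet(x_t, t, torch.zeros_like(text_emb))
        target = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    # Student prediction = frozen teacher output + small trainable correction.
    eps_student = eps_cond + adapter(x_t, t, text_emb)
    return F.mse_loss(eps_student, target)
```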

❤️ [2024] "Training-free Style Transfer Emerges from h-space in Diffusion models"

[WACV2024] project page / arxiv / code

This paper demonstrates that the bottleneck, i.e. the deepest feature map of a U-Net, can be replaced to inject content. It’s a bit unfortunate that some of the algorithms used here haven’t received much attention. In particular, the algorithm named Style Calibration has the potential to be widely used in all kinds of image editing tasks. If you’re interested, it’s worth taking a closer look.
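The content-injection trick itself is easy to prototype with a forward hook; this sketch assumes a diffusers-style U-Net whose mid-block output is the h-space, and the naming is mine:

```python
import torch

def swap_bottleneck(mid_block, content_h):
    """Training-free content injection: override the U-Net's deepest
    feature map (the 'h-space') with one cached from a content image's
    denoising pass. `mid_block` is the module producing that bottleneck."""
    def hook(module, args, output):
        return content_h.to(output.dtype)  # returning a value replaces the output
    return mid_block.register_forward_hook(hook)

# Usage sketch: cache h at each timestep of the content image's pass,
# then sample the style image with the hook installed per timestep;
# call handle.remove() when done.
```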

❤️ [2023] "Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry"

[NeurIPS2023] arxiv / code

This paper derives a basis for the latent space of diffusion models and leverages it to reveal various characteristics. While image editing is possible with it, editing is not the primary focus. Instead, the paper thoroughly explores the properties of the latent space in diffusion models and the manifolds it forms. I prefer papers that uncover and explain the underlying principles over papers that merely show that something works, which is why this one is among my absolute favorites. So, what are the characteristics? They’re hard to summarize briefly. How about reading the abstract to find out?
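For a taste of the machinery: the local basis comes from the singular vectors of the Jacobian of the map from the noisy latent to the U-Net bottleneck. A minimal (and slow) power-iteration sketch, where the wrapper `f` from x_t to h is assumed:

```python
import torch
from torch.autograd.functional import jvp, vjp

def top_latent_direction(f, x, iters=10):
    """Power iteration for the top right-singular vector of J = df/dx,
    where f maps a noisy latent x_t to the U-Net bottleneck h. Under
    the pullback metric, that vector is the locally dominant semantic
    direction; perturbing x_t along it edits the sample."""
    v = torch.randn_like(x)
    v = v / v.norm()
    for _ in range(iters):
        _, Jv = jvp(f, x, v)    # forward-mode product J v   (shape of h)
        _, u = vjp(f, x, Jv)    # reverse-mode product J^T J v (shape of x)
        v = u / u.norm()
    return v
```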

❤️ [2023] "Unsupervised Discovery of Semantic Latent Directions in Diffusion Models"

arxiv

It’s an excellent paper. Why is it only on arXiv, you ask? Unfortunately, it was desk-rejected because of a mistake on our part that compromised anonymity during submission. We took about 30% of this paper and developed the “Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry” paper from it, which means about 70% of the content here is entirely different. Have you read that paper already? If you enjoyed it, you’ll likely find this one quite interesting as well. I highly recommend it!

❤️ [2023] "Diffusion Models already have a Semantic Latent Space"

[ICLR2023 as a notable-top-25%] project page / arxiv / code

This paper is a very interesting read. I still feel the endless potential of Asyrp. I believe it is the first paper to demonstrate linear properties in diffusion models. Recently, a paper called CFG++ was published, and I think applying Asyrp’s algorithm to text-to-image models yields something like CFG++. :)
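For reference, here is the heart of Asyrp's asymmetric reverse process as a single deterministic DDIM step; the `delta_h` plumbing into the U-Net is assumed, and `alpha` denotes the cumulative product ᾱ:

```python
import torch

def asyrp_step(unet, x_t, t, alpha_t, alpha_prev, delta_h):
    """One DDIM step (eta = 0) with Asyrp's asymmetry: the edited
    bottleneck (h + delta_h) drives only the predicted-x0 term, while
    the direction term keeps the original noise estimate."""
    eps = unet(x_t, t)                        # original noise estimate
    eps_edit = unet(x_t, t, delta_h=delta_h)  # estimate with shifted h-space
    # P_t: predicted clean image, from the *edited* estimate
    pred_x0 = (x_t - (1 - alpha_t).sqrt() * eps_edit) / alpha_t.sqrt()
    # D_t: direction pointing back to x_t, from the *original* estimate
    dir_xt = (1 - alpha_prev).sqrt() * eps
    return alpha_prev.sqrt() * pred_x0 + dir_xt
```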

🧡 [2022] "FurryGAN: High Quality Foreground-aware Image Synthesis"

[ECCV2022] project page / arxiv / code

Isn’t it amazing to generate an image along with an alpha mask? This can be done even without a paired dataset! It would be incredibly fun if someone could create this with Diffusion models!
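The compositing idea fits in a few lines; this toy sketch is just the alpha-blending skeleton, not FurryGAN's actual architecture:

```python
import torch
import torch.nn as nn

class MaskedComposer(nn.Module):
    """Toy foreground-aware synthesis: a foreground generator, a
    background generator, and a mask head, blended with an alpha mask.
    Module names are mine, for illustration only."""
    def __init__(self, g_fg: nn.Module, g_bg: nn.Module, mask_head: nn.Module):
        super().__init__()
        self.g_fg, self.g_bg, self.mask_head = g_fg, g_bg, mask_head

    def forward(self, z_fg, z_bg):
        fg = self.g_fg(z_fg)                       # (B, 3, H, W) foreground
        bg = self.g_bg(z_bg)                       # (B, 3, H, W) background
        alpha = torch.sigmoid(self.mask_head(fg))  # (B, 1, H, W) in [0, 1]
        img = alpha * fg + (1 - alpha) * bg        # alpha-blended composite
        return img, alpha
```

The nice part is that the discriminator only ever sees the composite image, so the alpha mask can emerge from adversarial training alone, without any paired mask supervision.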