Improved Reinforcement Learning Application Skills
Applied reinforcement learning to learning rate scheduling for generative models.
Implemented reinforcement learning algorithms such as Proximal Policy Optimization (PPO).
Expansion of Reinforcement Learning Applications
Applied reinforcement learning to problem-solving outside of typical environments such as games or MuJoCo.
Confirmed the potential of applying reinforcement learning to GAN models.
Importance of Learning Rate Adjustment
In model training, the learning rate determines the size of parameter updates and is a critical factor that influences performance.
In GAN models in particular, adjusting the learning rates of the Generator and Discriminator is crucial, as it significantly affects the quality of the generated results.
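As a minimal sketch of this point, assuming a PyTorch setup with separate optimizers (the placeholder modules and values below are illustrative, not the project's actual settings):

import torch

# Hypothetical placeholder modules; only the separate optimizers matter here.
generator = torch.nn.Linear(100, 784)
discriminator = torch.nn.Linear(784, 1)

# Each network has its own optimizer, so its learning rate can be tuned
# (or scheduled) independently of the other network's.
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))

# Changing a learning rate mid-training is a simple update to the param group.
for group in opt_g.param_groups:
    group["lr"] = 1e-4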
Limitations of Existing Learning Rate Adjustment Methods
Rule-based learning rate scheduling methods adjust the learning rate based on predefined rules but may not guarantee optimal learning rates in different training environments.
This is particularly challenging in GAN models, where ambiguous performance metrics make learning rate adjustment even more difficult.
Develop a learning rate scheduling model that dynamically adjusts the learning rate of GAN models through reinforcement learning, maximizing training performance and improving the quality of generated images.
Verify that reinforcement learning-based learning rate scheduling outperforms traditional rule-based methods in terms of training performance and convergence speed.
CIFAR-10 Dataset
A dataset consisting of images from 10 classes, used for training and evaluating the GAN model.
Composed of 60,000 images, with 6,000 images per class.
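A minimal loading sketch with torchvision, assuming images are scaled to [-1, 1] for GAN training (the normalization values and batch size are assumptions, not the project's settings):

import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

# CIFAR-10: 60,000 32x32 color images, 10 classes, 6,000 images per class.
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # scale pixels to [-1, 1]
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)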
Unconditional GAN
A basic GAN structure that generates images without class conditions, used for initial experiments with basic learning rate scheduling techniques.
DCGAN (Deep Convolutional GAN)
A CNN-based GAN architecture suited to learning complex image patterns; learning rate scheduling is applied to improve its performance.
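A minimal DCGAN-style Generator sketch for 32x32 CIFAR-10 images, built from the TransposedConv and BatchNorm operations listed in the state description below; the layer sizes are illustrative, not the project's exact architecture:

import torch.nn as nn

class DCGANGenerator(nn.Module):
    def __init__(self, z_dim=100, base=128):
        super().__init__()
        # Upsample a latent vector to a 32x32 RGB image with
        # TransposedConv + BatchNorm + ReLU blocks (DCGAN guidelines).
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, base * 4, 4, 1, 0), nn.BatchNorm2d(base * 4), nn.ReLU(True),    # 4x4
            nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1), nn.BatchNorm2d(base * 2), nn.ReLU(True),  # 8x8
            nn.ConvTranspose2d(base * 2, base, 4, 2, 1), nn.BatchNorm2d(base), nn.ReLU(True),          # 16x16
            nn.ConvTranspose2d(base, 3, 4, 2, 1), nn.Tanh(),                                           # 32x32
        )

    def forward(self, z):
        # z: (batch, z_dim, 1, 1) latent noise -> (batch, 3, 32, 32) image
        return self.net(z)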
WGAN (Wasserstein GAN)
A GAN model that uses the Wasserstein distance for stable training and improved image quality; reinforcement learning-based learning rate scheduling was applied to it to verify performance.
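A minimal sketch of the WGAN critic update, assuming the original weight-clipping variant; the critic, optimizer, and image batches are placeholders:

import torch

def wgan_critic_step(critic, opt_d, real_imgs, fake_imgs, clip=0.01):
    # The WGAN critic maximizes E[D(real)] - E[D(fake)], an estimate of the
    # Wasserstein distance; here we minimize its negative.
    opt_d.zero_grad()
    loss_d = -(critic(real_imgs).mean() - critic(fake_imgs.detach()).mean())
    loss_d.backward()
    opt_d.step()
    # The original WGAN keeps the critic roughly 1-Lipschitz by weight clipping.
    for p in critic.parameters():
        p.data.clamp_(-clip, clip)
    return loss_d.item()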
State
Used a GNN to learn the state of the GAN model (see the sketch below):
Generator: Composed of TransposedConv and BatchNormalization operations.
Discriminator: Composed of Convolution and BatchNormalization operations.
Features: Include the mean and variance of the weights and biases, the mean and variance of the weight and bias gradients, the learning rate, and the training loss.
Encoded the target networks' nodes using a GCN (Graph Convolutional Network).
Computed embeddings for the Generator and Discriminator through an attention mechanism.
The concatenation of Generator and Discriminator embeddings was used as the state for DCGAN.
Overview of State
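A minimal sketch of how such a state could be built, assuming per-operation statistics as node features, a simple dense-adjacency GCN, and attention pooling; all module names and dimensions here are illustrative, not the project's implementation:

import torch
import torch.nn as nn

def layer_node_features(layer, lr, loss):
    # One node per operation (TransposedConv / Conv / BatchNorm); assumes the
    # layer has both weight and bias tensors with populated gradients.
    w, b = layer.weight, layer.bias
    gw = w.grad if w.grad is not None else torch.zeros_like(w)
    gb = b.grad if b.grad is not None else torch.zeros_like(b)
    return torch.tensor([w.mean().item(), w.var().item(),
                         b.mean().item(), b.var().item(),
                         gw.mean().item(), gw.var().item(),
                         gb.mean().item(), gb.var().item(),
                         lr, loss])

class SimpleGCNLayer(nn.Module):
    # One GCN layer over a dense (normalized) adjacency matrix: H' = ReLU(A @ H @ W).
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        return torch.relu(adj @ self.lin(x))

class GraphStateEncoder(nn.Module):
    # Encodes one network (Generator or Discriminator) into a single embedding:
    # two GCN layers over the operation graph, then attention pooling over nodes.
    def __init__(self, feat_dim=10, hidden=32):
        super().__init__()
        self.gcn1 = SimpleGCNLayer(feat_dim, hidden)
        self.gcn2 = SimpleGCNLayer(hidden, hidden)
        self.attn = nn.Linear(hidden, 1)

    def forward(self, node_feats, adj):
        h = self.gcn2(self.gcn1(node_feats, adj), adj)   # (num_nodes, hidden)
        weights = torch.softmax(self.attn(h), dim=0)     # attention over nodes
        return (weights * h).sum(dim=0)                  # (hidden,) embedding

# The PPO state is the concatenation of the two network embeddings, e.g.:
# state = torch.cat([encoder_g(g_feats, g_adj), encoder_d(d_feats, d_adj)])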
Reward
Inception Score (IS): Measures the sharpness and diversity of generated images using Pre-trained Inception-v3.
Learned Perceptual Image Patch Similarity (LPIPS): Measures the perceptual similarity between images using features from a pre-trained image classification network.
Independent Fréchet Inception Distance (iFID): A variant of FID that measures image similarity using a pre-trained Inception-v3 network.
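A minimal sketch of turning such a metric into a per-step reward, assuming the reward is the change in the metric between decision steps; the exact reward shaping used in the project is not specified here:

class MetricReward:
    # Wraps any image-quality metric (IS, LPIPS, iFID) and returns the change
    # in that metric since the previous decision step as the reward.
    def __init__(self, metric_fn, higher_is_better=True):
        self.metric_fn = metric_fn                        # e.g. a function computing IS on samples
        self.sign = 1.0 if higher_is_better else -1.0     # iFID / LPIPS: lower is better
        self.prev = None

    def __call__(self, generated_images):
        value = self.metric_fn(generated_images)
        reward = 0.0 if self.prev is None else self.sign * (value - self.prev)
        self.prev = value
        return reward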
Action
Proximal Policy Optimization (PPO) Algorithm
Used PPO algorithm to dynamically adjust the learning rate, optimizing model performance by finding the optimal learning rate during training.
Actor & Critic: each implemented as a 2-layer MLP.
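A minimal sketch of the 2-layer MLP actor and critic, assuming the action is a scalar learning-rate value (or adjustment) sampled from a Gaussian policy; this parameterization is an assumption, not the project's exact design:

import torch
import torch.nn as nn

class Actor(nn.Module):
    # 2-layer MLP policy: maps the GAN state embedding to the mean of a
    # Gaussian over the (log-)learning-rate action.
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))
        self.log_std = nn.Parameter(torch.zeros(1))

    def forward(self, state):
        mean = self.net(state)
        return torch.distributions.Normal(mean, self.log_std.exp())

class Critic(nn.Module):
    # 2-layer MLP value function: maps the state to a scalar value estimate.
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))

    def forward(self, state):
        return self.net(state).squeeze(-1)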
GAN-LR Scheduler
Set 5,000 iterations of DCGAN as one episode.
Set the decision step k and the number of episodes n to train the learning rate scheduler (see the sketch below).
PPO algorithm for LR scheduling
Overview of GAN-LR scheduler
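A minimal sketch of the episode structure, assuming the agent picks new learning rates every k iterations; all method names below are placeholders for the project's actual GAN training, state-extraction, and reward code:

def run_episode(agent, gan, k=10, max_iter=5000):
    # One episode = 5,000 DCGAN iterations; every k iterations the PPO agent
    # observes the state, chooses a learning-rate action, and receives a reward.
    state = gan.extract_state()                  # GCN + attention embedding (placeholder)
    for it in range(1, max_iter + 1):
        gan.train_one_iteration()                # one G/D update (placeholder)
        if it % k == 0:
            action = agent.select_action(state)  # e.g. new learning rate or multiplier
            gan.set_learning_rates(action)       # apply to the G and D optimizers
            next_state = gan.extract_state()
            reward = gan.compute_reward()        # IS / LPIPS / iFID based (placeholder)
            agent.store_transition(state, action, reward, next_state)
            state = next_state
    agent.update()                               # PPO update at the end of the episode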
Rule-based learning rate scheduling baselines (see the sketch after this list)
Constant LRS
Step decay LRS
Cosine annealing LRS
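A minimal sketch of these three baselines using torch.optim.lr_scheduler; the step size, decay factor, and T_max values are illustrative, not the project's settings:

import torch

def make_opt():
    # Placeholder network and optimizer; lr_0 = 0.0002 as in the experiments.
    return torch.optim.Adam(torch.nn.Linear(10, 10).parameters(), lr=2e-4)

# Constant LRS: keep lr_0 fixed for every iteration.
constant = torch.optim.lr_scheduler.LambdaLR(make_opt(), lr_lambda=lambda it: 1.0)

# Step decay LRS: multiply the learning rate by gamma every step_size iterations.
step_decay = torch.optim.lr_scheduler.StepLR(make_opt(), step_size=1000, gamma=0.5)

# Cosine annealing LRS: decay the learning rate along a cosine curve over T_max iterations.
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(make_opt(), T_max=5000)

# During training, scheduler.step() is called once per iteration after optimizer.step().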
Setting
Episodes: 30 / k = 10 / max iterations: 5,000 / lr_0 = 0.0002
Reward
Compare to baseline
State model
K step
Episodes: 30 / k = 100 / lr_0 = 0.0002
Proposed a reinforcement learning model for learning rate scheduling in GAN models.
Conducted experiments with various rewards and state models applicable to GANs.
Challenges in Image Generation Using the GAN-LR Scheduler:
The advantage is not bounded, which may cause large deviations from the current policy and violate PPO's assumptions.
The reinforcement learning hyperparameter settings may be insufficiently tuned for generative-model tasks.
Within an episode, reinforcement learning focuses on states observed before the generative model has been sufficiently trained.
When using the Inception Score as a reward, the model may overly focus on diversity in the generated results.
Lack of Hyperparameter Exploration:
Additional experiments are needed to explore hyperparameters by varying episode lengths and decision steps.
Further exploration of more appropriate models is required (e.g., different encoder models, reinforcement learning hyperparameter sets, etc.).