Advanced Applied Deep Learning

Practice Course

Sheng Yun Wu

Week 5: Advanced Optimization Techniques

In Week 5, students learn and implement optimization techniques that improve the training efficiency and final performance of deep learning models. The week covers optimizers such as SGD with momentum, RMSprop, and Adam, along with learning rate schedules and gradient clipping. These methods help deep neural networks converge faster and more reliably.

Example 1: Using Stochastic Gradient Descent (SGD) Optimizer

Description:
This example introduces the basic SGD optimizer and demonstrates how to use it to train a simple model.

import tensorflow as tf

from tensorflow.keras import models, layers

from tensorflow.keras.optimizers import SGD

# Load MNIST dataset

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

# Reshape and normalize the data

train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255

# Display shape of the dataset

print(f"Train images shape: {train_images.shape}")

print(f"Test images shape: {test_images.shape}")

# Build a simple model

model = models.Sequential([

    layers.Flatten(input_shape=(28, 28, 1)),

    layers.Dense(512, activation='relu'),

    layers.Dense(10, activation='softmax')

])

# Compile the model using SGD optimizer

model.compile(optimizer=SGD(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model

model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
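To make the update rule behind the optimizer concrete, here is a minimal pure-Python sketch of plain gradient descent on a toy one-dimensional objective. The function name `sgd_step` and the toy objective are illustrative assumptions and not part of Keras. The step size of 0.01 matches the default learning rate of Keras `SGD()`.

```python
def sgd_step(w, grad, lr=0.01):
    """One plain SGD update: move against the gradient, scaled by lr."""
    return w - lr * grad

# Toy objective (an assumption for illustration): f(w) = (w - 3)^2,
# whose gradient is 2 * (w - 3) and whose minimum is at w = 3.
w = 0.0
for _ in range(500):
    w = sgd_step(w, 2.0 * (w - 3.0))

print(f"w after 500 steps: {w:.4f}")
```

In real training the gradient comes from a mini-batch of data, which is what makes the descent "stochastic"; the update itself has the same form.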

Example 2: Using SGD with Momentum

Description:
Momentum helps accelerate SGD in the relevant direction and dampens oscillations. This example shows how to implement SGD with momentum.

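# Note: compile() does not reset weights, so this continues training the model from Example 1.

# Rebuild the model first if you want each optimizer to start from untrained weights.

# Compile the model using SGD with momentum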

model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model

model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
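The effect of momentum can be seen on a toy problem. The sketch below, assuming a one-dimensional objective f(w) = 0.5 * w^2 chosen only for illustration, applies the velocity form of the update (velocity = momentum * velocity - lr * gradient, then w = w + velocity). The helper name `run` is hypothetical.

```python
def run(steps, lr=0.01, momentum=0.0):
    """Minimize f(w) = 0.5 * w**2 from w = 1 and return the final w."""
    w, velocity = 1.0, 0.0
    for _ in range(steps):
        grad = w  # gradient of 0.5 * w**2
        # The velocity accumulates past gradients; momentum=0 gives plain SGD.
        velocity = momentum * velocity - lr * grad
        w = w + velocity
    return w

w_plain = run(100)
w_momentum = run(100, momentum=0.9)
print(f"plain SGD:     {w_plain:.5f}")
print(f"with momentum: {w_momentum:.5f}")
```

With the same learning rate and step count, the momentum run ends much closer to the minimum at 0 because consecutive gradients point the same way and their contributions add up.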

Example 3: Using RMSprop Optimizer

Description:
RMSprop adapts the learning rate for each parameter by dividing its gradient by a running average of recent gradient magnitudes. It is especially useful when gradient scales differ widely across parameters or change over the course of training.

from tensorflow.keras.optimizers import RMSprop

# Compile the model using RMSprop optimizer

model.compile(optimizer=RMSprop(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model

model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
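The per-parameter scaling can be sketched in a few lines of pure Python. This is a simplified version of the RMSprop rule under stated assumptions: the helper name `rmsprop_step` is hypothetical, and the exact placement of epsilon may differ from the Keras implementation. The defaults (learning rate 0.001, rho 0.9) match those of Keras `RMSprop()`.

```python
import math

def rmsprop_step(params, grads, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    """One simplified RMSprop update over lists of scalars."""
    new_params, new_avg = [], []
    for p, g, a in zip(params, grads, avg_sq):
        # Running average of squared gradients, kept per parameter.
        a = rho * a + (1.0 - rho) * g * g
        # Dividing by the root of that average normalizes the step size.
        new_params.append(p - lr * g / (math.sqrt(a) + eps))
        new_avg.append(a)
    return new_params, new_avg

params = [0.0, 0.0]
grads = [100.0, 0.01]  # gradients that differ by four orders of magnitude
new_params, avg_sq = rmsprop_step(params, grads, [0.0, 0.0])
steps = [abs(n - p) for n, p in zip(new_params, params)]
print(steps)
```

Both parameters move by nearly the same amount even though their gradients differ by a factor of 10,000, which is the behavior that a single global learning rate cannot provide.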

Example 4: Using Adam Optimizer

Description:
Adam is one of the most popular optimization algorithms. It combines the idea behind SGD with momentum (a running average of gradients) with RMSprop-style per-parameter scaling (a running average of squared gradients). This example demonstrates how to use it.

from tensorflow.keras.optimizers import Adam

# Compile the model using Adam optimizer

model.compile(optimizer=Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model

model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
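As a rough illustration of how the two ideas fit together, the following pure-Python sketch implements one Adam step for a single scalar weight, including the bias correction for the running averages. The helper name `adam_step` is hypothetical; the hyperparameter defaults match those of Keras `Adam()`, but the code is a simplified sketch, not the library implementation.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a scalar weight; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # momentum-like average
    v = beta2 * v + (1 - beta2) * grad * grad   # RMSprop-like average
    m_hat = m / (1 - beta1 ** t)                # bias correction for m
    v_hat = v / (1 - beta2 ** t)                # bias correction for v
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = adam_step(0.0, 50.0, 0.0, 0.0, t=1)
print(f"first step moved w to {w:.6f}")
```

On the first step the corrected averages reduce to the gradient and its square, so the step has a size close to the learning rate whatever the gradient's scale.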

Example 5: Implementing Learning Rate Scheduling

Description:
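Learning rate scheduling adjusts the learning rate during training, typically starting with a larger rate for fast progress and lowering it later for stable convergence. This example uses a simple step schedule that cuts the rate by a factor of 10 once the epoch index passed by Keras (which starts at 0) exceeds 10.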

from tensorflow.keras.callbacks import LearningRateScheduler

# Define a learning rate schedule

def lr_schedule(epoch):

    lr = 0.001

    if epoch > 10:

        lr *= 0.1

    return lr

# Add learning rate scheduler callback

lr_scheduler = LearningRateScheduler(lr_schedule)

# Compile and train the model with learning rate scheduler

model.compile(optimizer=Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=15, validation_data=(test_images, test_labels), callbacks=[lr_scheduler])
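To check which epochs actually run at the reduced rate, the schedule can be tabulated without training anything. The sketch below re-implements the same step rule under a hypothetical name, `step_schedule`, with the constants exposed as parameters.

```python
def step_schedule(epoch, initial_lr=0.001, drop_after=10, factor=0.1):
    """Return the learning rate for a zero-based epoch index."""
    if epoch > drop_after:
        return initial_lr * factor
    return initial_lr

# Keras calls the schedule with epoch indices 0..14 when epochs=15.
rates = [step_schedule(epoch) for epoch in range(15)]
for epoch, rate in enumerate(rates):
    print(f"epoch index {epoch:2d}: lr = {rate:.4f}")
```

Because the index is zero-based, the first eleven epochs use 0.001 and only the last four of the fifteen use 0.0001.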

Example 6: Using Exponential Decay for Learning Rate

Description:
This example demonstrates how to use an exponentially decaying learning rate to gradually reduce the learning rate during training.

from tensorflow.keras.optimizers.schedules import ExponentialDecay

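# Define an exponential decay learning rate schedule

# Note: decay_steps counts optimizer steps (batches), not epochs. MNIST at the default

# batch size of 32 gives 1,875 steps per epoch, so with decay_steps=100000 the rate

# falls by only about 1% over 15 epochs. Use a smaller decay_steps for a visible decay.

lr_schedule = ExponentialDecay(initial_learning_rate=0.001, decay_steps=100000, decay_rate=0.96)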

# Compile the model with the learning rate schedule

model.compile(optimizer=Adam(learning_rate=lr_schedule), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model

model.fit(train_images, train_labels, epochs=15, validation_data=(test_images, test_labels))
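The decay formula is simple enough to evaluate by hand: with the default continuous (non-staircase) setting, the rate at a given optimizer step is initial_learning_rate * decay_rate ^ (step / decay_steps). The sketch below, using a hypothetical helper `exponential_decay`, evaluates it at the end of a 15-epoch MNIST run for two choices of `decay_steps`; the second choice (one epoch) is an assumption added for comparison.

```python
def exponential_decay(step, initial_lr=0.001, decay_steps=100000, decay_rate=0.96):
    """Continuous exponential decay evaluated at an optimizer step."""
    return initial_lr * decay_rate ** (step / decay_steps)

steps_per_epoch = 60000 // 32        # 1875 batches per epoch at batch size 32
total_steps = 15 * steps_per_epoch   # 28125 steps over 15 epochs

lr_slow = exponential_decay(total_steps)                               # decay_steps=100000
lr_fast = exponential_decay(total_steps, decay_steps=steps_per_epoch)  # decay once per epoch

print(f"decay_steps=100000: final lr = {lr_slow:.6f}")
print(f"decay_steps=1875:   final lr = {lr_fast:.6f}")
```

Evaluating the schedule like this before training is a cheap way to confirm that `decay_steps` is on the same scale as the total number of training steps.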

Example 7: Using Gradient Clipping

Description:
Gradient clipping mitigates the exploding gradient problem by capping gradient magnitudes before each update. This example passes clipvalue=1.0 to the Adam optimizer, which clips every gradient component to the range [-1, 1].

# Compile the model with gradient clipping

model.compile(optimizer=Adam(clipvalue=1.0), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model

model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
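Keras optimizers also accept a `clipnorm` argument, which rescales a whole gradient so that its L2 norm does not exceed a threshold. The difference between the two styles is easiest to see on a small vector. The helpers `clip_by_value` and `clip_by_norm` below are hypothetical pure-Python stand-ins written for illustration, not the Keras internals.

```python
import math

def clip_by_value(grad, clip):
    """Clip each component independently to [-clip, clip]."""
    return [max(-clip, min(clip, g)) for g in grad]

def clip_by_norm(grad, max_norm):
    """Rescale the whole vector if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    return [g * max_norm / norm for g in grad]

grad = [3.0, -4.0, 0.5]
by_value = clip_by_value(grad, 1.0)
by_norm = clip_by_norm(grad, 1.0)
print("clip by value:", by_value)
print("clip by norm: ", [round(g, 4) for g in by_norm])
```

Clipping by value can change the direction of the gradient, since large components are cut while small ones are left alone; clipping by norm keeps the direction and only shortens the vector.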

Example 8: Cyclical Learning Rate (CLR)

Description:
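This example introduces cyclical learning rates (CLR), where the learning rate cycles between a lower and an upper bound throughout training. In the triangular policy used here, the rate rises linearly from the lower to the upper bound over step_size batches and then falls back over the next step_size batches, which can speed up convergence.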

from tensorflow.keras.callbacks import Callback

import numpy as np

# Define a cyclical learning rate callback

class CyclicLR(Callback):

    def __init__(self, base_lr=0.001, max_lr=0.006, step_size=2000., mode='triangular'):

        super().__init__()

        self.base_lr = base_lr

        self.max_lr = max_lr

        self.step_size = step_size

        self.mode = mode

        self.lr = base_lr

        # Count batches across epochs; the `batch` argument restarts at 0 every epoch

        self.iteration = 0

    def on_train_batch_end(self, batch, logs=None):

        self.iteration += 1

        cycle = np.floor(1 + self.iteration / (2 * self.step_size))

        x = np.abs(self.iteration / self.step_size - 2 * cycle + 1)

        if self.mode == 'triangular':

            self.lr = self.base_lr + (self.max_lr - self.base_lr) * np.maximum(0, (1 - x))

        # Apply the new learning rate (the `lr` alias is not available in newer Keras releases)

        self.model.optimizer.learning_rate = float(self.lr)

# Compile the model with Adam optimizer

model.compile(optimizer=Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model with Cyclical Learning Rate

clr = CyclicLR(base_lr=0.001, max_lr=0.006, step_size=2000)

model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels), callbacks=[clr])
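The triangular policy is a pure function of the global batch count, so it can be checked without running a training loop. The sketch below uses a hypothetical helper, `triangular_lr`, with the same bounds and step size as the callback above.

```python
import math

def triangular_lr(iteration, base_lr=0.001, max_lr=0.006, step_size=2000):
    """Triangular cyclical learning rate for a global batch count."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    # x runs from 1 down to 0 and back to 1 within each cycle.
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

for iteration in (0, 1000, 2000, 3000, 4000, 6000):
    print(f"iteration {iteration:5d}: lr = {triangular_lr(iteration):.4f}")
```

One full cycle takes 2 * step_size = 4,000 batches. At 1,875 batches per epoch, the five epochs in this example cover 9,375 batches, which is a little over two cycles.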

Example 9: Using Nesterov Accelerated Gradient (NAG)

Description:
This example shows how to use Nesterov Accelerated Gradient (NAG), a variant of SGD with momentum that evaluates the gradient at the look-ahead position reached by the momentum step rather than at the current weights.

# Compile the model using Nesterov Accelerated Gradient (NAG)

model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9, nesterov=True), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model

model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
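For reference, the Nesterov form of the update documented for Keras `SGD` is velocity = momentum * velocity - lr * gradient, followed by w = w + momentum * velocity - lr * gradient. The sketch below applies it to a toy objective f(w) = 0.5 * w^2, chosen only for illustration; the helper name `nesterov_run` is hypothetical.

```python
def nesterov_run(steps, lr=0.01, momentum=0.9):
    """Minimize f(w) = 0.5 * w**2 from w = 1 with Nesterov momentum."""
    w, velocity = 1.0, 0.0
    for _ in range(steps):
        grad = w  # gradient of 0.5 * w**2
        velocity = momentum * velocity - lr * grad
        # Look-ahead: apply the updated velocity once more, plus the plain
        # gradient step, instead of adding the velocity alone.
        w = w + momentum * velocity - lr * grad
    return w

w_final = nesterov_run(300)
print(f"w after 300 steps: {w_final:.2e}")
```

The only difference from classical momentum is the final line of the loop, where the weight update uses the velocity after it has already absorbed the current gradient.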

Example 10: Comparing Optimizers and Learning Rates

Description:
This example compares different optimizers (SGD, Adam, RMSprop), each at its default learning rate, to help students see how the choice of optimizer affects training on the same task.

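# Build a fresh model for each optimizer so every run starts from untrained weights

def build_model():

    return models.Sequential([

        layers.Flatten(input_shape=(28, 28, 1)),

        layers.Dense(512, activation='relu'),

        layers.Dense(10, activation='softmax')

    ])

# Create the optimizers to compare

optimizers = {'SGD': SGD(), 'Adam': Adam(), 'RMSprop': RMSprop()}

# Train each model and compare performance

for opt_name, opt in optimizers.items():

    print(f"Training with {opt_name} optimizer:")

    model = build_model()

    model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

    model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))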

Week 5 Summary

Objective: Learn how to optimize training through the use of different optimizers, learning rate schedules, and techniques like gradient clipping and cyclical learning rates.
Skills Developed:
- Understand and implement advanced optimizers (SGD with momentum, RMSprop, Adam).
- Learn about learning rate schedules, such as step decay, exponential decay, and cyclical learning rates.
- Apply gradient clipping to prevent exploding gradients.
Tools: TensorFlow, Keras, Optimizers (SGD, RMSprop, Adam).

These 10 examples in Week 5 provide students with hands-on experience in using advanced optimization techniques to improve model training. By experimenting with different optimizers, learning rates, and schedules, students can observe how each method influences the convergence speed and performance of their models.
