Advanced Applied Deep Learning
Practice Course
Sheng Yun Wu
In Week 5, students will learn and implement advanced optimization techniques that are critical for improving the performance and training efficiency of deep learning models. This includes optimizers like Adam, RMSprop, and SGD with momentum, as well as techniques like learning rate schedules and gradient clipping. These methods are essential for achieving better convergence and faster training, especially in deep neural networks.
Description:
This example introduces the basic SGD optimizer and demonstrates how to use it to train a simple model. With no arguments, Keras's SGD() uses a learning rate of 0.01 and no momentum.
import tensorflow as tf
from tensorflow.keras import models, layers
from tensorflow.keras.optimizers import SGD
# Load MNIST dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
# Reshape and normalize the data
train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255
# Display shape of the dataset
print(f"Train images shape: {train_images.shape}")
print(f"Test images shape: {test_images.shape}")
# Build a simple model
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Compile the model using SGD optimizer
model.compile(optimizer=SGD(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
Description:
Momentum helps accelerate SGD in the relevant direction and dampens oscillations. This example shows how to implement SGD with momentum.
# Compile the model using SGD with momentum
# (compile() does not reset the weights; rebuild the model first to start training from scratch)
model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
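To make the effect of the momentum term concrete, the following is a minimal sketch in plain Python. The toy objective f(w) = w**2 and the helper names `sgd_momentum_step` and `minimize` are assumptions made for illustration and are not part of the course code. The update rule follows the form documented for Keras's SGD.

```python
def sgd_momentum_step(w, velocity, grad, learning_rate=0.01, momentum=0.9):
    """One SGD-with-momentum update: the velocity accumulates past gradients."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

def minimize(momentum, steps=100, w0=5.0):
    # Toy objective (assumption): f(w) = w**2, whose gradient is 2*w.
    w, v = w0, 0.0
    for _ in range(steps):
        w, v = sgd_momentum_step(w, v, 2.0 * w, momentum=momentum)
    return w

w_plain = minimize(momentum=0.0)      # momentum=0.0 reduces to plain SGD
w_momentum = minimize(momentum=0.9)
print(f"plain SGD: {w_plain:.4f}, with momentum: {w_momentum:.4f}")
```

With the same learning rate and number of steps, the run with momentum should end much closer to the minimum at w = 0 on this toy problem.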
Description:
RMSprop adapts the learning rate for each parameter by dividing the gradient by a running average of its recent magnitudes. It is especially useful when gradient scales differ widely between parameters or change over the course of training.
from tensorflow.keras.optimizers import RMSprop
# Compile the model using RMSprop optimizer
model.compile(optimizer=RMSprop(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
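The per-parameter adaptation can be sketched in a few lines of plain Python. This is a textbook form of the RMSprop update, not a copy of the Keras implementation (libraries differ, for example, in where epsilon is placed). The helper name `rmsprop_step` and the constant gradients are assumptions chosen for illustration. The default values mirror those of Keras's RMSprop().

```python
import math

def rmsprop_step(w, avg_sq, grad, learning_rate=0.001, rho=0.9, epsilon=1e-7):
    """One textbook RMSprop update applied element-wise to lists of parameters."""
    new_w, new_avg = [], []
    for wi, ai, gi in zip(w, avg_sq, grad):
        ai = rho * ai + (1.0 - rho) * gi * gi            # running average of squared gradients
        new_w.append(wi - learning_rate * gi / (math.sqrt(ai) + epsilon))
        new_avg.append(ai)
    return new_w, new_avg

w, avg_sq = [0.0, 0.0], [0.0, 0.0]
grad = [100.0, 0.01]   # hypothetical gradients differing by four orders of magnitude
for _ in range(50):
    w, avg_sq = rmsprop_step(w, avg_sq, grad)

distances = [abs(x) for x in w]
print(distances)
```

Although the two gradients differ by a factor of 10,000, both parameters travel almost the same distance, because each step is normalized by that parameter's own gradient history.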
Description:
Adam is one of the most popular optimization algorithms. It combines the advantages of SGD with momentum (a running average of the gradients) and RMSprop (per-parameter scaling by a running average of the squared gradients). This example demonstrates how to use it.
from tensorflow.keras.optimizers import Adam
# Compile the model using Adam optimizer
model.compile(optimizer=Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
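As a rough illustration of how the two running averages interact, here is a minimal sketch of the Adam update with bias correction in plain Python. The helper name `adam_step` and the toy objective f(w) = (w - 3)**2 are assumptions for this example. The default hyperparameters match those of Keras's Adam().

```python
import math

def adam_step(w, m, v, grad, t, learning_rate=0.001,
              beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta_1 * m + (1.0 - beta_1) * grad           # first moment (momentum-like)
    v = beta_2 * v + (1.0 - beta_2) * grad * grad    # second moment (RMSprop-like)
    m_hat = m / (1.0 - beta_1 ** t)                  # bias correction for zero initialization
    v_hat = v / (1.0 - beta_2 ** t)
    w = w - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w, m, v

w, m, v = 0.0, 0.0, 0.0
history = [w]
for t in range(1, 101):
    grad = 2.0 * (w - 3.0)    # gradient of the toy objective (assumption)
    w, m, v = adam_step(w, m, v, grad, t)
    history.append(w)

first_step = history[1] - history[0]
print(f"first step: {first_step:.6f}, w after 100 steps: {history[-1]:.4f}")
```

Thanks to bias correction, the very first step has a size close to the learning rate regardless of how large the gradient is, which is one reason Adam tends to be less sensitive to gradient scale than plain SGD.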
Description:
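Learning rate scheduling dynamically adjusts the learning rate during training to speed up convergence. This example uses a simple step schedule: the learning rate is 0.001 for epochs 0 through 10 and drops to 0.0001 from epoch 11 onward (Keras passes zero-based epoch indices to the schedule function).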
from tensorflow.keras.callbacks import LearningRateScheduler
# Define a learning rate schedule
def lr_schedule(epoch):
    lr = 0.001
    if epoch > 10:
        lr *= 0.1
    return lr
# Add learning rate scheduler callback
lr_scheduler = LearningRateScheduler(lr_schedule)
# Compile and train the model with learning rate scheduler
model.compile(optimizer=Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=15, validation_data=(test_images, test_labels), callbacks=[lr_scheduler])
Description:
This example demonstrates how to use an exponentially decaying learning rate to gradually reduce the learning rate during training.
from tensorflow.keras.optimizers.schedules import ExponentialDecay
# Define an exponential decay learning rate schedule
lr_schedule = ExponentialDecay(initial_learning_rate=0.001, decay_steps=100000, decay_rate=0.96)
# Compile the model with the learning rate schedule
model.compile(optimizer=Adam(learning_rate=lr_schedule), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(train_images, train_labels, epochs=15, validation_data=(test_images, test_labels))
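It can help to check how far the learning rate actually decays over a training run. The sketch below evaluates the continuous (non-staircase) decay formula, initial_learning_rate * decay_rate ** (step / decay_steps), in plain Python. The helper name `exponential_decay` is an assumption, and the step counts assume the default Keras batch size of 32 on the 60,000 MNIST training images.

```python
def exponential_decay(step, initial_learning_rate=0.001,
                      decay_steps=100000, decay_rate=0.96):
    """Continuous (non-staircase) exponential decay of the learning rate."""
    return initial_learning_rate * decay_rate ** (step / decay_steps)

steps_per_epoch = 60000 // 32         # 1875 optimizer steps per epoch at batch size 32
total_steps = 15 * steps_per_epoch    # 28125 steps over 15 epochs

final_lr = exponential_decay(total_steps)
final_lr_per_epoch = exponential_decay(total_steps, decay_steps=steps_per_epoch)
print(f"decay_steps=100000: {final_lr:.6f}")
print(f"decay_steps={steps_per_epoch}: {final_lr_per_epoch:.6f}")
```

Under these assumptions, decay_steps=100000 reduces the learning rate by only about 1% over 15 epochs, whereas setting decay_steps to one epoch's worth of steps roughly halves it. Students may want to experiment with smaller decay_steps values to see a visible effect.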
Description:
Gradient clipping helps prevent the exploding gradient problem by limiting the magnitude of the gradients. This example shows how to apply gradient clipping to the Adam optimizer.
# Compile the model with gradient clipping
model.compile(optimizer=Adam(clipvalue=1.0), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
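The clipvalue argument clips every gradient element independently, which can change the direction of the gradient. Keras optimizers also accept clipnorm, which rescales a gradient so that its L2 norm does not exceed the threshold. The following plain-Python sketch contrasts the two ideas; the helper names `clip_by_value` and `clip_by_norm` and the sample gradient are illustrative assumptions, not Keras internals.

```python
import math

def clip_by_value(grad, clip_value):
    """Clip each element to the range [-clip_value, clip_value]."""
    return [max(-clip_value, min(clip_value, g)) for g in grad]

def clip_by_norm(grad, clip_norm):
    """Rescale the whole gradient if its L2 norm exceeds clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm:
        return list(grad)
    return [g * clip_norm / norm for g in grad]

grad = [3.0, -4.0, 0.5]               # hypothetical gradient for one weight tensor
by_value = clip_by_value(grad, 1.0)   # large elements are cut to +/-1, small ones untouched
by_norm = clip_by_norm(grad, 1.0)     # all elements shrink by the same factor
print("clip by value:", by_value)
print("clip by norm: ", [round(g, 4) for g in by_norm])
```

Clipping by norm preserves the ratio between elements (and hence the update direction), while clipping by value does not. Which one works better depends on the model and is worth testing empirically.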
Description:
This example introduces cyclical learning rates (CLR), where the learning rate fluctuates between a lower and upper bound throughout training, often leading to faster convergence.
from tensorflow.keras.callbacks import Callback
import numpy as np
# Define a cyclical learning rate callback
class CyclicLR(Callback):
    def __init__(self, base_lr=0.001, max_lr=0.006, step_size=2000., mode='triangular'):
        super().__init__()
        self.base_lr = base_lr
        self.max_lr = max_lr
        self.step_size = step_size
        self.mode = mode
        self.lr = base_lr
        # Global batch counter: the `batch` argument restarts at 0 every epoch,
        # so it cannot be used to track the position within a cycle
        self.iteration = 0
    def on_batch_end(self, batch, logs=None):
        self.iteration += 1
        cycle = np.floor(1 + self.iteration / (2 * self.step_size))
        x = np.abs(self.iteration / self.step_size - 2 * cycle + 1)
        if self.mode == 'triangular':
            self.lr = self.base_lr + (self.max_lr - self.base_lr) * np.maximum(0, (1 - x))
        # Apply the new learning rate (the older `optimizer.lr` alias may be unavailable in newer Keras versions)
        self.model.optimizer.learning_rate.assign(float(self.lr))
# Compile the model with Adam optimizer
model.compile(optimizer=Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train the model with Cyclical Learning Rate
clr = CyclicLR(base_lr=0.001, max_lr=0.006, step_size=2000)
model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels), callbacks=[clr])
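The triangular policy is easiest to understand as a function of the global iteration count (the number of batches processed since training began). The sketch below evaluates it in plain Python at a few iterations. The function name `triangular_lr` is an assumption for illustration, and the default values are the ones used in the example above.

```python
import math

def triangular_lr(iteration, base_lr=0.001, max_lr=0.006, step_size=2000):
    """Triangular cyclical learning rate at a given global iteration."""
    cycle = math.floor(1 + iteration / (2 * step_size))   # index of the current cycle (1-based)
    x = abs(iteration / step_size - 2 * cycle + 1)        # distance from the cycle's peak, in [0, 1]
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# One full cycle spans 2 * step_size iterations: rise for 2000, fall for 2000.
for it in (0, 1000, 2000, 3000, 4000):
    print(it, round(triangular_lr(it), 6))
```

With step_size=2000 and 1875 batches per epoch (batch size 32 on MNIST), one full cycle takes a little over two epochs, so a 5-epoch run covers roughly two and a third cycles.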
Description:
This example shows how to use Nesterov Accelerated Gradient (NAG), a variant of SGD with momentum that evaluates the gradient at the look-ahead position reached by following the current momentum, rather than at the current weights.
# Compile the model using Nesterov Accelerated Gradient (NAG)
model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9, nesterov=True), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
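The difference between classical and Nesterov momentum can be seen by comparing a single update of each. The sketch below uses the update forms documented for Keras's SGD with nesterov=False and nesterov=True. The helper names `momentum_step` and `nesterov_step` and the toy objective f(w) = w**2 are assumptions for illustration.

```python
def momentum_step(w, velocity, grad, learning_rate=0.01, momentum=0.9):
    """Classical momentum: move by the updated velocity."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

def nesterov_step(w, velocity, grad, learning_rate=0.01, momentum=0.9):
    """Nesterov momentum: apply the momentum term once more as a look-ahead."""
    velocity = momentum * velocity - learning_rate * grad
    return w + momentum * velocity - learning_rate * grad, velocity

# First update from w = 5.0 on the toy objective f(w) = w**2 (gradient 2*w = 10.0)
w_classic, _ = momentum_step(5.0, 0.0, 10.0)
w_nesterov, _ = nesterov_step(5.0, 0.0, 10.0)
print(f"classical: {w_classic:.2f}, Nesterov: {w_nesterov:.2f}")

# Run Nesterov momentum for 100 steps on the same objective
w, v = 5.0, 0.0
for _ in range(100):
    w, v = nesterov_step(w, v, 2.0 * w)
print(f"w after 100 Nesterov steps: {w:.5f}")
```

On the first update, the Nesterov variant already moves further toward the minimum, because the look-ahead adds the momentum contribution of the step that was just computed.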
Description:
This example compares three optimizers (SGD, Adam, RMSprop) with their default settings to help students see how the choice of optimizer affects training on the same architecture and data.
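# Build a fresh model for each run so every optimizer starts from newly initialized weights
def build_model():
    return models.Sequential([
        layers.Flatten(input_shape=(28, 28, 1)),
        layers.Dense(512, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])
# Optimizers to compare
optimizers = {'SGD': SGD(), 'Adam': Adam(), 'RMSprop': RMSprop()}
# Train each model and compare performance
for opt_name, opt in optimizers.items():
    print(f"Training with {opt_name} optimizer:")
    model = build_model()
    model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))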
Objective: Learn how to optimize training through the use of different optimizers, learning rate schedules, and techniques like gradient clipping and cyclical learning rates.
Skills Developed:
Understand and implement advanced optimizers (SGD with momentum, RMSprop, Adam).
Learn about learning rate schedules, such as step decay, exponential decay, and cyclical learning rates.
Apply gradient clipping to prevent exploding gradients.
Tools: TensorFlow, Keras, Optimizers (SGD, RMSprop, Adam).
These 10 examples in Week 5 provide students with hands-on experience in using advanced optimization techniques to improve model training. By experimenting with different optimizers, learning rates, and schedules, students can observe how each method influences the convergence speed and performance of their models.