Advanced Applied Deep Learning
Practice Course
Sheng Yun Wu
In Week 4, students focus on regularization techniques that prevent overfitting, improve generalization, and stabilize the training of deep learning models. The examples cover common methods such as dropout, L2 and L1 regularization, early stopping, data augmentation, and batch normalization, and close with tuning the strength of regularization. These practices are especially important when training deep learning models on small or noisy datasets.
Description:
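This example demonstrates overfitting and underfitting by training two fully connected networks of very different capacity on MNIST. A model that is too simple underfits: its accuracy stays low on both the training and the validation data. A model with much more capacity can fit the training data noticeably better than the validation data, which is the signature of overfitting; with only 5 epochs this gap may still be small.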
import tensorflow as tf
from tensorflow.keras import models, layers
# Load MNIST dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
# Reshape and normalize the data
train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255
# Display shape of the dataset
print(f"Train images shape: {train_images.shape}")
print(f"Test images shape: {test_images.shape}")
# Build a small underfitting model
underfit_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(16, activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Build a larger overfitting model
overfit_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, activation='relu'),
    layers.Dense(512, activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Compile the models
underfit_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
overfit_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
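# Train the models and keep the returned History objects
underfit_history = underfit_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
overfit_history = overfit_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
# Compare final training and validation accuracy: low accuracy on both suggests underfitting,
# while training accuracy well above validation accuracy suggests overfitting
for name, h in [('underfit', underfit_history), ('overfit', overfit_history)]:
    print(f"{name}: train acc={h.history['accuracy'][-1]:.4f}, val acc={h.history['val_accuracy'][-1]:.4f}")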
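The comparison can be made more explicit with a small helper that reads the final epoch of a Keras-style history dictionary (the `.history` attribute of the object that `fit()` returns). This is a minimal sketch: `diagnose_fit` is a hypothetical function, and its thresholds are illustrative assumptions rather than standard values.

```python
# Hypothetical helper: label a run as under- or overfitting from a Keras-style
# history dict. The thresholds below are illustrative assumptions.

def diagnose_fit(history, min_train_acc=0.95, max_gap=0.02):
    """Return a rough label based on the final epoch of a history dict."""
    train_acc = history["accuracy"][-1]
    val_acc = history["val_accuracy"][-1]
    gap = train_acc - val_acc  # positive: better on training data than on validation data
    if train_acc < min_train_acc:
        return "underfitting"  # the model cannot fit even the training data well
    if gap > max_gap:
        return "overfitting"   # fits the training data much better than unseen data
    return "reasonable fit"


# Made-up toy histories for illustration only (not MNIST results)
example_underfit = {"accuracy": [0.80, 0.85, 0.88], "val_accuracy": [0.82, 0.86, 0.88]}
example_overfit = {"accuracy": [0.95, 0.99, 0.999], "val_accuracy": [0.96, 0.97, 0.97]}
print(diagnose_fit(example_underfit))  # underfitting
print(diagnose_fit(example_overfit))   # overfitting
```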
Description:
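This example introduces dropout, which helps prevent overfitting by randomly setting a fraction of a layer's outputs (units) to zero at each training step, so the network cannot rely too heavily on any single unit. Here, Dropout(0.5) drops half of the units; Keras disables dropout automatically at inference time.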
# Build a model with dropout layers
dropout_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])
# Compile and train the model with dropout
dropout_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
dropout_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
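To see what a Dropout layer computes, the sketch below reimplements inverted dropout in NumPy. Keras documents the same behavior for `layers.Dropout`: during training, kept inputs are scaled by 1 / (1 - rate) so the expected activation is unchanged, and at inference the layer passes inputs through untouched. The function name `dropout_forward` and its arguments are hypothetical, chosen for illustration.

```python
import numpy as np

def dropout_forward(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the survivors."""
    if not training or rate == 0.0:
        return x  # at inference time the layer is the identity
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob  # True for units that are kept
    return x * mask / keep_prob             # rescaling keeps the expected value unchanged

rng = np.random.default_rng(0)
activations = np.ones((1000, 512))
train_out = dropout_forward(activations, rate=0.5, training=True, rng=rng)
infer_out = dropout_forward(activations, rate=0.5, training=False, rng=rng)
print(np.unique(train_out))  # [0. 2.]: dropped units are 0, kept units are scaled by 2
print(train_out.mean())      # approximately 1.0, the same as the input mean
```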
Description:
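This example shows how to apply L2 regularization (known as ridge regularization in linear models and closely related to weight decay), which adds a penalty proportional to the sum of the squared weights to the loss, discouraging large weights and reducing overfitting.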
# Build a model with L2 regularization
# kernel_regularizer='l2' applies tf.keras.regularizers.L2 with its default factor of 0.01
l2_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, kernel_regularizer='l2', activation='relu'),
    layers.Dense(512, kernel_regularizer='l2', activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Compile and train the model with L2 regularization
l2_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
l2_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
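The penalty that `kernel_regularizer='l2'` adds can be written out directly. The NumPy sketch below assumes the form documented for `tf.keras.regularizers.L2`, namely factor * sum of squared weights with a default factor of 0.01; the helper names are hypothetical. Its gradient, 2 * factor * w, shows why L2 shrinks each weight in proportion to its size.

```python
import numpy as np

def l2_penalty(weights, factor=0.01):
    """factor * sum(w**2), the form documented for tf.keras.regularizers.L2."""
    return factor * np.sum(np.square(weights))

def l2_penalty_grad(weights, factor=0.01):
    """Gradient of the penalty: each weight is pulled toward zero in proportion to its size."""
    return 2.0 * factor * weights

w = np.array([[0.5, -1.0], [2.0, 0.0]])
print(l2_penalty(w))       # 0.01 * (0.25 + 1 + 4 + 0), about 0.0525
print(l2_penalty_grad(w))  # values 0.01, -0.02, 0.04, 0.0
```

Large weights such as 2.0 receive the strongest pull, which is why L2 tends to spread the model's reliance across many small weights.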
Description:
This example introduces L1 regularization, which adds a penalty proportional to the sum of the absolute values of the weights to the loss. Unlike L2, this penalty encourages sparsity by pushing many weights toward zero.
# Build a model with L1 regularization
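# kernel_regularizer='l1' applies tf.keras.regularizers.L1 with its default factor of 0.01,
# a strong penalty for layers this wide; a smaller factor may be needed if accuracy stays low
l1_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, kernel_regularizer='l1', activation='relu'),
    layers.Dense(512, kernel_regularizer='l1', activation='relu'),
    layers.Dense(10, activation='softmax')
])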
# Compile and train the model with L1 regularization
l1_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
l1_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
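The contrast with L2 is easiest to see in the gradients. The NumPy sketch below assumes the form documented for `tf.keras.regularizers.L1` (factor * sum of absolute weights, default factor 0.01); the helper names are hypothetical. The L1 (sub)gradient has the same magnitude for tiny and large weights, so small weights keep being pushed toward zero, whereas the L2 gradient fades as a weight shrinks.

```python
import numpy as np

def l1_penalty(weights, factor=0.01):
    """factor * sum(|w|), the form documented for tf.keras.regularizers.L1."""
    return factor * np.sum(np.abs(weights))

def l1_penalty_grad(weights, factor=0.01):
    """(Sub)gradient of the L1 penalty: factor * sign(w), taken as 0 at w == 0."""
    return factor * np.sign(weights)

w = np.array([0.001, -0.001, 1.0])
l1_grad = l1_penalty_grad(w)  # magnitude 0.01 for every weight, tiny or large
l2_grad = 2 * 0.01 * w        # L2 gradient for comparison: tiny weights are barely pushed
print(l1_grad)
print(l2_grad)
```

With gradient-based optimizers such as Adam, this constant push typically leaves many weights close to zero rather than exactly zero.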
Description:
This example combines dropout and L2 regularization in a single model to show how the two techniques can be used together to improve generalization.
# Build a model with both dropout and L2 regularization
dropout_l2_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, kernel_regularizer='l2', activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(512, kernel_regularizer='l2', activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])
# Compile and train the model with dropout and L2 regularization
dropout_l2_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
dropout_l2_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
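In the combined model, the two techniques act in different places: dropout modifies activations during training, while L2 adds terms to the loss. Keras adds each layer's regularization penalty (collected in `model.losses`) to the compiled loss, so the training objective is roughly what the sketch below computes. It is a minimal NumPy illustration that assumes softmax probabilities have already been computed; the function names are hypothetical.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability assigned to the true (integer) class labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def total_loss(probs, labels, kernels, l2_factor=0.01):
    """Data loss plus one L2 penalty per regularized kernel matrix.
    Dropout does not appear here: it only changes the activations during training."""
    data_loss = sparse_categorical_crossentropy(probs, labels)
    penalty = sum(l2_factor * np.sum(np.square(k)) for k in kernels)
    return data_loss + penalty

# Toy values for illustration: two samples, two classes, one 2x2 kernel of ones
probs = np.array([[0.5, 0.5], [0.25, 0.75]])
labels = np.array([0, 1])
print(total_loss(probs, labels, [np.ones((2, 2))]))  # cross-entropy + 0.01 * 4
```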
Description:
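Early stopping halts training once the monitored validation metric stops improving. With patience=3, training stops after three consecutive epochs without a decrease in validation loss, which keeps the model from continuing to fit noise in the training data.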
from tensorflow.keras.callbacks import EarlyStopping
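# Stop when val_loss has not improved for 3 consecutive epochs and
# roll the model back to the weights from its best epoch
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)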
# Build a simple model
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Compile and train the model with early stopping
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=20, validation_data=(test_images, test_labels), callbacks=[early_stopping])
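The patience rule can be traced by hand on a sequence of validation losses. The function below is a simplified, hypothetical re-implementation of that rule for a monitored loss (lower is better), not the Keras source; `min_delta` mirrors the EarlyStopping argument of the same name, which sets the minimum decrease that counts as an improvement.

```python
def simulate_early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-indexed.
    stop_epoch is None if training would run through every epoch."""
    best = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:  # counts as an improvement
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:     # `patience` epochs in a row without improvement
                return epoch, best_epoch
    return None, best_epoch

losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.20]
print(simulate_early_stopping(losses, patience=3))  # (5, 2)
```

In this trace, training stops at epoch 5, so the later drop to 0.20 is never seen. Passing `restore_best_weights=True` to EarlyStopping makes Keras roll the model back to the weights from the best epoch (epoch 2 here) when it stops.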
Description:
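This example introduces data augmentation, which artificially increases the diversity of the training set by applying random, label-preserving transformations to the images. For handwritten digits the transformations should stay small: a horizontally flipped digit is no longer a valid digit, and strong rotations or shifts can distort it. The example continues training the model from the early-stopping example on augmented batches.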
from tensorflow.keras.preprocessing.image import ImageDataGenerator
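# Define a data augmentation generator with small, label-preserving transformations
# (illustrative values; horizontal flips are omitted because mirrored digits are not valid digits)
datagen = ImageDataGenerator(
    rotation_range=10,       # rotate by up to 10 degrees in either direction
    width_shift_range=0.1,   # shift horizontally by up to 10% of the width
    height_shift_range=0.1,  # shift vertically by up to 10% of the height
    shear_range=0.2,         # shear angle in degrees
    zoom_range=0.1,          # zoom in or out by up to 10%
    fill_mode='nearest'      # fill newly exposed pixels with the nearest edge value
)
# Continue training `model` (from the early-stopping example) on augmented batches
model.fit(datagen.flow(train_images, train_labels, batch_size=64), epochs=5, validation_data=(test_images, test_labels))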
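The random shifts configured above can be illustrated without Keras. The NumPy sketch below is a hypothetical helper, not how ImageDataGenerator is implemented: it moves a single 28×28 image by a random whole number of pixels and fills the exposed border with zeros, which matches the black background of MNIST. Because the label stays the same, each shifted copy is a new, valid training example.

```python
import numpy as np

def random_shift(image, max_shift, rng):
    """Shift a 2-D image by up to `max_shift` pixels in each direction,
    filling the uncovered border with zeros."""
    h, w = image.shape
    padded = np.pad(image, max_shift, mode="constant", constant_values=0)
    dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)  # crop offsets into the padded image
    return padded[dy:dy + h, dx:dx + w]

rng = np.random.default_rng(42)
digit = np.zeros((28, 28), dtype=np.float32)
digit[10:18, 12:16] = 1.0  # a toy "stroke" well away from the border
shifted = random_shift(digit, max_shift=3, rng=rng)
print(shifted.shape)                 # (28, 28)
print(shifted.sum() == digit.sum())  # True: the stroke moved but was not cut off
```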
Description:
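This example demonstrates batch normalization, which normalizes the activations it receives using the mean and variance of the current mini-batch during training (and moving averages of those statistics at inference). This usually stabilizes and speeds up training, and the noise from batch statistics can have a mild regularizing effect that reduces overfitting.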
# Build a model with batch normalization
batch_norm_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(512, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(10, activation='softmax')
])
# Compile and train the model with batch normalization
batch_norm_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
batch_norm_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
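The computation a BatchNormalization layer performs in training mode can be sketched in NumPy as follows. This is a simplified illustration for a 2-D (batch, features) input; the defaults `momentum=0.99` and `epsilon=1e-3`, and the initial moving statistics of zeros and ones, match those of the Keras layer, while the function name is hypothetical.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, moving_mean, moving_var, momentum=0.99, epsilon=1e-3):
    """Training-mode batch normalization for a (batch, features) array."""
    mean = x.mean(axis=0)  # per-feature statistics of this mini-batch
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + epsilon)
    out = gamma * x_hat + beta  # learnable scale and shift
    # Moving averages replace the batch statistics at inference time
    moving_mean = momentum * moving_mean + (1 - momentum) * mean
    moving_var = momentum * moving_var + (1 - momentum) * var
    return out, moving_mean, moving_var

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(256, 4))  # toy activations far from zero mean
out, moving_mean, moving_var = batch_norm_train(
    x, gamma=np.ones(4), beta=np.zeros(4), moving_mean=np.zeros(4), moving_var=np.ones(4))
print(out.mean(axis=0), out.std(axis=0))  # per-feature mean near 0, std near 1
```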
Description:
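Students use this example to plot learning curves (training and validation loss per epoch) and observe the effect of regularization. The code below plots the curves for the dropout model; the same plotting code works for the History object returned by any other model's fit(), such as the L1 or L2 models.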
import matplotlib.pyplot as plt
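# Train the dropout model for 10 more epochs (training continues from the weights
# learned in the dropout example) and keep the returned History object
history = dropout_model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))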
# Plot training and validation loss
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss with Dropout')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
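To compare several techniques, the validation curves of different runs can be drawn on the same axes. The helper below is a hypothetical sketch that takes a dictionary mapping a label to a Keras-style history dict; it assumes the `fit()` return values of the other models were stored, for example under hypothetical names such as `l1_history` and `l2_history`.

```python
import matplotlib.pyplot as plt

def plot_val_losses(histories, title="Validation loss by regularization technique"):
    """Draw the validation loss of several runs on one set of axes.
    `histories` maps a label (e.g. "Dropout") to a Keras-style history dict."""
    fig, ax = plt.subplots()
    for label, hist in histories.items():
        epochs = range(1, len(hist["val_loss"]) + 1)  # number epochs from 1
        ax.plot(epochs, hist["val_loss"], label=label)
    ax.set_title(title)
    ax.set_xlabel("Epochs")
    ax.set_ylabel("Validation loss")
    ax.legend()
    return fig, ax

# Example call (l2_history is a hypothetical stored return value of l2_model.fit):
# plot_val_losses({"Dropout": history.history, "L2": l2_history.history})
# plt.show()
```

A curve that keeps falling suggests the regularized model is still learning, while one that turns upward marks where overfitting begins.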
Description:
This example introduces hyperparameter tuning for regularization techniques, such as adjusting dropout rates or L2 penalty strength, to find the best combination for the model.
from sklearn.model_selection import GridSearchCV
# tensorflow.keras.wrappers.scikit_learn is no longer available in recent TensorFlow releases;
# the separate SciKeras package (pip install scikeras) provides a compatible wrapper
from scikeras.wrappers import KerasClassifier
# Define a function to build the model with hyperparameters
def build_model(dropout_rate=0.5, l2_strength=0.001):
    model = models.Sequential([
        layers.Flatten(input_shape=(28, 28, 1)),
        layers.Dense(512, kernel_regularizer=tf.keras.regularizers.l2(l2_strength), activation='relu'),
        layers.Dropout(dropout_rate),
        layers.Dense(512, kernel_regularizer=tf.keras.regularizers.l2(l2_strength), activation='relu'),
        layers.Dropout(dropout_rate),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model
# Wrap the model for use with scikit-learn; the model__ prefix routes arguments to build_model
model = KerasClassifier(model=build_model, model__dropout_rate=0.5, model__l2_strength=0.001,
                        epochs=5, batch_size=64, verbose=0)
# Define the hyperparameter grid (3 x 3 = 9 combinations)
param_grid = {'model__dropout_rate': [0.3, 0.5, 0.7], 'model__l2_strength': [0.001, 0.01, 0.1]}
# Perform grid search with 3-fold cross-validation
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)
grid_result = grid.fit(train_images, train_labels)
# Print the best parameters and their mean cross-validation accuracy
print(f"Best parameters: {grid_result.best_params_}")
print(f"Best CV accuracy: {grid_result.best_score_:.4f}")
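Grid search trains one model per parameter combination and fold, so its cost grows quickly. The pure-Python sketch below (the helper `expand_grid` is hypothetical) enumerates the same three dropout rates and three L2 strengths used above, with shortened key names, and counts the training runs for 3-fold cross-validation; GridSearchCV's default `refit=True` adds one final fit of the best combination on the full training set.

```python
from itertools import product

def expand_grid(param_grid):
    """List every combination in a dict of parameter lists (the same combinations
    GridSearchCV evaluates, though not necessarily in the same order)."""
    names = list(param_grid)
    return [dict(zip(names, values)) for values in product(*(param_grid[n] for n in names))]

param_grid = {"dropout_rate": [0.3, 0.5, 0.7], "l2_strength": [0.001, 0.01, 0.1]}
combos = expand_grid(param_grid)
cv_folds = 3
print(len(combos))                 # 9 combinations
print(len(combos) * cv_folds + 1)  # 28 training runs: 27 cross-validation fits + 1 refit
```

With 5 epochs per run, each on about 40,000 training images (two thirds of the 60,000 when cv=3), this search is considerably more expensive than the earlier examples.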
Objective: Learn and apply various regularization techniques to prevent overfitting and improve generalization in deep learning models.
Skills Developed:
Understanding the principles of overfitting and underfitting.
Hands-on implementation of dropout, L1/L2 regularization, early stopping, batch normalization, and data augmentation, plus grid search over regularization hyperparameters.
Visualizing the effects of regularization techniques on model performance.
Tools: TensorFlow, Keras, Matplotlib, scikit-learn.
These 10 examples in Week 4 help students understand how regularization techniques can improve model performance and generalization. By experimenting with different methods, they gain a practical understanding of how to apply these techniques in real-world scenarios.