Advanced Applied Deep Learning
Practice Course
Sheng Yun Wu
In Week 4, students focus on regularization techniques to prevent overfitting, improve generalization, and stabilize the training of deep learning models. The examples cover common methods like dropout, L2 and L1 regularization, data augmentation, and early stopping. These practices are essential when working with deep learning models on small or noisy datasets.
Description:
This example contrasts underfitting and overfitting using two fully connected networks trained on MNIST. The 16-unit model lacks capacity and plateaus at a lower accuracy on both sets, while the model with two 512-unit layers fits the training data more closely than the validation data.
import tensorflow as tf
from tensorflow.keras import models, layers
# Load MNIST dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
# Reshape and normalize the data
train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255
# Display shape of the dataset
print(f"Train images shape: {train_images.shape}")
print(f"Test images shape: {test_images.shape}")
# Build a small underfitting model
underfit_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(16, activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Build a larger overfitting model
overfit_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, activation='relu'),
    layers.Dense(512, activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Compile the models
underfit_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
overfit_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train the models
underfit_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
overfit_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
Output
Train images shape: (60000, 28, 28, 1)
Test images shape: (10000, 28, 28, 1)
Epoch 1/5
1875/1875 [==============================] - 1s 533us/step - loss: 0.4355 - accuracy: 0.8783 - val_loss: 0.2670 - val_accuracy: 0.9252
Epoch 2/5
1875/1875 [==============================] - 1s 475us/step - loss: 0.2500 - accuracy: 0.9291 - val_loss: 0.2238 - val_accuracy: 0.9347
Epoch 3/5
1875/1875 [==============================] - 1s 496us/step - loss: 0.2165 - accuracy: 0.9386 - val_loss: 0.2138 - val_accuracy: 0.9406
Epoch 4/5
1875/1875 [==============================] - 1s 478us/step - loss: 0.1943 - accuracy: 0.9439 - val_loss: 0.1954 - val_accuracy: 0.9453
Epoch 5/5
1875/1875 [==============================] - 1s 494us/step - loss: 0.1793 - accuracy: 0.9480 - val_loss: 0.1763 - val_accuracy: 0.9484
Epoch 1/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1825 - accuracy: 0.9445 - val_loss: 0.0984 - val_accuracy: 0.9699
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0796 - accuracy: 0.9756 - val_loss: 0.0958 - val_accuracy: 0.9716
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0558 - accuracy: 0.9819 - val_loss: 0.0804 - val_accuracy: 0.9780
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0410 - accuracy: 0.9870 - val_loss: 0.0936 - val_accuracy: 0.9759
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0360 - accuracy: 0.9887 - val_loss: 0.0697 - val_accuracy: 0.9802
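One way to read the two logs above is to compare the gap between training and validation accuracy at the final epoch. The sketch below uses only the numbers printed above; the helper name `generalization_gap` is a hypothetical name chosen for illustration, not part of Keras.

```python
# Final-epoch metrics copied from the two training logs above.
runs = {
    "underfit_model": {"accuracy": 0.9480, "val_accuracy": 0.9484},
    "overfit_model": {"accuracy": 0.9887, "val_accuracy": 0.9802},
}

def generalization_gap(metrics):
    """Training accuracy minus validation accuracy (hypothetical helper)."""
    return metrics["accuracy"] - metrics["val_accuracy"]

# A gap near zero with modest accuracy suggests limited capacity;
# a clearly positive gap suggests the model fits the training set more closely.
gaps = {name: round(generalization_gap(m), 4) for name, m in runs.items()}
for name, gap in gaps.items():
    print(f"{name}: gap = {gap:+.4f}")
```

After only five epochs the gap for the larger model is still small; training for more epochs would likely widen it.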
Description:
This example introduces the dropout technique, which helps prevent overfitting by randomly dropping units (neurons) during training.
# Build a model with dropout layers
dropout_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])
# Compile and train the model with dropout
dropout_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
dropout_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
Output
Epoch 1/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3117 - accuracy: 0.9038 - val_loss: 0.1152 - val_accuracy: 0.9643
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1718 - accuracy: 0.9482 - val_loss: 0.1009 - val_accuracy: 0.9688
Epoch 3/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1457 - accuracy: 0.9558 - val_loss: 0.0780 - val_accuracy: 0.9759
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1312 - accuracy: 0.9615 - val_loss: 0.0791 - val_accuracy: 0.9755
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1218 - accuracy: 0.9641 - val_loss: 0.0939 - val_accuracy: 0.9715
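To make the mechanism behind the Dropout layer concrete, here is a minimal plain-Python sketch of inverted dropout: during training each unit is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate), while at inference the input passes through unchanged. The function below is illustrative only and is not the Keras implementation.

```python
import random

def dropout(activations, rate, training, rng):
    """Inverted dropout on a list of floats (illustrative sketch)."""
    if not training or rate == 0.0:
        # At inference time the layer is the identity.
        return list(activations)
    keep_prob = 1.0 - rate
    # Keep each unit with probability keep_prob and rescale it so the
    # expected value of every activation stays the same.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)
x = [1.0] * 10000
train_out = dropout(x, rate=0.5, training=True, rng=rng)
infer_out = dropout(x, rate=0.5, training=False, rng=rng)

print(sorted(set(train_out)))           # surviving units are scaled to 2.0
print(sum(train_out) / len(train_out))  # close to 1.0 on average
```

The rescaling is why the training loss in the log above is higher than the validation loss: the network is evaluated without noise, but trained with half of its units masked.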
Description:
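This example shows how to apply L2 regularization, also known as ridge regularization, to penalize large weights and prevent overfitting. The string shortcut 'l2' uses the Keras default factor of 0.01, and the reported loss includes the penalty term, so the loss values here are not directly comparable with those of the unregularized models.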
# Build a model with L2 regularization
l2_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, kernel_regularizer='l2', activation='relu'),
    layers.Dense(512, kernel_regularizer='l2', activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Compile and train the model with L2 regularization
l2_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
l2_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
Output
Epoch 1/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.8951 - accuracy: 0.8964 - val_loss: 0.4753 - val_accuracy: 0.9292
Epoch 2/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.4480 - accuracy: 0.9302 - val_loss: 0.3763 - val_accuracy: 0.9469
Epoch 3/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.3824 - accuracy: 0.9406 - val_loss: 0.3488 - val_accuracy: 0.9450
Epoch 4/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.3550 - accuracy: 0.9432 - val_loss: 0.3319 - val_accuracy: 0.9502
Epoch 5/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.3433 - accuracy: 0.9446 - val_loss: 0.3248 - val_accuracy: 0.9484
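The L2 penalty itself is simple arithmetic: the factor times the sum of squared weights, added to the data loss. The sketch below shows this on a toy weight list; the weights, the data-loss value, and the helper name `l2_penalty` are made up for illustration.

```python
def l2_penalty(weights, factor=0.01):
    """L2 penalty: factor * sum of squared weights (illustrative helper)."""
    return factor * sum(w * w for w in weights)

# Toy values, assumed for illustration only.
weights = [0.5, -0.25, 0.0, 1.0]
data_loss = 0.30

penalty = l2_penalty(weights)
total_loss = data_loss + penalty

print(f"penalty    = {penalty:.6f}")
print(f"total loss = {total_loss:.6f}")
```

Because large weights contribute quadratically, the single weight of 1.0 accounts for most of the penalty here.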
Description:
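This example introduces L1 regularization, which encourages sparsity in the model's weights by penalizing the absolute value of the weights. With the Keras default factor of 0.01 used by the 'l1' shortcut, the penalty dominates the loss in this model: the output shows accuracy stuck near 11%, roughly chance level for ten classes, so a much smaller factor is needed in practice.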
# Build a model with L1 regularization
l1_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, kernel_regularizer='l1', activation='relu'),
    layers.Dense(512, kernel_regularizer='l1', activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Compile and train the model with L1 regularization
l1_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
l1_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
Output
Epoch 1/5
1875/1875 [==============================] - 6s 3ms/step - loss: 6.3230 - accuracy: 0.1199 - val_loss: 3.0916 - val_accuracy: 0.1135
Epoch 2/5
1875/1875 [==============================] - 6s 3ms/step - loss: 3.0903 - accuracy: 0.1124 - val_loss: 3.0894 - val_accuracy: 0.1135
Epoch 3/5
1875/1875 [==============================] - 5s 3ms/step - loss: 3.0893 - accuracy: 0.1124 - val_loss: 3.0885 - val_accuracy: 0.1135
Epoch 4/5
1875/1875 [==============================] - 5s 3ms/step - loss: 3.0888 - accuracy: 0.1124 - val_loss: 3.0887 - val_accuracy: 0.1135
Epoch 5/5
1875/1875 [==============================] - 5s 3ms/step - loss: 3.0883 - accuracy: 0.1124 - val_loss: 3.0899 - val_accuracy: 0.1135
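A short sketch may help explain why L1 pushes weights toward zero more aggressively than L2. The pull of the L1 penalty has the same magnitude for every nonzero weight, while the pull of the L2 penalty shrinks as the weight shrinks. The helper names below are hypothetical and the factor 0.01 is assumed to match the shortcut default.

```python
def l1_penalty_grad(w, factor=0.01):
    """Subgradient of factor * |w|: constant magnitude, sign of w (0 at w == 0)."""
    sign = (w > 0) - (w < 0)
    return factor * sign

def l2_penalty_grad(w, factor=0.01):
    """Gradient of factor * w**2: proportional to w."""
    return 2 * factor * w

# Compare the pull toward zero on a large and a small weight.
for w in (1.0, 0.001):
    print(f"w={w}: L1 grad={l1_penalty_grad(w)}, L2 grad={l2_penalty_grad(w)}")
```

For the small weight the L1 pull is hundreds of times larger than the L2 pull, which is consistent with the collapsed training run above when the factor is too large.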
Description:
This example combines both dropout and L2 regularization in a single model to demonstrate how these regularization techniques work together to improve generalization.
# Build a model with both dropout and L2 regularization
dropout_l2_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, kernel_regularizer='l2', activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(512, kernel_regularizer='l2', activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])
# Compile and train the model with dropout and L2 regularization
dropout_l2_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
dropout_l2_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
Output
Epoch 1/5
1875/1875 [==============================] - 6s 3ms/step - loss: 1.2751 - accuracy: 0.8586 - val_loss: 0.7158 - val_accuracy: 0.9111
Epoch 2/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.8109 - accuracy: 0.8807 - val_loss: 0.7176 - val_accuracy: 0.8985
Epoch 3/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.7623 - accuracy: 0.8880 - val_loss: 0.6336 - val_accuracy: 0.9272
Epoch 4/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.7487 - accuracy: 0.8892 - val_loss: 0.5910 - val_accuracy: 0.9330
Epoch 5/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.7282 - accuracy: 0.8913 - val_loss: 0.5862 - val_accuracy: 0.9330
Description:
Early stopping halts training once the monitored validation metric stops improving, which limits overfitting. Here the callback monitors val_loss with patience=3, so training ends after three consecutive epochs without a new best validation loss.
from tensorflow.keras.callbacks import EarlyStopping
# Add early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=3)
# Build a simple model
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Compile and train the model with early stopping
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=20, validation_data=(test_images, test_labels), callbacks=[early_stopping])
Output
Epoch 1/20
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2011 - accuracy: 0.9402 - val_loss: 0.0984 - val_accuracy: 0.9704
Epoch 2/20
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0803 - accuracy: 0.9758 - val_loss: 0.0846 - val_accuracy: 0.9735
Epoch 3/20
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0518 - accuracy: 0.9834 - val_loss: 0.0692 - val_accuracy: 0.9787
Epoch 4/20
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0355 - accuracy: 0.9888 - val_loss: 0.0750 - val_accuracy: 0.9763
Epoch 5/20
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0267 - accuracy: 0.9911 - val_loss: 0.0669 - val_accuracy: 0.9808
Epoch 6/20
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0213 - accuracy: 0.9929 - val_loss: 0.0659 - val_accuracy: 0.9814
Epoch 7/20
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0154 - accuracy: 0.9948 - val_loss: 0.0739 - val_accuracy: 0.9812
Epoch 8/20
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0163 - accuracy: 0.9944 - val_loss: 0.0721 - val_accuracy: 0.9815
Epoch 9/20
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0112 - accuracy: 0.9962 - val_loss: 0.0772 - val_accuracy: 0.9819
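The patience rule can be replayed by hand on the validation losses printed above. The function below is a simplified sketch of that rule (no min_delta, no weight restoration), not the Keras callback itself; its name is hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-indexed (simplified sketch)."""
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: remember it and reset the counter.
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# val_loss values copied from the log above.
val_losses = [0.0984, 0.0846, 0.0692, 0.0750, 0.0669,
              0.0659, 0.0739, 0.0721, 0.0772]

stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"stopped at epoch {stop_epoch}, best val_loss at epoch {best_epoch}")
```

The best validation loss occurs at epoch 6, and epochs 7 to 9 fail to improve on it, which matches the run ending at epoch 9 of 20. The weights at that point come from epoch 9 rather than epoch 6 unless the callback is asked to restore the best weights.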
Description:
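This example introduces data augmentation, a technique to artificially increase the diversity of the training dataset by applying random transformations to the images. The code continues training the model object from the early stopping example. Note that horizontal flips and rotations of up to 40 degrees can change the identity of a digit, so these settings are likely too aggressive for MNIST; the drop in training accuracy in the output reflects how much harder the augmented task is.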
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Define data augmentation generator
datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)
# Train the model using augmented data
model.fit(datagen.flow(train_images, train_labels, batch_size=64), epochs=5, validation_data=(test_images, test_labels))
Output
Epoch 1/5
938/938 [==============================] - 10s 10ms/step - loss: 1.3978 - accuracy: 0.5724 - val_loss: 0.4212 - val_accuracy: 0.8675
Epoch 2/5
938/938 [==============================] - 10s 10ms/step - loss: 0.8908 - accuracy: 0.7138 - val_loss: 0.3209 - val_accuracy: 0.9016
Epoch 3/5
938/938 [==============================] - 9s 10ms/step - loss: 0.7717 - accuracy: 0.7517 - val_loss: 0.2885 - val_accuracy: 0.9095
Epoch 4/5
938/938 [==============================] - 9s 10ms/step - loss: 0.6970 - accuracy: 0.7761 - val_loss: 0.3117 - val_accuracy: 0.8958
Epoch 5/5
938/938 [==============================] - 9s 10ms/step - loss: 0.6524 - accuracy: 0.7911 - val_loss: 0.2715 - val_accuracy: 0.9147
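To see what two of these transformations do at the pixel level, the sketch below applies a horizontal flip and a one-pixel shift to a tiny 3x3 glyph stored as nested lists. It is a simplified illustration: the shift fills the vacated column with a constant value, whereas the generator above uses fill_mode='nearest', and both helper names are hypothetical.

```python
def horizontal_flip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def shift_right(image, pixels, fill=0):
    """Shift every row to the right, padding the left edge with `fill`."""
    width = len(image[0])
    return [[fill] * pixels + row[:width - pixels] for row in image]

# A tiny asymmetric glyph, assumed for illustration.
glyph = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
]

flipped = horizontal_flip(glyph)
shifted = shift_right(glyph, 1)

for row in flipped:
    print(row)
```

The shifted glyph keeps its shape, while the flipped glyph is a mirror image, which for asymmetric digits may no longer match the original label.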
Description:
This example demonstrates how to use batch normalization, which normalizes each layer's activations over the current mini-batch during training. This tends to stabilize and speed up training and can have a mild regularizing effect.
# Build a model with batch normalization
batch_norm_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(512, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(10, activation='softmax')
])
# Compile and train the model with batch normalization
batch_norm_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
batch_norm_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
Output
Epoch 1/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.2075 - accuracy: 0.9369 - val_loss: 0.1104 - val_accuracy: 0.9655
Epoch 2/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1082 - accuracy: 0.9668 - val_loss: 0.0942 - val_accuracy: 0.9712
Epoch 3/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0857 - accuracy: 0.9731 - val_loss: 0.0867 - val_accuracy: 0.9739
Epoch 4/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0677 - accuracy: 0.9786 - val_loss: 0.0783 - val_accuracy: 0.9766
Epoch 5/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0584 - accuracy: 0.9812 - val_loss: 0.0705 - val_accuracy: 0.9812
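The training-time computation of batch normalization for a single feature can be sketched in a few lines: subtract the batch mean, divide by the batch standard deviation, then apply a learnable scale and shift. The version below is a simplified illustration that omits the moving averages used at inference; gamma, beta, and the batch values are assumed for the example, and eps=1e-3 is assumed to match the Keras default.

```python
def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one feature over a mini-batch (training-mode sketch)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    # eps guards against division by zero when the batch variance is tiny.
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta for v in values]

batch = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(batch)

out_mean = sum(normalized) / len(normalized)
out_var = sum((v - out_mean) ** 2 for v in normalized) / len(normalized)
print(f"mean = {out_mean:.6f}, variance = {out_var:.6f}")
```

The output has mean zero and variance just under one; the small shortfall comes from eps.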
Description:
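Students use this example to visualize learning curves (training and validation loss) and observe the effect of regularization. The code plots the dropout model; the same pattern applies to the L2 and L1 models. Because dropout_model was already trained for 5 epochs in the earlier example, this call continues training from those weights rather than starting fresh.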
import matplotlib.pyplot as plt
# Continue training the dropout model and keep the History object for plotting
history = dropout_model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))
# Plot training and validation loss
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss with Dropout')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
Output
Epoch 1/10
1875/1875 [==============================] - 5s 2ms/step - loss: 0.1105 - accuracy: 0.9675 - val_loss: 0.0723 - val_accuracy: 0.9785
Epoch 2/10
1875/1875 [==============================] - 5s 2ms/step - loss: 0.1085 - accuracy: 0.9677 - val_loss: 0.0734 - val_accuracy: 0.9774
Epoch 3/10
1875/1875 [==============================] - 5s 2ms/step - loss: 0.1033 - accuracy: 0.9698 - val_loss: 0.0806 - val_accuracy: 0.9774
Epoch 4/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0993 - accuracy: 0.9705 - val_loss: 0.0798 - val_accuracy: 0.9771
Epoch 5/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0932 - accuracy: 0.9739 - val_loss: 0.0693 - val_accuracy: 0.9802
Epoch 6/10
1875/1875 [==============================] - 5s 2ms/step - loss: 0.0918 - accuracy: 0.9733 - val_loss: 0.0684 - val_accuracy: 0.9803
Epoch 7/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0902 - accuracy: 0.9738 - val_loss: 0.0728 - val_accuracy: 0.9813
Epoch 8/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0888 - accuracy: 0.9747 - val_loss: 0.0726 - val_accuracy: 0.9815
Epoch 9/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0805 - accuracy: 0.9764 - val_loss: 0.0667 - val_accuracy: 0.9822
Epoch 10/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0838 - accuracy: 0.9761 - val_loss: 0.0753 - val_accuracy: 0.9814
Description:
This example introduces hyperparameter tuning for regularization techniques, such as adjusting dropout rates or L2 penalty strength, to find the best combination for the model.
from sklearn.model_selection import GridSearchCV
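# Note: this wrapper module was removed in recent TensorFlow releases;
# the SciKeras package provides a similar KerasClassifier.
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier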
# Define a function to build the model with hyperparameters
def build_model(dropout_rate=0.5, l2_strength=0.001):
    model = models.Sequential([
        layers.Flatten(input_shape=(28, 28, 1)),
        layers.Dense(512, kernel_regularizer=tf.keras.regularizers.l2(l2_strength), activation='relu'),
        layers.Dropout(dropout_rate),
        layers.Dense(512, kernel_regularizer=tf.keras.regularizers.l2(l2_strength), activation='relu'),
        layers.Dropout(dropout_rate),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model
# Wrap the model for use with scikit-learn
model = KerasClassifier(build_fn=build_model, epochs=5, batch_size=64, verbose=0)
# Define the hyperparameter grid
param_grid = {'dropout_rate': [0.3, 0.5, 0.7], 'l2_strength': [0.001, 0.01, 0.1]}
# Perform grid search
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)
grid_result = grid.fit(train_images, train_labels)
# Print best parameters
print(f"Best parameters: {grid_result.best_params_}")
Output
Best parameters: {'dropout_rate': 0.3, 'l2_strength': 0.001}
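It helps to count how many models a grid search trains before running it. The sketch below enumerates the same grid with the standard library and multiplies by the number of folds; it only counts combinations and does not train anything.

```python
from itertools import product

# Same grid and fold count as in the example above.
param_grid = {'dropout_rate': [0.3, 0.5, 0.7], 'l2_strength': [0.001, 0.01, 0.1]}
cv_folds = 3

# Every combination of one value per hyperparameter.
combos = [dict(zip(param_grid, values))
          for values in product(*param_grid.values())]
total_fits = len(combos) * cv_folds

print(f"{len(combos)} combinations x {cv_folds} folds = {total_fits} fits")
for combo in combos[:3]:
    print(combo)
```

With 5 epochs per fit, that is 27 cross-validation runs, plus one final refit on the full training set when GridSearchCV is left at its default refit=True. The cost grows multiplicatively with every value added to the grid.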
Objective: Learn and apply various regularization techniques to prevent overfitting and improve generalization in deep learning models.
Skills Developed:
Understanding the principles of overfitting and underfitting.
Hands-on implementation of dropout, L1/L2 regularization, early stopping, batch normalization, and data augmentation.
Visualizing the effects of regularization techniques on model performance.
Tools: TensorFlow, Keras, Matplotlib, scikit-learn.
These 10 examples in Week 4 help students understand how regularization techniques can improve model performance and generalization. By experimenting with different methods, they gain a practical understanding of how to apply these techniques in real-world scenarios.