Advanced Applied Deep Learning

Practice Course

Sheng Yun Wu

Week 4: Regularization Techniques and Preventing Overfitting

In Week 4, students focus on regularization techniques to prevent overfitting, improve generalization, and stabilize the training of deep learning models. The examples cover common methods like dropout, L2 and L1 regularization, data augmentation, and early stopping. These practices are essential when working with deep learning models on small or noisy datasets.

Example 1: Introduction to Overfitting and Underfitting

Description:
This example demonstrates the concepts of overfitting and underfitting using a small neural network. Students will observe how models that are too complex overfit the training data, while simpler models underfit.

import tensorflow as tf
from tensorflow.keras import models, layers

# Load MNIST dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

# Reshape and normalize the data
train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255

# Display shape of the dataset
print(f"Train images shape: {train_images.shape}")
print(f"Test images shape: {test_images.shape}")

# Build a small underfitting model
underfit_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(16, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Build a larger overfitting model
overfit_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile the models
underfit_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
overfit_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the models (the MNIST test split doubles as validation data here)
underfit_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
overfit_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))

Output

Train images shape: (60000, 28, 28, 1)

Test images shape: (10000, 28, 28, 1)

Epoch 1/5

1875/1875 [==============================] - 1s 533us/step - loss: 0.4355 - accuracy: 0.8783 - val_loss: 0.2670 - val_accuracy: 0.9252

Epoch 2/5

1875/1875 [==============================] - 1s 475us/step - loss: 0.2500 - accuracy: 0.9291 - val_loss: 0.2238 - val_accuracy: 0.9347

Epoch 3/5

1875/1875 [==============================] - 1s 496us/step - loss: 0.2165 - accuracy: 0.9386 - val_loss: 0.2138 - val_accuracy: 0.9406

Epoch 4/5

1875/1875 [==============================] - 1s 478us/step - loss: 0.1943 - accuracy: 0.9439 - val_loss: 0.1954 - val_accuracy: 0.9453

Epoch 5/5

1875/1875 [==============================] - 1s 494us/step - loss: 0.1793 - accuracy: 0.9480 - val_loss: 0.1763 - val_accuracy: 0.9484

Epoch 1/5

1875/1875 [==============================] - 4s 2ms/step - loss: 0.1825 - accuracy: 0.9445 - val_loss: 0.0984 - val_accuracy: 0.9699

Epoch 2/5

1875/1875 [==============================] - 4s 2ms/step - loss: 0.0796 - accuracy: 0.9756 - val_loss: 0.0958 - val_accuracy: 0.9716

Epoch 3/5

1875/1875 [==============================] - 4s 2ms/step - loss: 0.0558 - accuracy: 0.9819 - val_loss: 0.0804 - val_accuracy: 0.9780

Epoch 4/5

1875/1875 [==============================] - 4s 2ms/step - loss: 0.0410 - accuracy: 0.9870 - val_loss: 0.0936 - val_accuracy: 0.9759

Epoch 5/5

1875/1875 [==============================] - 4s 2ms/step - loss: 0.0360 - accuracy: 0.9887 - val_loss: 0.0697 - val_accuracy: 0.9802
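
One way to read the two logs above is to compare training and validation loss epoch by epoch. The following is a minimal sketch rather than part of the course code: the helper name loss_gap is hypothetical, and the numbers are copied from the log of the 512-unit model. With a real run, the same dictionary is available as the history attribute of the object returned by fit.

```python
def loss_gap(history):
    """Return validation loss minus training loss for each epoch."""
    return [round(v - t, 4) for t, v in zip(history["loss"], history["val_loss"])]

# Values copied from the log of the 512-unit model above.
large_model_history = {
    "loss": [0.1825, 0.0796, 0.0558, 0.0410, 0.0360],
    "val_loss": [0.0984, 0.0958, 0.0804, 0.0936, 0.0697],
}

gaps = loss_gap(large_model_history)
for epoch, gap in enumerate(gaps, start=1):
    print(f"epoch {epoch}: val_loss - loss = {gap:+.4f}")
```

The gap is negative in the first epoch, partly because the training loss is averaged while the weights are still changing, and positive from the second epoch on. A gap that keeps widening while training loss falls is the usual sign of overfitting; over five epochs on MNIST it remains small.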

Example 2: Using Dropout for Regularization

Description:
This example introduces the dropout technique, which helps prevent overfitting by randomly dropping units (neurons) during training.

# Build a model with dropout layers
dropout_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])

# Compile and train the model with dropout
dropout_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
dropout_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))

Output

Epoch 1/5

1875/1875 [==============================] - 4s 2ms/step - loss: 0.3117 - accuracy: 0.9038 - val_loss: 0.1152 - val_accuracy: 0.9643

Epoch 2/5

1875/1875 [==============================] - 4s 2ms/step - loss: 0.1718 - accuracy: 0.9482 - val_loss: 0.1009 - val_accuracy: 0.9688

Epoch 3/5

1875/1875 [==============================] - 5s 3ms/step - loss: 0.1457 - accuracy: 0.9558 - val_loss: 0.0780 - val_accuracy: 0.9759

Epoch 4/5

1875/1875 [==============================] - 4s 2ms/step - loss: 0.1312 - accuracy: 0.9615 - val_loss: 0.0791 - val_accuracy: 0.9755

Epoch 5/5

1875/1875 [==============================] - 4s 2ms/step - loss: 0.1218 - accuracy: 0.9641 - val_loss: 0.0939 - val_accuracy: 0.9715
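
To make the mechanism concrete, the sketch below implements inverted dropout on a plain list of activations. It is an illustration under simplifying assumptions, not the Keras implementation: the function name dropout and its arguments are hypothetical. It mirrors the documented behaviour of the Keras Dropout layer, which zeroes units with probability rate during training, scales the remaining ones by 1 / (1 - rate), and passes inputs through unchanged at inference.

```python
import random

def dropout(activations, rate, training=True, rng=None):
    """Inverted dropout on a list of activations (illustrative sketch)."""
    if not training or rate == 0.0:
        return list(activations)  # inference: values pass through unchanged
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    # Zero each unit with probability `rate`; scale survivors by 1 / keep_prob
    # so the expected value of every activation stays the same.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)
activations = [1.0] * 10
dropped = dropout(activations, rate=0.5, rng=rng)
unchanged = dropout(activations, rate=0.5, training=False)
print(dropped)    # every entry is either 0.0 or 2.0
print(unchanged)  # identical to the input
```

Because dropout is active only while training, the training metrics in the log above are computed with units dropped and the validation metrics without, which is one reason validation accuracy can exceed training accuracy.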

Example 3: Implementing L2 (Ridge) Regularization

Description:
This example shows how to apply L2 regularization, also known as ridge regularization, to penalize large weights and prevent overfitting.

# Build a model with L2 regularization
# The string 'l2' is shorthand for the Keras default L2 factor of 0.01
l2_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, kernel_regularizer='l2', activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile and train the model with L2 regularization
l2_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
l2_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))

Output

Epoch 1/5

1875/1875 [==============================] - 6s 3ms/step - loss: 0.8951 - accuracy: 0.8964 - val_loss: 0.4753 - val_accuracy: 0.9292

Epoch 2/5

1875/1875 [==============================] - 6s 3ms/step - loss: 0.4480 - accuracy: 0.9302 - val_loss: 0.3763 - val_accuracy: 0.9469

Epoch 3/5

1875/1875 [==============================] - 6s 3ms/step - loss: 0.3824 - accuracy: 0.9406 - val_loss: 0.3488 - val_accuracy: 0.9450

Epoch 4/5

1875/1875 [==============================] - 5s 3ms/step - loss: 0.3550 - accuracy: 0.9432 - val_loss: 0.3319 - val_accuracy: 0.9502

Epoch 5/5

1875/1875 [==============================] - 5s 3ms/step - loss: 0.3433 - accuracy: 0.9446 - val_loss: 0.3248 - val_accuracy: 0.9484
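
The loss values in this log are higher than in Example 1 even though accuracy is in a similar range, because Keras adds the regularization penalty to the reported loss. The sketch below shows the arithmetic on a toy weight matrix. The helper names and the 2x2 kernel are hypothetical, and the factor 0.01 is the Keras default that the 'l2' shorthand selects.

```python
def l2_penalty(weights, factor=0.01):
    """factor * sum of squared weights, for a matrix given as nested lists."""
    return factor * sum(w * w for row in weights for w in row)

def regularized_loss(data_loss, weights, factor=0.01):
    """Loss reported during training: data loss plus the L2 penalty."""
    return data_loss + l2_penalty(weights, factor)

toy_weights = [[0.5, -1.0], [2.0, 0.0]]  # hypothetical 2x2 kernel
penalty = l2_penalty(toy_weights)
total = regularized_loss(0.20, toy_weights)
print(f"penalty = {penalty:.4f}, total loss = {total:.4f}")
```

Since the penalty grows with the square of each weight, large weights are penalized far more than small ones, which pushes the optimizer toward many small weights instead of a few large ones.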

Example 4: Applying L1 (Lasso) Regularization

Description:
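This example introduces L1 regularization, which encourages sparsity in the model's weights by penalizing their absolute values. The string shorthand 'l1' applies the Keras default factor of 0.01, which is too strong for this model: in the output, accuracy stays near 11%, roughly chance level for ten classes, so the run illustrates over-regularization rather than a useful setting.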

# Build a model with L1 regularization
# The string 'l1' is shorthand for the Keras default L1 factor of 0.01
l1_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, kernel_regularizer='l1', activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile and train the model with L1 regularization
l1_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
l1_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))

Output

Epoch 1/5

1875/1875 [==============================] - 6s 3ms/step - loss: 6.3230 - accuracy: 0.1199 - val_loss: 3.0916 - val_accuracy: 0.1135

Epoch 2/5

1875/1875 [==============================] - 6s 3ms/step - loss: 3.0903 - accuracy: 0.1124 - val_loss: 3.0894 - val_accuracy: 0.1135

Epoch 3/5

1875/1875 [==============================] - 5s 3ms/step - loss: 3.0893 - accuracy: 0.1124 - val_loss: 3.0885 - val_accuracy: 0.1135

Epoch 4/5

1875/1875 [==============================] - 5s 3ms/step - loss: 3.0888 - accuracy: 0.1124 - val_loss: 3.0887 - val_accuracy: 0.1135

Epoch 5/5

1875/1875 [==============================] - 5s 3ms/step - loss: 3.0883 - accuracy: 0.1124 - val_loss: 3.0899 - val_accuracy: 0.1135
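
A rough estimate helps explain why this run stalls at chance accuracy. The sketch below samples a 784x512 kernel from a Glorot uniform distribution, the default initializer for Keras Dense layers, and computes the L1 penalty at initialization for two factors. The function names are hypothetical, and the smaller factor 1e-5 is only an illustrative assumption, not a tuned value.

```python
import math
import random

def glorot_uniform_limit(fan_in, fan_out):
    """Half-width of the Glorot uniform range for a dense kernel."""
    return math.sqrt(6.0 / (fan_in + fan_out))

def initial_l1_penalty(fan_in, fan_out, factor, seed=0):
    """factor * sum(|w|) for a freshly sampled fan_in x fan_out kernel."""
    rng = random.Random(seed)
    limit = glorot_uniform_limit(fan_in, fan_out)
    total_abs = sum(abs(rng.uniform(-limit, limit)) for _ in range(fan_in * fan_out))
    return factor * total_abs

chance_loss = math.log(10)  # cross-entropy of a uniform guess over 10 classes
default_penalty = initial_l1_penalty(784, 512, factor=0.01)
small_penalty = initial_l1_penalty(784, 512, factor=1e-5)  # assumed alternative
print(f"cross-entropy at chance: {chance_loss:.2f}")
print(f"L1 penalty at init, factor 0.01: {default_penalty:.1f}")
print(f"L1 penalty at init, factor 1e-5: {small_penalty:.4f}")
```

With the default factor the penalty at initialization is far larger than the cross-entropy term, so the optimizer gains most by shrinking the weights toward zero rather than by fitting the data. Passing an explicit, much smaller factor through tf.keras.regularizers.l1 keeps the penalty below the data loss; a suitable value has to be found by experiment.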

Example 5: Combining Dropout and L2 Regularization

Description:
This example combines both dropout and L2 regularization in a single model to demonstrate how these regularization techniques work together to improve generalization.

# Build a model with both dropout and L2 regularization
dropout_l2_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, kernel_regularizer='l2', activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(512, kernel_regularizer='l2', activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])

# Compile and train the model with dropout and L2 regularization
dropout_l2_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
dropout_l2_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))

Output

Epoch 1/5

1875/1875 [==============================] - 6s 3ms/step - loss: 1.2751 - accuracy: 0.8586 - val_loss: 0.7158 - val_accuracy: 0.9111

Epoch 2/5

1875/1875 [==============================] - 6s 3ms/step - loss: 0.8109 - accuracy: 0.8807 - val_loss: 0.7176 - val_accuracy: 0.8985

Epoch 3/5

1875/1875 [==============================] - 6s 3ms/step - loss: 0.7623 - accuracy: 0.8880 - val_loss: 0.6336 - val_accuracy: 0.9272

Epoch 4/5

1875/1875 [==============================] - 6s 3ms/step - loss: 0.7487 - accuracy: 0.8892 - val_loss: 0.5910 - val_accuracy: 0.9330

Epoch 5/5

1875/1875 [==============================] - 6s 3ms/step - loss: 0.7282 - accuracy: 0.8913 - val_loss: 0.5862 - val_accuracy: 0.9330

Example 6: Early Stopping to Prevent Overfitting

Description:
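Early stopping halts training once the monitored validation metric stops improving, which limits overfitting. In this example the callback monitors val_loss with patience=3, so training ends after three consecutive epochs without a new best validation loss rather than at the first epoch that fails to improve.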

from tensorflow.keras.callbacks import EarlyStopping

# Add early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=3)

# Build a simple model
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile and train the model with early stopping
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=20, validation_data=(test_images, test_labels), callbacks=[early_stopping])

Output

Epoch 1/20

1875/1875 [==============================] - 3s 2ms/step - loss: 0.2011 - accuracy: 0.9402 - val_loss: 0.0984 - val_accuracy: 0.9704

Epoch 2/20

1875/1875 [==============================] - 3s 2ms/step - loss: 0.0803 - accuracy: 0.9758 - val_loss: 0.0846 - val_accuracy: 0.9735

Epoch 3/20

1875/1875 [==============================] - 3s 2ms/step - loss: 0.0518 - accuracy: 0.9834 - val_loss: 0.0692 - val_accuracy: 0.9787

Epoch 4/20

1875/1875 [==============================] - 3s 2ms/step - loss: 0.0355 - accuracy: 0.9888 - val_loss: 0.0750 - val_accuracy: 0.9763

Epoch 5/20

1875/1875 [==============================] - 3s 2ms/step - loss: 0.0267 - accuracy: 0.9911 - val_loss: 0.0669 - val_accuracy: 0.9808

Epoch 6/20

1875/1875 [==============================] - 3s 1ms/step - loss: 0.0213 - accuracy: 0.9929 - val_loss: 0.0659 - val_accuracy: 0.9814

Epoch 7/20

1875/1875 [==============================] - 3s 2ms/step - loss: 0.0154 - accuracy: 0.9948 - val_loss: 0.0739 - val_accuracy: 0.9812

Epoch 8/20

1875/1875 [==============================] - 3s 2ms/step - loss: 0.0163 - accuracy: 0.9944 - val_loss: 0.0721 - val_accuracy: 0.9815

Epoch 9/20

1875/1875 [==============================] - 3s 1ms/step - loss: 0.0112 - accuracy: 0.9962 - val_loss: 0.0772 - val_accuracy: 0.9819
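
The patience rule can be replayed on the validation losses from the log above. The following sketch is a simplified model of the callback's bookkeeping, not the Keras source; the function name early_stopping_epoch is hypothetical.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return (stopped_epoch, best_epoch), both 1-based."""
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # New best validation loss: remember it and reset the counter.
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# val_loss values copied from the log above.
val_losses = [0.0984, 0.0846, 0.0692, 0.0750, 0.0669, 0.0659, 0.0739, 0.0721, 0.0772]
stopped_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stopped_epoch, best_epoch)  # 9 6
```

The best validation loss occurs at epoch 6, and three epochs without improvement follow, so training stops after epoch 9, matching the log. By default the model keeps the weights from the last epoch; EarlyStopping accepts restore_best_weights=True to roll back to the best epoch instead.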

Example 7: Data Augmentation for Regularization

Description:
This example introduces data augmentation, a technique to artificially increase the diversity of the training dataset by applying random transformations to the images.

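from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define data augmentation generator
# Note: these ranges are aggressive for handwritten digits, and mirroring
# (horizontal_flip) changes the appearance of most digits; the settings are
# kept as in the original run so that the output below still applies.
datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

# Train the model from Example 6 on augmented data
# (fit continues from the weights that model already learned)
model.fit(datagen.flow(train_images, train_labels, batch_size=64), epochs=5, validation_data=(test_images, test_labels))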

Output

Epoch 1/5

938/938 [==============================] - 10s 10ms/step - loss: 1.3978 - accuracy: 0.5724 - val_loss: 0.4212 - val_accuracy: 0.8675

Epoch 2/5

938/938 [==============================] - 10s 10ms/step - loss: 0.8908 - accuracy: 0.7138 - val_loss: 0.3209 - val_accuracy: 0.9016

Epoch 3/5

938/938 [==============================] - 9s 10ms/step - loss: 0.7717 - accuracy: 0.7517 - val_loss: 0.2885 - val_accuracy: 0.9095

Epoch 4/5

938/938 [==============================] - 9s 10ms/step - loss: 0.6970 - accuracy: 0.7761 - val_loss: 0.3117 - val_accuracy: 0.8958

Epoch 5/5

938/938 [==============================] - 9s 10ms/step - loss: 0.6524 - accuracy: 0.7911 - val_loss: 0.2715 - val_accuracy: 0.9147
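
To show what one of these transformations does to the pixels, the sketch below shifts a small image stored as nested lists. It is a simplified illustration, not how ImageDataGenerator is implemented: the function names are hypothetical, and vacated pixels are filled with a constant, whereas the generator above uses fill_mode='nearest'.

```python
import random

def shift_image(image, dx, dy, fill=0):
    """Shift a 2D image (list of rows) by dx columns and dy rows."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            nr, nc = r + dy, c + dx
            if 0 <= nr < height and 0 <= nc < width:
                shifted[nr][nc] = image[r][c]  # pixels pushed off the edge are lost
    return shifted

def random_shift(image, max_shift, rng):
    """Apply a random shift of up to max_shift pixels in each direction."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return shift_image(image, dx, dy)

image = [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
    [0, 0, 0, 0],
]
moved_right = shift_image(image, dx=1, dy=0)
augmented = random_shift(image, max_shift=1, rng=random.Random(0))
for row in moved_right:
    print(row)
```

Each call to random_shift produces a slightly different copy of the same digit with the same label, which is how augmentation increases the variety of the training data without collecting new examples.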

Example 8: Batch Normalization for Improved Training

Description:
This example demonstrates how to use batch normalization to speed up training and reduce overfitting by normalizing activations during training.

# Build a model with batch normalization
batch_norm_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(512, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(10, activation='softmax')
])

# Compile and train the model with batch normalization
batch_norm_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
batch_norm_model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))

Output

Epoch 1/5

1875/1875 [==============================] - 5s 3ms/step - loss: 0.2075 - accuracy: 0.9369 - val_loss: 0.1104 - val_accuracy: 0.9655

Epoch 2/5

1875/1875 [==============================] - 5s 3ms/step - loss: 0.1082 - accuracy: 0.9668 - val_loss: 0.0942 - val_accuracy: 0.9712

Epoch 3/5

1875/1875 [==============================] - 5s 3ms/step - loss: 0.0857 - accuracy: 0.9731 - val_loss: 0.0867 - val_accuracy: 0.9739

Epoch 4/5

1875/1875 [==============================] - 5s 3ms/step - loss: 0.0677 - accuracy: 0.9786 - val_loss: 0.0783 - val_accuracy: 0.9766

Epoch 5/5

1875/1875 [==============================] - 5s 3ms/step - loss: 0.0584 - accuracy: 0.9812 - val_loss: 0.0705 - val_accuracy: 0.9812
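
The normalization step itself is short. The sketch below applies the training-time formula to one feature across a small batch. It is a simplified illustration with a hypothetical function name; gamma and beta stand for the learned scale and offset, shown at their initial values of 1 and 0, and epsilon matches the Keras default of 0.001.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Normalize one feature over a batch, then scale by gamma and shift by beta."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(variance + epsilon) + beta for v in values]

batch = [2.0, 4.0, 6.0, 8.0]  # one activation observed for four samples
normalized = batch_norm(batch)
norm_mean = sum(normalized) / len(normalized)
norm_var = sum((v - norm_mean) ** 2 for v in normalized) / len(normalized)
print([round(v, 3) for v in normalized])
print(f"mean = {norm_mean:.3f}, variance = {norm_var:.3f}")
```

During training the layer uses the statistics of the current batch, as here. At inference it uses moving averages of the mean and variance accumulated during training, so predictions do not depend on which samples share a batch.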

Example 9: Visualizing the Effects of Regularization

Description:
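Students use this example to plot the learning curves (training and validation loss) and observe the effect of regularization. The code plots the dropout model from Example 2; the same approach applies to the L2 and L1 models. Because dropout_model was already trained for 5 epochs, calling fit again continues from its current weights, which is why the log starts at a low loss.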

import matplotlib.pyplot as plt

# Continue training the dropout model from Example 2 and keep the history
history = dropout_model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))

# Plot training and validation loss
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss with Dropout')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

Output

Epoch 1/10

1875/1875 [==============================] - 5s 2ms/step - loss: 0.1105 - accuracy: 0.9675 - val_loss: 0.0723 - val_accuracy: 0.9785

Epoch 2/10

1875/1875 [==============================] - 5s 2ms/step - loss: 0.1085 - accuracy: 0.9677 - val_loss: 0.0734 - val_accuracy: 0.9774

Epoch 3/10

1875/1875 [==============================] - 5s 2ms/step - loss: 0.1033 - accuracy: 0.9698 - val_loss: 0.0806 - val_accuracy: 0.9774

Epoch 4/10

1875/1875 [==============================] - 4s 2ms/step - loss: 0.0993 - accuracy: 0.9705 - val_loss: 0.0798 - val_accuracy: 0.9771

Epoch 5/10

1875/1875 [==============================] - 4s 2ms/step - loss: 0.0932 - accuracy: 0.9739 - val_loss: 0.0693 - val_accuracy: 0.9802

Epoch 6/10

1875/1875 [==============================] - 5s 2ms/step - loss: 0.0918 - accuracy: 0.9733 - val_loss: 0.0684 - val_accuracy: 0.9803

Epoch 7/10

1875/1875 [==============================] - 5s 3ms/step - loss: 0.0902 - accuracy: 0.9738 - val_loss: 0.0728 - val_accuracy: 0.9813

Epoch 8/10

1875/1875 [==============================] - 4s 2ms/step - loss: 0.0888 - accuracy: 0.9747 - val_loss: 0.0726 - val_accuracy: 0.9815

Epoch 9/10

1875/1875 [==============================] - 4s 2ms/step - loss: 0.0805 - accuracy: 0.9764 - val_loss: 0.0667 - val_accuracy: 0.9822

Epoch 10/10

1875/1875 [==============================] - 5s 3ms/step - loss: 0.0838 - accuracy: 0.9761 - val_loss: 0.0753 - val_accuracy: 0.9814

Example 10: Hyperparameter Tuning for Regularization

Description:
This example introduces hyperparameter tuning for regularization techniques, such as adjusting dropout rates or L2 penalty strength, to find the best combination for the model.

from sklearn.model_selection import GridSearchCV

# Note: this wrapper module was deprecated and is absent from recent TensorFlow
# releases; the separate SciKeras package offers a KerasClassifier replacement.
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

# Define a function to build the model with hyperparameters
def build_model(dropout_rate=0.5, l2_strength=0.001):
    model = models.Sequential([
        layers.Flatten(input_shape=(28, 28, 1)),
        layers.Dense(512, kernel_regularizer=tf.keras.regularizers.l2(l2_strength), activation='relu'),
        layers.Dropout(dropout_rate),
        layers.Dense(512, kernel_regularizer=tf.keras.regularizers.l2(l2_strength), activation='relu'),
        layers.Dropout(dropout_rate),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

# Wrap the model for use with scikit-learn
model = KerasClassifier(build_fn=build_model, epochs=5, batch_size=64, verbose=0)

# Define the hyperparameter grid
param_grid = {'dropout_rate': [0.3, 0.5, 0.7], 'l2_strength': [0.001, 0.01, 0.1]}

# Perform grid search with 3-fold cross-validation
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)
grid_result = grid.fit(train_images, train_labels)

# Print best parameters
print(f"Best parameters: {grid_result.best_params_}")

Output

Best parameters: {'dropout_rate': 0.3, 'l2_strength': 0.001}
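
Grid search trains one model per parameter combination and per cross-validation fold, so its cost grows quickly. The sketch below enumerates the grid used above with the standard library and counts the training runs. The variable names other than param_grid are hypothetical.

```python
from itertools import product

param_grid = {"dropout_rate": [0.3, 0.5, 0.7], "l2_strength": [0.001, 0.01, 0.1]}
cv_folds = 3
epochs_per_fit = 5

# Expand the grid into one dictionary per combination of values.
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

total_fits = len(combinations) * cv_folds
total_epochs = total_fits * epochs_per_fit
print(len(combinations), total_fits, total_epochs)  # 9 27 135
```

With its default refit=True, GridSearchCV trains one additional model on the full training set using the best combination. The selected values, 0.3 and 0.001, are the weakest settings in the grid, which suggests that five epochs on MNIST call for little regularization and that smaller values may be worth adding to the grid.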

Week 4 Summary

Objective: Learn and apply various regularization techniques to prevent overfitting and improve generalization in deep learning models.
Skills Developed:
- Understanding the principles of overfitting and underfitting.
- Hands-on implementation of dropout, L1/L2 regularization, early stopping, batch normalization, and data augmentation.
- Visualizing the effects of regularization techniques on model performance.
Tools: TensorFlow, Keras, scikit-learn, Matplotlib.

These 10 examples in Week 4 help students understand how regularization techniques can improve model performance and generalization. By experimenting with different methods, they gain a practical understanding of how to apply these techniques in real-world scenarios.
