Advanced Applied Deep Learning

Practice Course

Sheng Yun Wu

Week 6: Hyperparameter Tuning

In Week 6, students will learn about hyperparameter tuning, an essential process for optimizing the performance of deep learning models. Hyperparameters such as learning rates, batch sizes, optimizers, and layer sizes greatly affect the model's ability to learn. Students will explore various approaches for tuning these hyperparameters, including manual tuning, grid search, and random search. They will also get hands-on experience with automated hyperparameter optimization libraries.

Example 1: Manual Hyperparameter Tuning

Description:
This example introduces the basic concept of manually tuning hyperparameters like the learning rate and batch size. Students will manually adjust these values and observe their effect on training.

import tensorflow as tf

from tensorflow.keras import models, layers

from tensorflow.keras.optimizers import Adam

# Load MNIST dataset

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

# Reshape and normalize the data

train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255

# Display shape of the dataset

print(f"Train images shape: {train_images.shape}")

print(f"Test images shape: {test_images.shape}")

# Build a simple model

model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Experiment with different learning rates and batch sizes

learning_rate = 0.001

batch_size = 64

# Compile and train the model

model.compile(optimizer=Adam(learning_rate=learning_rate), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=5, batch_size=batch_size, validation_data=(test_images, test_labels))

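# Adjust the learning rate and batch size and observe the changes in performance.
# Note: re-compiling keeps the trained weights, so this second run continues from the first one; rebuild the model first for a like-for-like comparison.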

learning_rate = 0.0001

batch_size = 128

model.compile(optimizer=Adam(learning_rate=learning_rate), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=5, batch_size=batch_size, validation_data=(test_images, test_labels))

Output

Train images shape: (60000, 28, 28, 1)

Test images shape: (10000, 28, 28, 1)

Epoch 1/5

938/938 [==============================] - 2s 2ms/step - loss: 0.2280 - accuracy: 0.9345 - val_loss: 0.1147 - val_accuracy: 0.9669

Epoch 2/5

938/938 [==============================] - 2s 2ms/step - loss: 0.0903 - accuracy: 0.9736 - val_loss: 0.0763 - val_accuracy: 0.9765

Epoch 3/5

938/938 [==============================] - 2s 2ms/step - loss: 0.0571 - accuracy: 0.9827 - val_loss: 0.0681 - val_accuracy: 0.9788

Epoch 4/5

938/938 [==============================] - 2s 2ms/step - loss: 0.0395 - accuracy: 0.9879 - val_loss: 0.0676 - val_accuracy: 0.9786

Epoch 5/5

938/938 [==============================] - 2s 2ms/step - loss: 0.0291 - accuracy: 0.9912 - val_loss: 0.0661 - val_accuracy: 0.9789

Epoch 1/5

469/469 [==============================] - 1s 3ms/step - loss: 0.0124 - accuracy: 0.9973 - val_loss: 0.0540 - val_accuracy: 0.9840

Epoch 2/5

469/469 [==============================] - 1s 2ms/step - loss: 0.0088 - accuracy: 0.9984 - val_loss: 0.0530 - val_accuracy: 0.9846

Epoch 3/5

469/469 [==============================] - 1s 2ms/step - loss: 0.0074 - accuracy: 0.9989 - val_loss: 0.0531 - val_accuracy: 0.9844

Epoch 4/5

469/469 [==============================] - 1s 2ms/step - loss: 0.0065 - accuracy: 0.9991 - val_loss: 0.0537 - val_accuracy: 0.9840

Epoch 5/5

469/469 [==============================] - 1s 2ms/step - loss: 0.0059 - accuracy: 0.9992 - val_loss: 0.0541 - val_accuracy: 0.9841
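
The effect of the learning rate can also be seen without a neural network. The following is a minimal sketch, not part of the course code: it runs plain gradient descent on the toy loss f(w) = (w - 3)^2. The function name `gradient_descent` and the three learning rates are assumptions chosen for illustration.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimize the toy loss f(w) = (w - 3)**2 and return the final loss."""
    w = start
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)      # derivative of (w - 3)**2
        w -= learning_rate * grad   # gradient descent update
    return (w - 3.0) ** 2

# Hypothetical learning rates: too small, reasonable, too large
results = {lr: gradient_descent(lr) for lr in (0.001, 0.1, 1.1)}
for lr, loss in results.items():
    print(f"learning_rate={lr}: final loss={loss:.6g}")
```

With the smallest rate the loss barely moves in 50 steps, and with the largest it grows instead of shrinking. This is the same trade-off that the MNIST runs above explore on a real model.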

Example 2: Using Grid Search for Hyperparameter Tuning

Description:
This example demonstrates how to use grid search to systematically try different combinations of hyperparameters (learning rates, batch sizes, and optimizers) and select the best-performing model.

from sklearn.model_selection import GridSearchCV

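# Note: this wrapper module has been removed from recent TensorFlow releases; the separate SciKeras package provides a KerasClassifier as its successor.

from tensorflow.keras.wrappers.scikit_learn import KerasClassifier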

# Define a function to build the model

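def build_model(learning_rate=0.001, optimizer='adam'):
    # Map the optimizer name to its class so that learning_rate is actually applied
    optimizer_classes = {
        'adam': tf.keras.optimizers.Adam,
        'rmsprop': tf.keras.optimizers.RMSprop,
    }
    model = models.Sequential([
        layers.Flatten(input_shape=(28, 28, 1)),
        layers.Dense(512, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer=optimizer_classes[optimizer](learning_rate=learning_rate),
                  loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model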

# Wrap the model in KerasClassifier

model = KerasClassifier(build_fn=build_model, epochs=5, batch_size=64, verbose=0)

# Define the hyperparameter grid

param_grid = {'optimizer': ['adam', 'rmsprop'], 'learning_rate': [0.001, 0.0001], 'batch_size': [32, 64, 128]}

# Perform grid search

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)

grid_result = grid.fit(train_images, train_labels)

# Print best parameters and accuracy

print(f"Best params: {grid_result.best_params_}")

print(f"Best accuracy: {grid_result.best_score_}")

Output

Best params: {'batch_size': 64, 'learning_rate': 0.0001, 'optimizer': 'rmsprop'}

Best accuracy: 0.975516676902771
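
To get a feel for the cost of a grid search, it helps to count the training runs before starting. The sketch below is an illustration only and uses just the standard library: it enumerates the same grid as above with `itertools.product` and multiplies by the number of cross-validation folds. The variable names are assumptions, not part of the course code.

```python
from itertools import product

param_grid = {
    'optimizer': ['adam', 'rmsprop'],
    'learning_rate': [0.001, 0.0001],
    'batch_size': [32, 64, 128],
}
cv = 3  # number of cross-validation folds, as in the example above

# Build every combination of one value per hyperparameter
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

total_fits = len(combinations) * cv
print(f"{len(combinations)} combinations x {cv} folds = {total_fits} training runs")
```

By default GridSearchCV also refits the best combination once on the full training set, so the real count is one run higher. Adding a single extra value to any hyperparameter multiplies the total, which is why grids are usually kept small.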

Example 3: Using Random Search for Hyperparameter Tuning

Description:
Random search is often a cheaper alternative to grid search: instead of evaluating every combination, it evaluates a fixed number (n_iter) of randomly sampled ones. This example shows how to implement random search for hyperparameter tuning.

from sklearn.model_selection import RandomizedSearchCV

import numpy as np

# Define the hyperparameter distribution

param_dist = {'optimizer': ['adam', 'rmsprop'], 'learning_rate': [0.001, 0.0001], 'batch_size': [32, 64, 128]}

# Perform random search

random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=5, cv=3, random_state=42)

random_search_result = random_search.fit(train_images, train_labels)

# Print best parameters and accuracy

print(f"Best params: {random_search_result.best_params_}")

print(f"Best accuracy: {random_search_result.best_score_}")

Output

Best params: {'optimizer': 'rmsprop', 'learning_rate': 0.001, 'batch_size': 64}

Best accuracy: 0.9732000033060709
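
The sampling step behind random search can be sketched with the standard library alone. This is an illustration under stated assumptions, not scikit-learn's implementation: it draws `n_iter` distinct combinations from the grid with Python's `random` module, so the seed 42 below will not reproduce the combinations that RandomizedSearchCV picked above.

```python
import random
from itertools import product

param_dist = {
    'optimizer': ['adam', 'rmsprop'],
    'learning_rate': [0.001, 0.0001],
    'batch_size': [32, 64, 128],
}
n_iter = 5  # number of combinations to try, as in the example above

names = list(param_dist)
all_combinations = [dict(zip(names, values)) for values in product(*param_dist.values())]

# Fixed seed for repeatability of this sketch only
rng = random.Random(42)
sampled = rng.sample(all_combinations, n_iter)  # sampling without replacement

for params in sampled:
    print(params)
print(f"Evaluating {len(sampled)} of {len(all_combinations)} possible combinations")
```

When every entry is a list, as here, scikit-learn also samples without replacement. Supplying a continuous distribution for learning_rate instead of a list would let the search try values between the grid points.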

Example 4: Tuning Learning Rate and Momentum with Grid Search

Description:
This example demonstrates how to perform grid search to tune both the learning rate and momentum of the SGD optimizer.

from tensorflow.keras.optimizers import SGD

# Define a function to build a model with SGD

def build_sgd_model(learning_rate=0.001, momentum=0.9):
    model = models.Sequential([
        layers.Flatten(input_shape=(28, 28, 1)),
        layers.Dense(512, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])
    optimizer = SGD(learning_rate=learning_rate, momentum=momentum)
    model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

# Wrap the model for grid search

sgd_model = KerasClassifier(build_fn=build_sgd_model, epochs=5, verbose=0)

# Define the grid search parameters

param_grid = {'learning_rate': [0.01, 0.001, 0.0001], 'momentum': [0.8, 0.9, 0.99]}

# Perform grid search

grid = GridSearchCV(estimator=sgd_model, param_grid=param_grid, n_jobs=-1, cv=3)

grid_result = grid.fit(train_images, train_labels)

# Print best parameters and accuracy

print(f"Best params: {grid_result.best_params_}")

print(f"Best accuracy: {grid_result.best_score_}")

Output

Best params: {'learning_rate': 0.01, 'momentum': 0.9}

Best accuracy: 0.9699333310127258
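
To see what the momentum hyperparameter does, the classical momentum update can be written out by hand. The sketch below is a toy illustration, not Keras's implementation: it minimizes f(w) = (w - 3)^2, and the function name `sgd_momentum` and the chosen values are assumptions.

```python
def sgd_momentum(learning_rate, momentum, steps=200, start=0.0):
    """Run classical momentum on f(w) = (w - 3)**2 and return the final w."""
    w, velocity = start, 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)                                  # gradient of the toy loss
        velocity = momentum * velocity - learning_rate * grad   # accumulate past gradients
        w += velocity                                           # move along the velocity
    return w

w_plain = sgd_momentum(learning_rate=0.01, momentum=0.0)
w_momentum = sgd_momentum(learning_rate=0.01, momentum=0.9)
print(f"no momentum:  w = {w_plain:.5f}, distance to optimum = {abs(w_plain - 3):.5f}")
print(f"momentum 0.9: w = {w_momentum:.5f}, distance to optimum = {abs(w_momentum - 3):.5f}")
```

With the same learning rate, the run with momentum ends much closer to the optimum at w = 3 after 200 steps. Because the two hyperparameters interact in this way, it makes sense to tune them jointly, as the grid above does.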

Example 5: Hyperparameter Tuning with Early Stopping

Description:
This example demonstrates how to combine hyperparameter tuning with early stopping to find the best hyperparameters while preventing overfitting.

from tensorflow.keras.callbacks import EarlyStopping

# Add early stopping callback

early_stopping = EarlyStopping(monitor='val_loss', patience=3)

# Wrap the model for grid search

model = KerasClassifier(build_fn=build_model, epochs=20, batch_size=64, verbose=0)

# Define the hyperparameter grid

param_grid = {'optimizer': ['adam', 'rmsprop'], 'learning_rate': [0.001, 0.0001], 'batch_size': [32, 64, 128]}

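# Perform grid search with early stopping (the extra fit arguments are forwarded to each model's fit call).
# Note: the test set doubles as validation data here; hold out a separate validation split if the test score must stay unbiased.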

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)

grid_result = grid.fit(train_images, train_labels, validation_data=(test_images, test_labels), callbacks=[early_stopping])

# Print best parameters and accuracy

print(f"Best params: {grid_result.best_params_}")

print(f"Best accuracy: {grid_result.best_score_}")

Output

Best params: {'batch_size': 128, 'learning_rate': 0.001, 'optimizer': 'adam'}

Best accuracy: 0.976449986298879
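
The patience rule used by early stopping can be sketched in a few lines of plain Python. This is a simplified illustration of the logic, not the Keras callback itself; the function name `early_stopping_epoch` and the validation losses are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    stop_epoch is None if training runs through every epoch.
    """
    best_loss = float('inf')
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Hypothetical validation losses, one per epoch
hypothetical_val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30]
stop_epoch, best_epoch = early_stopping_epoch(hypothetical_val_losses, patience=3)
print(f"Stopped after epoch {stop_epoch}; best validation loss was at epoch {best_epoch}")
```

Here training stops after epoch 6, three epochs after the best value at epoch 3, so the later improvement at epoch 7 is never reached. A larger patience trades extra training time for a lower risk of stopping too early. In Keras, passing restore_best_weights=True to EarlyStopping returns the model to the weights from the best epoch.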

Example 6: Using Bayesian Optimization for Hyperparameter Tuning

Description:
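Bayesian optimization uses the results of earlier trials to decide which hyperparameters to try next, rather than sampling them blindly. This example demonstrates how to implement it with the hyperopt library, using its Tree-structured Parzen Estimator algorithm (tpe.suggest).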

from hyperopt import hp, fmin, tpe, Trials

from hyperopt.pyll.base import scope

from tensorflow.keras.optimizers import Adam, RMSprop

# Define the hyperparameter space

space = {
    'optimizer': hp.choice('optimizer', ['adam', 'rmsprop']),
    'learning_rate': hp.loguniform('learning_rate', np.log(0.0001), np.log(0.01)),
    'batch_size': scope.int(hp.quniform('batch_size', 32, 128, 32))
}

# Define the objective function

def objective(params):
    model = build_model(learning_rate=params['learning_rate'], optimizer=params['optimizer'])
    model.fit(train_images, train_labels, epochs=5, batch_size=params['batch_size'], verbose=0, validation_data=(test_images, test_labels))
    val_loss, val_acc = model.evaluate(test_images, test_labels, verbose=0)
    # hyperopt minimizes the returned loss, so negate the accuracy to maximize it
    return {'loss': -val_acc, 'status': 'ok'}

# Perform Bayesian optimization

trials = Trials()

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=20, trials=trials)

# Print the best hyperparameters (hp.choice entries are reported as list indices, so 'optimizer': 0 means 'adam')

print(best)

Output

100%|██████████| 20/20 [04:40<00:00, 14.02s/trial, best loss: -0.9818999767303467]

{'batch_size': 96.0, 'learning_rate': 0.00014406340807234902, 'optimizer': 0}
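
The dictionary returned by fmin reports hp.choice entries as indices into the list of options and quantized values as floats, so it is worth decoding before reuse. The sketch below does this by hand with the values copied from the output above; the names `optimizer_choices` and `decoded` are assumptions for illustration. hyperopt also provides a `space_eval(space, best)` helper for the same purpose.

```python
# Same order as the list passed to hp.choice('optimizer', ...) above
optimizer_choices = ['adam', 'rmsprop']

# Result copied from the output above
best = {'batch_size': 96.0, 'learning_rate': 0.00014406340807234902, 'optimizer': 0}

decoded = {
    'optimizer': optimizer_choices[best['optimizer']],  # index -> name
    'learning_rate': best['learning_rate'],
    'batch_size': int(best['batch_size']),              # 96.0 -> 96
}
print(decoded)
```

The best trial therefore used the Adam optimizer with a batch size of 96 and a learning rate of roughly 1.4e-4.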

Example 7: Random Search with Early Stopping

Description:
This example shows how to combine random search with early stopping to quickly explore a hyperparameter space and avoid overfitting.

# Perform random search with early stopping

random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=5, cv=3, random_state=42)

random_search_result = random_search.fit(train_images, train_labels, validation_data=(test_images, test_labels), callbacks=[early_stopping])

# Print best parameters and accuracy

print(f"Best params: {random_search_result.best_params_}")

print(f"Best accuracy: {random_search_result.best_score_}")

Output

Best params: {'optimizer': 'adam', 'learning_rate': 0.001, 'batch_size': 128}

Best accuracy: 0.9762333234151205

Example 8: Tuning Model Architecture with Grid Search

Description:
This example demonstrates how to tune the architecture of the model, such as the number of layers and the number of neurons per layer, using grid search.

# Define a function to build the model with tunable architecture

def build_model_architecture(layers_size=512, num_layers=2):
    model = models.Sequential([layers.Flatten(input_shape=(28, 28, 1))])
    for _ in range(num_layers):
        model.add(layers.Dense(layers_size, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

# Define the hyperparameter grid

param_grid = {'layers_size': [128, 256, 512], 'num_layers': [1, 2, 3]}

# Perform grid search

grid = GridSearchCV(estimator=KerasClassifier(build_fn=build_model_architecture, epochs=5, batch_size=64, verbose=0), param_grid=param_grid)

grid_result = grid.fit(train_images, train_labels)

# Print best parameters and accuracy

print(f"Best params: {grid_result.best_params_}")

print(f"Best accuracy: {grid_result.best_score_}")

Output

Best params: {'layers_size': 512, 'num_layers': 1}

Best accuracy: 0.9753333330154419
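
The architectures in this grid differ widely in size, which affects both training time and the risk of overfitting. As a rough guide, the sketch below counts the trainable parameters of each candidate using the fact that a dense layer has (inputs + 1) x outputs weights and biases. The helper `count_parameters` is a hypothetical name, not part of Keras.

```python
def count_parameters(layers_size, num_layers, input_dim=28 * 28, num_classes=10):
    """Count weights and biases of the fully connected network built above."""
    total = 0
    fan_in = input_dim  # Flatten turns each 28x28x1 image into 784 inputs
    for _ in range(num_layers):
        total += (fan_in + 1) * layers_size  # weights plus one bias per unit
        fan_in = layers_size
    total += (fan_in + 1) * num_classes      # softmax output layer
    return total

for layers_size in (128, 256, 512):
    for num_layers in (1, 2, 3):
        n = count_parameters(layers_size, num_layers)
        print(f"layers_size={layers_size}, num_layers={num_layers}: {n:,} parameters")
```

The same numbers can be checked against `model.summary()` for any model returned by build_model_architecture.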

Example 9: Tuning Batch Size and Epochs

Description:
This example focuses on tuning batch size and the number of epochs to find the best trade-off between training time and performance.

# Define the hyperparameter grid for batch size and epochs

param_grid = {'batch_size': [32, 64, 128], 'epochs': [5, 10, 15]}

# Perform grid search for batch size and epochs

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)

grid_result = grid.fit(train_images, train_labels)

# Print best parameters and accuracy

print(f"Best params: {grid_result.best_params_}")

print(f"Best accuracy: {grid_result.best_score_}")

Output

Best params: {'batch_size': 128, 'epochs': 15}

Best accuracy: 0.9775166710217794
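
A simple proxy for training cost is the number of gradient updates, which is the number of batches per epoch times the number of epochs. The sketch below computes it for the grid above; `gradient_updates` is a hypothetical helper, and the 40,000-sample training fold assumes the 60,000 training images are split into 3 folds as in the example.

```python
import math

def gradient_updates(num_samples, batch_size, epochs):
    """Batches per epoch (last batch may be partial) times the number of epochs."""
    return math.ceil(num_samples / batch_size) * epochs

# Matches the progress bars in Example 1: 938 steps at batch size 64, 469 at 128
print(gradient_updates(60000, 64, 1), gradient_updates(60000, 128, 1))

train_fold = 60000 * 2 // 3  # with cv=3, each fit trains on two of three folds
for batch_size in (32, 64, 128):
    for epochs in (5, 10, 15):
        updates = gradient_updates(train_fold, batch_size, epochs)
        print(f"batch_size={batch_size}, epochs={epochs}: {updates} updates per fit")
```

Larger batches mean fewer updates per epoch, so they often need more epochs to reach the same accuracy. That makes it useful to tune the two values together.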

Example 10: Automated Hyperparameter Tuning with Keras Tuner

Description:
This example introduces Keras Tuner, a library specifically designed for hyperparameter tuning in Keras models. It automates the search process using multiple search algorithms.

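# Note: newer releases of Keras Tuner are imported as keras_tuner (for example, from keras_tuner import HyperModel, RandomSearch).

from kerastuner import HyperModel, RandomSearch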

# Define a hypermodel for tuning

class MyHyperModel(HyperModel):
    def build(self, hp):
        model = models.Sequential([
            layers.Flatten(input_shape=(28, 28, 1)),
            layers.Dense(hp.Int('units', min_value=128, max_value=512, step=128), activation='relu'),
            layers.Dense(10, activation='softmax')
        ])
        model.compile(optimizer=hp.Choice('optimizer', ['adam', 'rmsprop']), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
        return model

# Initialize Keras Tuner

tuner = RandomSearch(MyHyperModel(), objective='val_accuracy', max_trials=5, executions_per_trial=3, directory='my_dir', project_name='mnist')

# Run the tuner search

tuner.search(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))

# Print the best hyperparameters

best_hps = tuner.get_best_hyperparameters()[0]

print(f"Best units: {best_hps.get('units')}, Best optimizer: {best_hps.get('optimizer')}")

Output

Best units: 256, Best optimizer: adam
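
It can help to compare the size of the search space with the budget given to the tuner. The sketch below is a standard-library illustration, not Keras Tuner code: it lists the values that the hp.Int and hp.Choice definitions above can take and counts the training runs implied by max_trials and executions_per_trial. The variable names are assumptions.

```python
from itertools import product

# hp.Int('units', min_value=128, max_value=512, step=128) includes max_value
units_values = list(range(128, 512 + 1, 128))
optimizers = ['adam', 'rmsprop']

search_space = list(product(units_values, optimizers))
print(f"{len(search_space)} possible configurations: {search_space}")

max_trials = 5            # configurations the tuner will try
executions_per_trial = 3  # repeated trainings per configuration, averaged to reduce noise
training_runs = max_trials * executions_per_trial
print(f"{max_trials} trials x {executions_per_trial} executions = {training_runs} training runs")
```

With 8 possible configurations and 5 trials, the search covers most of the space, but not all of it. The reported best is therefore the best among the sampled configurations.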

Week 6 Summary

Objective: Learn various hyperparameter tuning techniques to optimize the performance of deep learning models.
Skills Developed:
- Manual hyperparameter tuning for learning rates, batch sizes, and optimizers.
- Systematic tuning with grid search and random search.
- Advanced tuning techniques using Bayesian optimization and automated tools like Keras Tuner.
Tools: TensorFlow, Keras, GridSearchCV, RandomizedSearchCV, Hyperopt, Keras Tuner.

These 10 examples in Week 6 provide students with a comprehensive understanding of hyperparameter tuning and the tools needed to improve the performance of their models. By experimenting with different techniques, they gain practical insights into the importance of hyperparameter optimization in deep learning.
