Advanced Applied Deep Learning
Practice Course
Sheng Yun Wu
In Week 6, students will learn about hyperparameter tuning, an essential process for optimizing the performance of deep learning models. Hyperparameters such as learning rates, batch sizes, optimizers, and layer sizes greatly affect the model's ability to learn. Students will explore various approaches for tuning these hyperparameters, including manual tuning, grid search, and random search. They will also get hands-on experience with automated hyperparameter optimization libraries.
Description:
This example introduces the basic concept of manually tuning hyperparameters like the learning rate and batch size. Students will manually adjust these values and observe their effect on training.
import tensorflow as tf
from tensorflow.keras import models, layers
from tensorflow.keras.optimizers import Adam
# Load MNIST dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
# Reshape and normalize the data
train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255
# Display shape of the dataset
print(f"Train images shape: {train_images.shape}")
print(f"Test images shape: {test_images.shape}")
# Build a simple model
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(512, activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Experiment with different learning rates and batch sizes
learning_rate = 0.001
batch_size = 64
# Compile and train the model
model.compile(optimizer=Adam(learning_rate=learning_rate), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=batch_size, validation_data=(test_images, test_labels))
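# Adjust the learning rate and batch size and observe the changes in performance
# Note: compile() does not reset the weights, so this second run continues training the model fitted above;
# rebuild the model first if the two settings should be compared from the same starting point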
learning_rate = 0.0001
batch_size = 128
model.compile(optimizer=Adam(learning_rate=learning_rate), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=batch_size, validation_data=(test_images, test_labels))
Output
Train images shape: (60000, 28, 28, 1)
Test images shape: (10000, 28, 28, 1)
Epoch 1/5
938/938 [==============================] - 2s 2ms/step - loss: 0.2280 - accuracy: 0.9345 - val_loss: 0.1147 - val_accuracy: 0.9669
Epoch 2/5
938/938 [==============================] - 2s 2ms/step - loss: 0.0903 - accuracy: 0.9736 - val_loss: 0.0763 - val_accuracy: 0.9765
Epoch 3/5
938/938 [==============================] - 2s 2ms/step - loss: 0.0571 - accuracy: 0.9827 - val_loss: 0.0681 - val_accuracy: 0.9788
Epoch 4/5
938/938 [==============================] - 2s 2ms/step - loss: 0.0395 - accuracy: 0.9879 - val_loss: 0.0676 - val_accuracy: 0.9786
Epoch 5/5
938/938 [==============================] - 2s 2ms/step - loss: 0.0291 - accuracy: 0.9912 - val_loss: 0.0661 - val_accuracy: 0.9789
Epoch 1/5
469/469 [==============================] - 1s 3ms/step - loss: 0.0124 - accuracy: 0.9973 - val_loss: 0.0540 - val_accuracy: 0.9840
Epoch 2/5
469/469 [==============================] - 1s 2ms/step - loss: 0.0088 - accuracy: 0.9984 - val_loss: 0.0530 - val_accuracy: 0.9846
Epoch 3/5
469/469 [==============================] - 1s 2ms/step - loss: 0.0074 - accuracy: 0.9989 - val_loss: 0.0531 - val_accuracy: 0.9844
Epoch 4/5
469/469 [==============================] - 1s 2ms/step - loss: 0.0065 - accuracy: 0.9991 - val_loss: 0.0537 - val_accuracy: 0.9840
Epoch 5/5
469/469 [==============================] - 1s 2ms/step - loss: 0.0059 - accuracy: 0.9992 - val_loss: 0.0541 - val_accuracy: 0.9841
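The step counts in the progress bars above (938 and 469) follow from the dataset size and the batch size: Keras runs one step per batch, and the final, partial batch still counts as a step. The arithmetic can be sketched with the standard library alone; the helper name `steps_per_epoch` is illustrative and is not a Keras function.

```python
import math

def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once (the last batch may be partial)."""
    return math.ceil(num_samples / batch_size)

# MNIST has 60,000 training images
for batch_size in (64, 128):
    print(f"batch_size={batch_size}: {steps_per_epoch(60000, batch_size)} steps per epoch")
```

Doubling the batch size halves the number of weight updates per epoch, which is one reason batch size and learning rate are usually tuned together.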
Description:
This example demonstrates how to use grid search to systematically try different combinations of hyperparameters (learning rates, batch sizes, and optimizers) and select the best-performing model.
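from sklearn.model_selection import GridSearchCV
# Note: this wrapper was removed from recent TensorFlow releases (2.12 and later);
# the separate scikeras package provides a KerasClassifier replacement
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.optimizers import Adam, RMSprop
# Define a function to build the model
def build_model(learning_rate=0.001, optimizer='adam'):
    model = models.Sequential([
        layers.Flatten(input_shape=(28, 28, 1)),
        layers.Dense(512, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])
    # Build the optimizer object so that learning_rate actually takes effect
    optimizer_cls = {'adam': Adam, 'rmsprop': RMSprop}[optimizer]
    model.compile(optimizer=optimizer_cls(learning_rate=learning_rate), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model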
# Wrap the model in KerasClassifier
model = KerasClassifier(build_fn=build_model, epochs=5, batch_size=64, verbose=0)
# Define the hyperparameter grid
param_grid = {'optimizer': ['adam', 'rmsprop'], 'learning_rate': [0.001, 0.0001], 'batch_size': [32, 64, 128]}
# Perform grid search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(train_images, train_labels)
# Print best parameters and accuracy
print(f"Best params: {grid_result.best_params_}")
print(f"Best accuracy: {grid_result.best_score_}")
Output
Best params: {'batch_size': 64, 'learning_rate': 0.0001, 'optimizer': 'rmsprop'}
Best accuracy: 0.975516676902771
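To get a feel for the cost of the grid search above, it helps to enumerate the grid by hand. The sketch below uses only `itertools` and reproduces the same `param_grid`; the variable names are illustrative, and the extra final fit assumes GridSearchCV's default `refit=True`.

```python
from itertools import product

param_grid = {
    'optimizer': ['adam', 'rmsprop'],
    'learning_rate': [0.001, 0.0001],
    'batch_size': [32, 64, 128],
}
cv = 3  # number of cross-validation folds used above

# Every combination of one value per hyperparameter
names = sorted(param_grid)
combinations = [dict(zip(names, values))
                for values in product(*(param_grid[name] for name in names))]

num_cv_fits = len(combinations) * cv   # one training run per combination per fold
num_total_fits = num_cv_fits + 1       # plus the final refit of the best combination on all data

print(f"{len(combinations)} combinations, {num_cv_fits} cross-validation fits, {num_total_fits} fits in total")
```

Because the count is a product over all hyperparameters, adding one more value to any list multiplies the total training time.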
Description:
Random search is an efficient alternative to grid search: instead of evaluating every combination, it evaluates a fixed number of randomly sampled hyperparameter combinations (n_iter). This example shows how to implement random search for hyperparameter tuning.
from sklearn.model_selection import RandomizedSearchCV
import numpy as np
# Define the hyperparameter distribution
param_dist = {'optimizer': ['adam', 'rmsprop'], 'learning_rate': [0.001, 0.0001], 'batch_size': [32, 64, 128]}
# Perform random search
random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=5, cv=3, random_state=42)
random_search_result = random_search.fit(train_images, train_labels)
# Print best parameters and accuracy
print(f"Best params: {random_search_result.best_params_}")
print(f"Best accuracy: {random_search_result.best_score_}")
Output
Best params: {'optimizer': 'rmsprop', 'learning_rate': 0.001, 'batch_size': 64}
Best accuracy: 0.9732000033060709
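The example above samples from short lists, but random search is most useful when a hyperparameter is drawn from a continuous range. The following sketch, which is an assumption and not part of the original example, draws five configurations with the learning rate sampled log-uniformly between 0.0001 and 0.01 using only the standard library. The helper `sample_config` is hypothetical.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the draws are reproducible

def sample_config(rng):
    """Draw one configuration: categorical choices plus a log-uniform learning rate."""
    # Sampling the exponent uniformly spreads draws evenly across orders of magnitude
    log_lr = rng.uniform(math.log10(1e-4), math.log10(1e-2))
    return {
        'optimizer': rng.choice(['adam', 'rmsprop']),
        'learning_rate': 10 ** log_lr,
        'batch_size': rng.choice([32, 64, 128]),
    }

configs = [sample_config(rng) for _ in range(5)]  # n_iter=5 in the example above
for cfg in configs:
    print(cfg)
```

RandomizedSearchCV also accepts distribution objects with an `rvs` method (for example from `scipy.stats`) in place of lists, which gives the same effect inside scikit-learn.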
Description:
This example demonstrates how to perform grid search to tune both the learning rate and momentum of the SGD optimizer.
from tensorflow.keras.optimizers import SGD
# Define a function to build a model with SGD
def build_sgd_model(learning_rate=0.001, momentum=0.9):
    model = models.Sequential([
        layers.Flatten(input_shape=(28, 28, 1)),
        layers.Dense(512, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])
    optimizer = SGD(learning_rate=learning_rate, momentum=momentum)
    model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model
# Wrap the model for grid search
sgd_model = KerasClassifier(build_fn=build_sgd_model, epochs=5, verbose=0)
# Define the grid search parameters
param_grid = {'learning_rate': [0.01, 0.001, 0.0001], 'momentum': [0.8, 0.9, 0.99]}
# Perform grid search
grid = GridSearchCV(estimator=sgd_model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(train_images, train_labels)
# Print best parameters and accuracy
print(f"Best params: {grid_result.best_params_}")
print(f"Best accuracy: {grid_result.best_score_}")
Output
Best params: {'learning_rate': 0.01, 'momentum': 0.9}
Best accuracy: 0.9699333310127258
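To see why learning rate and momentum interact, it can help to run the update rule on a toy problem. The sketch below is an illustration on the one-dimensional function f(w) = w², not on MNIST; it uses the classic (non-Nesterov) momentum form, velocity = momentum · velocity − learning_rate · gradient, and the function name `sgd_momentum` is hypothetical.

```python
def sgd_momentum(learning_rate, momentum, steps=200, w0=1.0):
    """Minimize f(w) = w**2 with the classic momentum update; return the final w."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        grad = 2.0 * w                                   # derivative of w**2
        velocity = momentum * velocity - learning_rate * grad
        w += velocity
    return w

plain = sgd_momentum(learning_rate=0.01, momentum=0.0)
with_momentum = sgd_momentum(learning_rate=0.01, momentum=0.9)

print(f"without momentum: w = {plain:.6f}")
print(f"with momentum 0.9: w = {with_momentum:.6f}")
```

With the same learning rate and step budget, the run with momentum ends much closer to the minimum at w = 0, which is consistent with larger momentum values compensating for small learning rates in the grid above.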
Description:
This example demonstrates how to combine hyperparameter tuning with early stopping to find the best hyperparameters while preventing overfitting.
from tensorflow.keras.callbacks import EarlyStopping
# Add early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=3)
# Wrap the model for grid search
model = KerasClassifier(build_fn=build_model, epochs=20, batch_size=64, verbose=0)
# Define the hyperparameter grid
param_grid = {'optimizer': ['adam', 'rmsprop'], 'learning_rate': [0.001, 0.0001], 'batch_size': [32, 64, 128]}
# Perform grid search with early stopping
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(train_images, train_labels, validation_data=(test_images, test_labels), callbacks=[early_stopping])
# Print best parameters and accuracy
print(f"Best params: {grid_result.best_params_}")
print(f"Best accuracy: {grid_result.best_score_}")
Output
Best params: {'batch_size': 128, 'learning_rate': 0.001, 'optimizer': 'adam'}
Best accuracy: 0.976449986298879
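The patience rule used by the callback can be illustrated without Keras. The sketch below is a simplified model of the logic (it ignores options such as `min_delta` and `restore_best_weights`), applied to the validation losses printed in the manual tuning example earlier; the function name `early_stopping_epoch` is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None if it never triggers."""
    best = float('inf')
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Validation losses from the two manual tuning runs shown earlier
first_run = [0.1147, 0.0763, 0.0681, 0.0676, 0.0661]
second_run = [0.0540, 0.0530, 0.0531, 0.0537, 0.0541]

print(early_stopping_epoch(first_run))   # loss improves every epoch, so no stop
print(early_stopping_epoch(second_run))  # best at epoch 2, then three epochs without improvement
```

In the second run the best loss occurs at epoch 2, so with patience=3 the rule fires at epoch 5.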
Description:
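Bayesian optimization is a more advanced technique that uses the results of earlier trials to decide which hyperparameters to try next. This example demonstrates how to implement it with the hyperopt library, using its Tree-structured Parzen Estimator algorithm (tpe.suggest).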
from hyperopt import hp, fmin, tpe, Trials
from hyperopt.pyll.base import scope
from tensorflow.keras.optimizers import Adam, RMSprop
# Define the hyperparameter space
space = {
    'optimizer': hp.choice('optimizer', ['adam', 'rmsprop']),
    'learning_rate': hp.loguniform('learning_rate', np.log(0.0001), np.log(0.01)),
    'batch_size': scope.int(hp.quniform('batch_size', 32, 128, 32))
}
# Define the objective function
def objective(params):
    model = build_model(learning_rate=params['learning_rate'], optimizer=params['optimizer'])
    model.fit(train_images, train_labels, epochs=5, batch_size=params['batch_size'], verbose=0, validation_data=(test_images, test_labels))
    val_loss, val_acc = model.evaluate(test_images, test_labels, verbose=0)
    return {'loss': -val_acc, 'status': 'ok'}
# Perform Bayesian optimization
trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=20, trials=trials)
# Print the best hyperparameters
print(best)
Output
100%|██████████| 20/20 [04:40<00:00, 14.02s/trial, best loss: -0.9818999767303467]
{'batch_size': 96.0, 'learning_rate': 0.00014406340807234902, 'optimizer': 0}
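The dictionary returned by `fmin` needs a little interpretation: values from `hp.choice` are reported as list indices, and `hp.quniform` values come back as floats. The reported best loss is the negated validation accuracy, so -0.9819 corresponds to an accuracy of about 0.9819. A minimal sketch of decoding the result by hand follows; hyperopt's own `space_eval(space, best)` does this generically, and the variable names here are illustrative.

```python
# Same order as the list passed to hp.choice('optimizer', ...) above
optimizer_choices = ['adam', 'rmsprop']

# Result printed by the example above
best = {'batch_size': 96.0, 'learning_rate': 0.00014406340807234902, 'optimizer': 0}

decoded = {
    'optimizer': optimizer_choices[best['optimizer']],  # index -> name
    'learning_rate': best['learning_rate'],
    'batch_size': int(best['batch_size']),               # quniform returns a float
}
print(decoded)
```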
Description:
This example shows how to combine random search with early stopping to quickly explore a hyperparameter space and avoid overfitting.
# Perform random search with early stopping
random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=5, cv=3, random_state=42)
random_search_result = random_search.fit(train_images, train_labels, validation_data=(test_images, test_labels), callbacks=[early_stopping])
# Print best parameters and accuracy
print(f"Best params: {random_search_result.best_params_}")
print(f"Best accuracy: {random_search_result.best_score_}")
Output
Best params: {'optimizer': 'adam', 'learning_rate': 0.001, 'batch_size': 128}
Best accuracy: 0.9762333234151205
Description:
This example demonstrates how to tune the architecture of the model, such as the number of layers and the number of neurons per layer, using grid search.
# Define a function to build the model with tunable architecture
def build_model_architecture(layers_size=512, num_layers=2):
    model = models.Sequential([layers.Flatten(input_shape=(28, 28, 1))])
    for _ in range(num_layers):
        model.add(layers.Dense(layers_size, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model
# Define the hyperparameter grid
param_grid = {'layers_size': [128, 256, 512], 'num_layers': [1, 2, 3]}
# Perform grid search
grid = GridSearchCV(estimator=KerasClassifier(build_fn=build_model_architecture, epochs=5, batch_size=64, verbose=0), param_grid=param_grid)
grid_result = grid.fit(train_images, train_labels)
# Print best parameters and accuracy
print(f"Best params: {grid_result.best_params_}")
print(f"Best accuracy: {grid_result.best_score_}")
Output
Best params: {'layers_size': 512, 'num_layers': 1}
Best accuracy: 0.9753333330154419
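When comparing architectures, it is useful to know how many trainable parameters each grid point has. The sketch below counts weights and biases for the fully connected networks built above (784 flattened inputs, `num_layers` hidden layers of `layers_size` units, 10 outputs); the helper `count_dense_params` is illustrative and not a Keras function.

```python
def count_dense_params(input_dim, layers_size, num_layers, num_classes=10):
    """Trainable parameters of a stack of Dense layers followed by a Dense output layer."""
    total = 0
    fan_in = input_dim
    for _ in range(num_layers):
        total += fan_in * layers_size + layers_size   # weights + biases
        fan_in = layers_size
    total += fan_in * num_classes + num_classes       # output layer
    return total

for layers_size in (128, 256, 512):
    for num_layers in (1, 2, 3):
        n = count_dense_params(28 * 28, layers_size, num_layers)
        print(f"layers_size={layers_size}, num_layers={num_layers}: {n:,} parameters")
```

The best configuration reported above (512 units, 1 hidden layer) has 407,050 parameters, while the largest grid point has more than twice as many without scoring better in this run.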
Description:
This example focuses on tuning batch size and the number of epochs to find the best trade-off between training time and performance.
# Define the hyperparameter grid for batch size and epochs
param_grid = {'batch_size': [32, 64, 128], 'epochs': [5, 10, 15]}
# Perform grid search for batch size and epochs
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(train_images, train_labels)
# Print best parameters and accuracy
print(f"Best params: {grid_result.best_params_}")
print(f"Best accuracy: {grid_result.best_score_}")
Output
Best params: {'batch_size': 128, 'epochs': 15}
Best accuracy: 0.9775166710217794
Description:
This example introduces Keras Tuner, a library specifically designed for hyperparameter tuning in Keras models. It automates the search process using multiple search algorithms.
# The package is installed as keras-tuner and imported as keras_tuner (the older kerastuner import name is deprecated)
from keras_tuner import HyperModel, RandomSearch
# Define a hypermodel for tuning
class MyHyperModel(HyperModel):
    def build(self, hp):
        model = models.Sequential([
            layers.Flatten(input_shape=(28, 28, 1)),
            layers.Dense(hp.Int('units', min_value=128, max_value=512, step=128), activation='relu'),
            layers.Dense(10, activation='softmax')
        ])
        model.compile(optimizer=hp.Choice('optimizer', ['adam', 'rmsprop']), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
        return model
# Initialize Keras Tuner
tuner = RandomSearch(MyHyperModel(), objective='val_accuracy', max_trials=5, executions_per_trial=3, directory='my_dir', project_name='mnist')
# Run the tuner search
tuner.search(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
# Print the best hyperparameters
best_hps = tuner.get_best_hyperparameters()[0]
print(f"Best units: {best_hps.get('units')}, Best optimizer: {best_hps.get('optimizer')}")
Output
Best units: 256, Best optimizer: adam
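The tuner settings above determine both the size of the search space and the training budget. As a rough sketch using plain Python (the variable names are illustrative), the space defined by `hp.Int` and `hp.Choice` and the number of models trained can be worked out as follows, assuming `max_value` is inclusive and each trial trains `executions_per_trial` models.

```python
# Values that hp.Int('units', min_value=128, max_value=512, step=128) can take
units_values = list(range(128, 512 + 1, 128))
# Values that hp.Choice('optimizer', [...]) can take
optimizer_values = ['adam', 'rmsprop']

search_space_size = len(units_values) * len(optimizer_values)

max_trials = 5
executions_per_trial = 3
total_trainings = max_trials * executions_per_trial  # each one runs for epochs=5

print(f"units values: {units_values}")
print(f"search space size: {search_space_size}")
print(f"models trained: {total_trainings}")
```

With 5 trials over 8 possible combinations, the random search covers only part of the space; repeating each trial 3 times reduces the effect of random weight initialization on the reported score.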
Objective: Learn various hyperparameter tuning techniques to optimize the performance of deep learning models.
Skills Developed:
Manual hyperparameter tuning for learning rates, batch sizes, and optimizers.
Systematic tuning with grid search and random search.
Advanced tuning techniques using Bayesian optimization and automated tools like Keras Tuner.
Tools: TensorFlow, Keras, GridSearchCV, RandomizedSearchCV, Hyperopt, Keras Tuner.
These 10 examples in Week 6 provide students with a comprehensive understanding of hyperparameter tuning and the tools needed to improve the performance of their models. By experimenting with different techniques, they gain practical insights into the importance of hyperparameter optimization in deep learning.