Advanced Applied Deep Learning
Lecture Course
Sheng Yun Wu
Objective:
To introduce students to the concept of hyperparameter tuning and its importance in training deep learning models. Students will learn about the key hyperparameters in convolutional neural networks (CNNs), how to perform hyperparameter tuning using methods like grid search and random search, and will explore the impact of different hyperparameters on model performance. By the end of the week, students should be able to systematically tune hyperparameters for optimal model performance.
Lecture 1: Introduction to Hyperparameters and Their Role
6.1 What are Hyperparameters?
Definition:
Hyperparameters are configuration values set before the learning process begins. They control how the model is built and trained, and they are not learned from the data.
Difference Between Parameters and Hyperparameters:
Parameters: Values learned during training (e.g., weights and biases).
Hyperparameters: Values chosen before training, by hand or by an automated search (e.g., learning rate, batch size).
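The distinction can be seen in a minimal sketch that fits a straight line by gradient descent. The toy data, the starting values, and the names learning_rate and num_epochs are assumptions chosen for illustration; they do not come from the course material.

```python
# Hyperparameters: chosen before training starts and never updated by it.
learning_rate = 0.1
num_epochs = 200

# Parameters: start at arbitrary values and are learned from the data.
w, b = 0.0, 0.0

# Toy data generated from y = 2x + 1 (illustrative values only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x + 1.0 for x in xs]

for _ in range(num_epochs):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # The learning rate scales each update to the parameters.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w={w:.3f}, b={b:.3f}")
```

Changing learning_rate or num_epochs changes how training proceeds, while w and b are whatever the training loop produces.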
Key Hyperparameters in CNNs:
Learning Rate: Controls the size of weight updates during training.
Batch Size: Number of samples processed before the model’s weights are updated.
Number of Epochs: Number of times the entire dataset is passed through the model during training.
Number of Filters: The number of filters in each convolutional layer.
Kernel Size: The size of the filters applied in convolutional layers.
Dropout Rate: Fraction of neurons randomly dropped at each training step to reduce overfitting.
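Batch size and number of epochs together determine how many weight updates a training run performs. The short sketch below works through that arithmetic. The helper count_weight_updates is a hypothetical name, it assumes the final partial batch is kept, and the choice of 10 epochs is arbitrary.

```python
import math


def count_weight_updates(num_samples: int, batch_size: int, num_epochs: int) -> int:
    """Total optimizer steps, assuming the last partial batch is kept."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * num_epochs


# 50,000 is the size of the CIFAR-10 training set; 10 epochs is an arbitrary choice.
for batch_size in (32, 64, 128):
    updates = count_weight_updates(50_000, batch_size, 10)
    print(f"batch_size={batch_size}: {updates} weight updates")
```

Doubling the batch size roughly halves the number of updates per epoch, which is one reason batch size and learning rate are usually tuned together.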
Lecture 2: Common Methods for Hyperparameter Tuning
6.2 The Need for Hyperparameter Tuning:
Why Tuning is Important:
Hyperparameters significantly affect the performance of the model.
Properly tuned hyperparameters can lead to faster convergence and higher accuracy.
Improperly set hyperparameters can lead to overfitting, underfitting, or slow learning.
Challenges in Hyperparameter Tuning:
High-dimensional hyperparameter space.
The number of combinations grows multiplicatively with each added hyperparameter, so searching all of them is time-consuming.
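A quick calculation makes the size of the search space concrete. The candidate counts and the five minutes per training run below are assumed numbers for illustration only.

```python
import math

# Hypothetical grid: number of candidate values for each hyperparameter.
values_per_hyperparameter = {
    "learning_rate": 3,
    "batch_size": 3,
    "num_filters": 3,
    "kernel_size": 2,
    "dropout_rate": 3,
}

# Every value of one hyperparameter is paired with every value of the others.
total_runs = math.prod(values_per_hyperparameter.values())

# Assumed cost of a single training run, in minutes.
minutes_per_run = 5
total_hours = total_runs * minutes_per_run / 60

print(f"{total_runs} training runs, about {total_hours} hours")
```

Adding a sixth hyperparameter with three candidate values would triple both figures.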
6.3 Hyperparameter Tuning Methods:
Grid Search:
A brute-force approach that tests every combination from a predefined set of values for each hyperparameter.
Pros: Exhaustive over the chosen grid, and simple to implement.
Cons: Time-consuming and computationally expensive, especially with many hyperparameters.
Random Search:
Randomly samples combinations of hyperparameter values from a predefined range.
Pros: More efficient than grid search in high-dimensional spaces; often achieves good results with fewer trials.
Cons: Still computationally demanding, and results vary from run to run because no region of the space is guaranteed to be sampled.
Bayesian Optimization (Optional):
A more advanced approach that uses probabilistic models to find the best hyperparameter values based on past trials.
Pros: Often needs fewer trials than grid search or random search, because each new trial is informed by the results of earlier ones.
Cons: More complex to implement.
Manual Search:
Manually adjusting one or two hyperparameters based on intuition or experience.
Pros: Simple to implement.
Cons: Often suboptimal, as it can miss key combinations of hyperparameters.
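One way to see why random search can be more efficient than grid search is to count how many distinct values of each hyperparameter are tried within the same budget. The sketch below uses nine trials over two hyperparameters. The ranges, the seed, and the helper distinct_values are assumptions for illustration.

```python
import itertools
import random

rng = random.Random(0)  # fixed seed so the sampled trials are reproducible

# Grid search: 3 x 3 = 9 trials, but only 3 distinct values per hyperparameter.
lr_grid = [0.001, 0.01, 0.1]
dropout_grid = [0.2, 0.35, 0.5]
grid_trials = list(itertools.product(lr_grid, dropout_grid))

# Random search: 9 trials, each drawing a fresh value for every hyperparameter.
random_trials = [
    (10 ** rng.uniform(-3, -1), rng.uniform(0.2, 0.5)) for _ in range(9)
]


def distinct_values(trials, index):
    """Count the distinct values tried for the hyperparameter at `index`."""
    return len({trial[index] for trial in trials})


print("grid:   distinct learning rates =", distinct_values(grid_trials, 0))
print("random: distinct learning rates =", distinct_values(random_trials, 0))
```

If only one of the two hyperparameters matters much, the grid has effectively tested three settings of it, while random search has tested nine.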
Lecture 3: Key Hyperparameters to Tune in CNNs
6.4 Learning Rate and Batch Size
Learning Rate:
One of the most important hyperparameters in deep learning.
A small learning rate can result in slow training, while a large learning rate can cause the model to overshoot and not converge.
Batch Size:
The number of training samples used in one forward/backward pass.
Larger batch sizes give more stable gradient estimates but require more memory, while smaller batch sizes produce noisier updates but may generalize better.
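The learning-rate behavior described in this section can be reproduced on a toy problem. The sketch below runs gradient descent on f(w) = w^2, which stands in for a real loss; the function, the starting point, and the three learning rates are illustrative assumptions.

```python
def gradient_descent(learning_rate: float, steps: int = 50, start: float = 1.0) -> float:
    """Minimize f(w) = w**2, whose gradient is 2*w, and return the final w."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * w
    return w


# Too small, reasonable, and too large for this particular function.
for lr in (0.001, 0.1, 1.1):
    final_w = gradient_descent(lr)
    print(f"lr={lr}: distance from the minimum after 50 steps = {abs(final_w):.3g}")
```

For this function each step multiplies w by (1 - 2 * learning_rate), so the smallest rate barely moves, the middle one converges quickly, and any rate above 1 makes the iterates grow instead of shrink.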
6.5 Number of Filters and Kernel Size in Convolutional Layers
Number of Filters:
Determines how many features the network learns at each layer.
More filters can capture more complex patterns but increase computational costs.
Kernel Size:
The size of the convolutional filters (e.g., 3x3, 5x5).
Larger kernels capture broader features but are more computationally expensive, while smaller kernels focus on finer details.
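The cost of more filters and larger kernels shows up directly in the number of trainable weights. The sketch below counts the parameters of a standard 2D convolutional layer; the helper name and the assumed 32 input channels are illustrative.

```python
def conv2d_param_count(kernel_size: int, in_channels: int, num_filters: int) -> int:
    """Weights plus one bias per filter for a standard 2D convolution."""
    weights_per_filter = kernel_size * kernel_size * in_channels
    return (weights_per_filter + 1) * num_filters


# Assumed input: 32 channels, e.g. the output of an earlier 32-filter layer.
for kernel_size in (3, 5):
    for num_filters in (32, 64):
        count = conv2d_param_count(kernel_size, 32, num_filters)
        print(f"{kernel_size}x{kernel_size}, {num_filters} filters: {count} parameters")
```

Doubling the number of filters doubles the layer's parameters, and moving from a 3x3 to a 5x5 kernel multiplies them by almost three.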
6.6 Dropout Rate and Regularization
Dropout Rate:
Sets the fraction of neurons that are randomly dropped at each training step.
A higher dropout rate reduces overfitting, but a rate that is too high can cause the model to underfit.
Regularization Methods (L2, L1):
Penalize complex models by adding a term to the loss function: L1 adds the sum of the absolute weight values, and L2 adds the sum of the squared weights.
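Both ideas in this section are short enough to write out by hand. The sketch below uses inverted dropout, a common formulation in which the surviving activations are rescaled so their expected value is unchanged, and it computes L1 and L2 penalty terms. The function names, the seed, and the sample values are assumptions for illustration.

```python
import random


def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep_prob = 1.0 - rate  # assumes 0 <= rate < 1
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


def l1_penalty(weights, strength):
    """L1 term added to the loss: strength times the sum of absolute weights."""
    return strength * sum(abs(w) for w in weights)


def l2_penalty(weights, strength):
    """L2 term added to the loss: strength times the sum of squared weights."""
    return strength * sum(w * w for w in weights)


rng = random.Random(0)
dropped = dropout([1.0] * 10_000, rate=0.5, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)
print(f"fraction dropped: {zero_fraction:.3f}, mean activation: {mean_activation:.3f}")

weights = [0.5, -1.0, 2.0]
print("L1 penalty:", l1_penalty(weights, 0.01))
print("L2 penalty:", l2_penalty(weights, 0.01))
```

The dropout rate and the regularization strength are both hyperparameters, so either can be included in a search.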
Practical Session: Hyperparameter Tuning in CNNs
Objective: Implement hyperparameter tuning using grid search and random search to optimize the performance of a CNN model.
Dataset: CIFAR-10 or MNIST dataset.
Key Steps:
Step 1: Build a Baseline CNN Model
Construct a simple CNN architecture (e.g., two convolutional layers with a fully connected layer at the end) for image classification.
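Before building the baseline in a framework, it can help to trace its shapes by hand. The sketch below assumes one possible reading of the baseline: two 3x3 convolutions without padding, each followed by 2x2 max pooling, then one fully connected output layer. The pooling layers, the filter counts (32, 64), and all function names are assumptions, not part of the course specification.

```python
def conv_output_size(size: int, kernel_size: int) -> int:
    """Spatial size after a stride-1 convolution with no padding."""
    return size - kernel_size + 1


def pool_output_size(size: int, pool_size: int = 2) -> int:
    """Spatial size after non-overlapping pooling (any remainder is dropped)."""
    return size // pool_size


def baseline_summary(input_size, in_channels, filters=(32, 64),
                     kernel_size=3, num_classes=10):
    """Return (flattened feature count, total trainable parameters)."""
    size, channels, total_params = input_size, in_channels, 0
    for num_filters in filters:
        # Convolution weights plus one bias per filter.
        total_params += (kernel_size * kernel_size * channels + 1) * num_filters
        size = pool_output_size(conv_output_size(size, kernel_size))
        channels = num_filters
    flattened = size * size * channels
    # Fully connected output layer: one weight per feature plus a bias, per class.
    total_params += (flattened + 1) * num_classes
    return flattened, total_params


print("CIFAR-10 (32x32x3):", baseline_summary(32, 3))
print("MNIST (28x28x1):   ", baseline_summary(28, 1))
```

The filters and kernel_size arguments are the same hyperparameters discussed in Lecture 3, so this function also shows how tuning them changes the size of the model.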
Step 2: Select Hyperparameters to Tune
Choose hyperparameters like learning rate, batch size, number of filters, and dropout rate to tune.
Step 3: Perform Grid Search
Use a tool like KerasTuner or Scikit-learn’s GridSearchCV to perform grid search.
Define a grid of hyperparameter values to search (e.g., learning rates [0.001, 0.01, 0.1], batch sizes [32, 64, 128], etc.).
Train the model for each combination and evaluate its performance on the validation set.
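The grid search loop itself is short. In the sketch below, validation_accuracy is a hypothetical stand-in whose formula is invented so the example runs instantly; in the practical session it would be replaced by code that trains the CNN and evaluates it on the validation set. The grid values are the ones listed in Step 3.

```python
import itertools


def validation_accuracy(learning_rate: float, batch_size: int) -> float:
    """Hypothetical stand-in for 'train the CNN, return validation accuracy'."""
    lr_penalty = abs(learning_rate - 0.01) * 2.0
    batch_penalty = abs(batch_size - 64) / 1000.0
    return 0.9 - lr_penalty - batch_penalty


param_grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
}

results = []
names = list(param_grid)
# itertools.product yields every combination of the listed values.
for values in itertools.product(*param_grid.values()):
    config = dict(zip(names, values))
    results.append((validation_accuracy(**config), config))

best_score, best_config = max(results, key=lambda item: item[0])
print(f"{len(results)} trials, best config: {best_config}")
```

Keeping every (score, config) pair in results, instead of only the best one, makes the later comparison and plotting steps possible.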
Step 4: Perform Random Search
Use KerasTuner or Scikit-learn's RandomizedSearchCV for random search.
Randomly sample from a range of hyperparameter values (e.g., learning rates between 0.001 and 0.1, batch sizes between 32 and 128).
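A minimal sketch of the sampling step follows. It draws the learning rate on a logarithmic scale, a common choice when a range spans several orders of magnitude, and it picks the batch size from powers of two within the stated range; both choices are assumptions. As before, validation_accuracy is a hypothetical stand-in with an invented formula, not a real training run.

```python
import math
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration from the ranges given in Step 4."""
    # Log-uniform: the interval 0.001-0.01 is as likely as 0.01-0.1.
    log_lr = rng.uniform(math.log10(0.001), math.log10(0.1))
    return {
        "learning_rate": 10 ** log_lr,
        "batch_size": rng.choice([32, 64, 128]),
    }


def validation_accuracy(config: dict) -> float:
    """Hypothetical stand-in for training and evaluating the CNN."""
    lr_penalty = abs(math.log10(config["learning_rate"]) + 2) * 0.1
    batch_penalty = abs(config["batch_size"] - 64) / 1000.0
    return 0.9 - lr_penalty - batch_penalty


rng = random.Random(42)  # fixed seed so the trials are reproducible
trials = [sample_config(rng) for _ in range(20)]
best_config = max(trials, key=validation_accuracy)
print("best of 20 random trials:", best_config)
```

The number of trials is set directly, so the budget does not grow when another hyperparameter is added to sample_config.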
Step 5: Compare the Results
Evaluate the best model from each search method in terms of accuracy, training time, and validation performance.
Plot the validation accuracy/loss for each trial to identify trends.
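The comparison can start from simple summaries of the logged trials. The sketch below assumes each trial is stored as a dictionary with config, val_accuracy, and seconds keys; that record layout, the helper names, and the numbers are placeholders for illustration, not real results.

```python
from collections import defaultdict
from statistics import mean


def summarize(trials):
    """Return the best accuracy, its config, and total training time."""
    best = max(trials, key=lambda t: t["val_accuracy"])
    return {
        "best_val_accuracy": best["val_accuracy"],
        "best_config": best["config"],
        "total_seconds": sum(t["seconds"] for t in trials),
    }


def mean_accuracy_by(trials, name):
    """Average validation accuracy for each value of one hyperparameter."""
    groups = defaultdict(list)
    for t in trials:
        groups[t["config"][name]].append(t["val_accuracy"])
    return {value: mean(scores) for value, scores in sorted(groups.items())}


# Placeholder records, NOT real results; substitute the logs from your own runs.
example_trials = [
    {"config": {"learning_rate": 0.001, "batch_size": 32}, "val_accuracy": 0.60, "seconds": 100},
    {"config": {"learning_rate": 0.001, "batch_size": 64}, "val_accuracy": 0.62, "seconds": 80},
    {"config": {"learning_rate": 0.01, "batch_size": 32}, "val_accuracy": 0.70, "seconds": 100},
    {"config": {"learning_rate": 0.01, "batch_size": 64}, "val_accuracy": 0.68, "seconds": 80},
]

summary = summarize(example_trials)
trend = mean_accuracy_by(example_trials, "learning_rate")
print(summary)
print(trend)
```

Calling summarize once per search method gives the figures to compare, and the dictionary returned by mean_accuracy_by can be passed to a plotting library to show the trend for each hyperparameter.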
Assignment for Week 6:
Coding Assignment:
Perform hyperparameter tuning on the CNN model using both grid search and random search.
Try tuning the following hyperparameters:
Learning rate
Batch size
Number of filters in convolutional layers
Dropout rate
Number of epochs
Use KerasTuner or a similar framework to automate the search process.
Analysis:
Compare the performance of the models trained using grid search and random search.
Which hyperparameters had the greatest impact on performance?
Analyze how different learning rates and batch sizes affect model training and accuracy.
Reading Assignment:
Read Chapter 7 of "Advanced Applied Deep Learning" by Umberto Michelucci.
Focus on understanding the impact of different hyperparameters on model performance and how to efficiently tune them.
Summary of Key Concepts:
Hyperparameter tuning: The process of optimizing hyperparameters for better model performance.
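Grid search: An exhaustive search over every combination in a predefined grid of hyperparameter values.
Random search: Samples hyperparameter combinations at random from predefined ranges; often more efficient than grid search when there are many hyperparameters.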
Key hyperparameters: Learning rate, batch size, number of filters, kernel size, dropout rate, and regularization.
This week provides students with practical experience in hyperparameter tuning, a critical aspect of optimizing deep learning models. They will learn how to automate the tuning process and select optimal hyperparameters for their CNNs to achieve better performance.