Advanced Applied Deep Learning
First Semester Lecture Course
Sheng Yun Wu
Objective:
To introduce students to the inner workings of convolutional neural networks (CNNs), focusing on understanding how CNN layers function, including convolutional layers, pooling layers, and activation functions. By the end of the week, students will understand how CNNs learn spatial hierarchies in images and will be able to build a basic CNN for image classification.
Lecture 1: Understanding Convolutional Layers
2.1 Convolution Operation in CNNs
What is a convolution?
Definition of convolution in the context of image processing.
Convolutional filters (kernels): What they are and how they extract features (e.g., edges, textures) from images.
How the filter slides across the image (stride) to create a feature map.
Importance of parameters like stride and padding:
Stride: How far the filter moves after each operation.
Padding: Adding pixels around the image to preserve spatial dimensions.
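Stride and padding together determine the size of the feature map. For an n x n input, a k x k filter, padding p and stride s, each side of the output has floor((n + 2p - k) / s) + 1 pixels. The following is a minimal sketch of that formula; the function name `conv_output_size` is hypothetical.

```python
def conv_output_size(n: int, k: int, stride: int = 1, padding: int = 0) -> int:
    """Side length of the feature map a k x k filter produces on an n x n input."""
    # Padding adds `padding` pixels on each side, so the effective input is n + 2p wide.
    # The filter fits floor((n + 2p - k) / stride) + 1 times along each axis.
    return (n + 2 * padding - k) // stride + 1


# 32x32 input, 3x3 filter, stride 1, no padding: the map shrinks to 30x30.
print(conv_output_size(32, 3))                        # 30
# Padding of 1 preserves the spatial size for a 3x3 filter.
print(conv_output_size(32, 3, padding=1))             # 32
# A stride of 2 roughly halves the size.
print(conv_output_size(32, 3, stride=2, padding=1))   # 16
```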
Feature Maps:
Explanation of how convolutional layers transform the input image into feature maps.
How different filters learn to detect different types of patterns (e.g., edges, corners, textures).
Visualizing feature maps from different layers of CNNs.
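The sketch below makes this concrete by applying two hand-crafted Sobel-style kernels to a toy image that contains a single vertical edge. A trained CNN learns its kernels from data rather than using fixed ones like these, so treat them as illustrative stand-ins. The helper `feature_map` is hypothetical and relies on NumPy's `sliding_window_view` (NumPy 1.20 or later).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def feature_map(image, kernel):
    """Stride-1, no-padding filtering: the sliding-window sum a conv layer computes."""
    windows = sliding_window_view(image, kernel.shape)  # shape (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)    # kernel-weighted sum per window


# 6x6 toy image: dark left half, bright right half (one vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-crafted edge detectors (stand-ins for learned filters).
vertical_edge = np.array([[-1, 0, 1],
                          [-2, 0, 2],
                          [-1, 0, 1]], dtype=float)
horizontal_edge = vertical_edge.T

print(feature_map(image, vertical_edge))    # each row is 0, 4, 4, 0: strong response at the edge
print(feature_map(image, horizontal_edge))  # all zeros: the image has no horizontal edge
```

The two feature maps from the same image differ completely, which is why a convolutional layer uses many filters in parallel: each one highlights a different kind of pattern.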
2.2 Mathematical Foundations of Convolution
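Formula for the convolution operation:
$$S(x,y) = (I * K)(x,y) = \sum_{m}\sum_{n} I(x+m,\, y+n)\, K(m,n)$$
where I(x,y) is the input image, K(m,n) is the kernel, and S(x,y) is the resulting feature map. Strictly speaking, this is cross-correlation, which is what CNN libraries compute; textbook convolution flips the kernel and uses I(x-m, y-n) instead. Because the kernel weights are learned, the flip makes no practical difference.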
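The convolution sum can be implemented directly. Here it is written in the cross-correlation form S(x, y) = Σ_m Σ_n I(x+m, y+n) K(m, n) that CNN libraries use, extended with the stride and zero padding introduced in 2.1. This is a minimal single-channel NumPy sketch. The function name `conv2d` and the example values are hypothetical, and the nested loops favor readability over speed.

```python
import numpy as np


def conv2d(image, kernel, stride=1, padding=0):
    """Single-channel 2-D convolution (cross-correlation form) with stride and zero padding.

    S(x, y) = sum_m sum_n I(x*stride + m, y*stride + n) * K(m, n)
    """
    if padding > 0:
        image = np.pad(image, padding, mode="constant")  # zeros around the border
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for x in range(out_h):
        for y in range(out_w):
            # Patch of the image currently under the kernel.
            patch = image[x * stride : x * stride + kh, y * stride : y * stride + kw]
            out[x, y] = np.sum(patch * kernel)  # element-wise product, then sum
    return out


image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])  # computes I(x, y) - I(x+1, y+1)

print(conv2d(image, kernel))                    # 3x3 map, every entry -5
print(conv2d(image, kernel, stride=2))          # 2x2 map, every entry -5
print(conv2d(image, kernel, padding=1).shape)   # (5, 5)
```

Every entry is -5 because, in this image, each pixel is exactly 5 smaller than its lower-right neighbor. A real convolutional layer applies many such kernels and sums the results over all input channels.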
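Matrix multiplication behind convolutions: how unrolling image patches into the rows of a matrix turns the sliding-window computation into a single matrix multiplication.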
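One common way to express convolution as a matrix multiplication is the technique often called im2col. It copies each k x k patch of the image into a row of a matrix, so that multiplying that matrix by the flattened kernels yields every output value of every filter at once. The sketch below assumes a single input channel, stride 1 and no padding, and the function names are hypothetical. Deep learning libraries may use other algorithms internally (for example, FFT- or Winograd-based methods).

```python
import numpy as np


def im2col(image, kh, kw):
    """Unroll every kh x kw patch (stride 1, no padding) into one row of a matrix."""
    h, w = image.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((out_h * out_w, kh * kw))
    for x in range(out_h):
        for y in range(out_w):
            cols[x * out_w + y] = image[x : x + kh, y : y + kw].ravel()
    return cols, (out_h, out_w)


def conv2d_matmul(image, kernels):
    """Apply several kernels with one matrix multiplication instead of a sliding loop."""
    n_k, kh, kw = kernels.shape
    cols, (out_h, out_w) = im2col(image, kh, kw)         # (num_patches, kh*kw)
    weights = kernels.reshape(n_k, kh * kw).T            # (kh*kw, num_kernels)
    return (cols @ weights).reshape(out_h, out_w, n_k)   # one feature map per kernel


rng = np.random.default_rng(0)
image = rng.standard_normal((5, 5))
kernels = rng.standard_normal((2, 3, 3))
maps = conv2d_matmul(image, kernels)
print(maps.shape)  # (3, 3, 2)

# Cross-check the first feature map against a direct sliding-window computation.
direct = np.array([[np.sum(image[x:x + 3, y:y + 3] * kernels[0]) for y in range(3)]
                   for x in range(3)])
print(np.allclose(maps[..., 0], direct))  # True
```

The trade-off is memory: overlapping patches are copied, so the unrolled matrix is larger than the image. In exchange, the work becomes one large matrix product, which hardware and linear-algebra libraries execute very efficiently.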
Lecture 2: Pooling Layers and Activation Functions
2.3 Pooling Layers
Purpose of Pooling:
Pooling layers help reduce the dimensionality of feature maps while retaining important information.
Max Pooling vs. Average Pooling: How they differ and when to use each.
How pooling layers make CNNs more computationally efficient by down-sampling the feature maps.
Example of max pooling: Taking the maximum value from a set of pixels (e.g., a 2x2 block).
Impact on Spatial Dimensions:
How pooling reduces the spatial dimensions of the feature maps.
Example: Reducing a 4x4 matrix to a 2x2 matrix using max pooling.
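The 4x4 to 2x2 reduction can be worked through with both pooling types, using non-overlapping 2x2 windows (stride equal to the window size, as in the default Keras `MaxPooling2D` setup). This is a NumPy sketch; the function name `pool2d` and the example values are hypothetical.

```python
import numpy as np


def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling (stride == size) on a 2-D feature map."""
    h, w = x.shape
    # Drop rows/columns that do not fill a whole window (no padding).
    x = x[: h - h % size, : w - w % size]
    # View the map as (block rows, size, block cols, size): each block is one window.
    blocks = x.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'average'")


fmap = np.array([[1, 3, 2, 1],
                 [4, 6, 5, 0],
                 [7, 2, 9, 8],
                 [1, 0, 3, 4]], dtype=float)

print(pool2d(fmap, mode="max"))      # block maxima: 6, 5 / 7, 9
print(pool2d(fmap, mode="average"))  # block means: 3.5, 2.0 / 2.5, 6.0
```

Max pooling keeps only the strongest response in each window (for example, 9 in the bottom-right block), while average pooling blends all four values (6.0). Either way, the map has a quarter as many values to process in the next layer.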
2.4 Activation Functions in CNNs
Purpose of Activation Functions:
Why activation functions are necessary: Introducing non-linearity into the model.
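Why non-linearity is needed can be checked numerically: two stacked linear layers without an activation collapse into a single linear layer whose weight matrix is the product of the two. The NumPy sketch below uses small random matrices and omits biases for brevity (with biases, the composition is still one affine layer).

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))  # first "layer": 3 inputs -> 4 units
W2 = rng.standard_normal((2, 4))  # second "layer": 4 units -> 2 outputs
x = rng.standard_normal(3)

# Two linear layers applied one after the other...
two_layers = W2 @ (W1 @ x)
# ...give exactly the same result as one linear layer with weights W2 @ W1.
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layers, one_layer))  # True

# A ReLU between the layers breaks this equivalence in general,
# so extra depth can represent functions a single linear layer cannot.
with_relu = W2 @ np.maximum(0.0, W1 @ x)
print(with_relu, one_layer)
```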
Common Activation Functions:
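ReLU (Rectified Linear Unit), f(x) = max(0, x): Why it is widely used in CNNs (it is cheap to compute, and its gradient does not vanish for positive inputs).
Leaky ReLU: Using a small non-zero slope for negative inputs, so that units do not get stuck outputting zero (the "dying ReLU" problem).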
Sigmoid and Tanh: How they compare to ReLU and their common use cases.
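The four activation functions can be compared side by side with the NumPy sketch below. The Leaky ReLU slope `alpha=0.01` is an assumed, commonly used value; frameworks pick their own defaults (Keras's `LeakyReLU` layer, for instance, uses a different one). The last lines show saturation: for a large input, the sigmoid's gradient is nearly zero, while ReLU's gradient stays at 1.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)              # zero for negatives, identity for positives


def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small slope alpha instead of a hard zero


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # squashes to (0, 1)


def tanh(x):
    return np.tanh(x)                      # squashes to (-1, 1), zero-centered


x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (relu, leaky_relu, sigmoid, tanh):
    print(f"{f.__name__:>10}: {np.round(f(x), 3)}")

# Saturation: the sigmoid's derivative s(z) * (1 - s(z)) is tiny for large z,
# whereas ReLU's derivative is 1 for any positive input.
z = 5.0
print(sigmoid(z) * (1 - sigmoid(z)))       # about 0.0066
```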
How an activation function is applied element-wise to the output of each convolution, so that stacked layers can learn non-linear features.
Practical Session: Building a Simple CNN for Image Classification
Objective: Implement a basic CNN using Keras or TensorFlow for image classification tasks.
Dataset: CIFAR-10 (or MNIST for simpler tasks)
CIFAR-10 is a collection of 60,000 32x32 color images in 10 classes, with 6,000 images per class.
Key Steps:
Step 1: Data Preprocessing
Load the CIFAR-10 dataset.
Normalize the pixel values to be between 0 and 1 (divide the 8-bit values, which range from 0 to 255, by 255).
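Step 1 can be sketched as follows. The helper names `preprocess` and `load_cifar10` are hypothetical. `keras.datasets.cifar10.load_data()` downloads the dataset on first use and returns `uint8` images together with integer labels of shape `(n, 1)`. Because Step 3 uses categorical cross-entropy, the sketch also one-hot encodes the labels. That encoding is an added assumption and is not needed with sparse categorical cross-entropy.

```python
import numpy as np

NUM_CLASSES = 10  # CIFAR-10 has 10 classes


def preprocess(images, labels, num_classes=NUM_CLASSES):
    """Scale uint8 pixels from [0, 255] to [0, 1] and one-hot encode integer labels."""
    x = images.astype("float32") / 255.0
    # Labels arrive with shape (n, 1); flatten them before indexing the identity matrix.
    y = np.eye(num_classes, dtype="float32")[labels.reshape(-1)]
    return x, y


def load_cifar10():
    """Download (on first call) and preprocess the CIFAR-10 training and test sets."""
    from tensorflow import keras  # imported here so preprocess() works without TensorFlow
    (x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
    return preprocess(x_train, y_train), preprocess(x_test, y_test)
```

Calling `(x_train, y_train), (x_test, y_test) = load_cifar10()` yields `x_train` with shape `(50000, 32, 32, 3)` and `y_train` with shape `(50000, 10)`, plus 10,000 test images.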
Step 2: Building the CNN Architecture
Input layer: Define the input shape for images (e.g., 32x32x3 for RGB images).
Add a convolutional layer with a specified number of filters (e.g., 32 filters) and a filter size (e.g., 3x3).
Add a ReLU activation function after each convolutional layer.
Add a max-pooling layer to reduce the spatial dimensions.
Repeat the process: Add another convolutional layer followed by max pooling.
Finally, flatten the feature maps into a vector, add a fully connected (dense) layer, and add an output layer with a softmax activation function for classification.
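A Keras sketch of the architecture in Step 2 follows. The 32 filters of size 3x3 come from the text. The 64 filters in the second convolutional layer and the 64-unit dense layer are assumptions chosen for illustration, and the function name `build_cnn` is hypothetical. ReLU is added as a separate layer to mirror the step "Add a ReLU activation function after each convolutional layer"; passing `activation="relu"` to `Conv2D` is equivalent. The comments trace the shape of the data through each layer.

```python
import numpy as np
from tensorflow import keras


def build_cnn(input_shape=(32, 32, 3), num_classes=10):
    """Two conv -> ReLU -> max-pool stages followed by a dense classifier."""
    return keras.Sequential([
        keras.Input(shape=input_shape),                 # 32x32 RGB images
        keras.layers.Conv2D(32, (3, 3)),                # 32 filters, 3x3 -> 30x30x32
        keras.layers.ReLU(),                            # non-linearity after the convolution
        keras.layers.MaxPooling2D((2, 2)),              # -> 15x15x32
        keras.layers.Conv2D(64, (3, 3)),                # -> 13x13x64
        keras.layers.ReLU(),
        keras.layers.MaxPooling2D((2, 2)),              # -> 6x6x64
        keras.layers.Flatten(),                         # -> vector of 2304 values
        keras.layers.Dense(64, activation="relu"),      # fully connected layer
        keras.layers.Dense(num_classes, activation="softmax"),  # class probabilities
    ])


model = build_cnn()
model.summary()  # prints layer output shapes and parameter counts

# Sanity check: one blank image in, one probability per class out.
probs = model.predict(np.zeros((1, 32, 32, 3), dtype="float32"), verbose=0)
print(probs.shape)  # (1, 10)
```

Most of the parameters sit in the first dense layer (2304 x 64 weights), which is one reason pooling before flattening matters.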
Step 3: Compiling and Training the Model
Define a loss function (e.g., categorical cross-entropy for multi-class classification with one-hot labels, or sparse categorical cross-entropy when the labels are integers).
Define an optimizer (e.g., Adam).
Compile the model and train it using the training dataset.
Step 4: Evaluating the Model
Evaluate the trained model on the test dataset.
Calculate accuracy and loss.
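Steps 3 and 4 can be sketched together. The model repeats the layer stack described in Step 2 so that the snippet runs on its own. The number of epochs, the batch size and the 10% validation split are assumptions, not values prescribed by the course, and the helper `train_and_evaluate` is hypothetical. The labels are expected to be one-hot encoded to match categorical cross-entropy.

```python
import numpy as np
from tensorflow import keras


def build_cnn(num_classes=10):
    """The layer stack described in Step 2."""
    return keras.Sequential([
        keras.Input(shape=(32, 32, 3)),
        keras.layers.Conv2D(32, (3, 3), activation="relu"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(64, (3, 3), activation="relu"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])


def train_and_evaluate(model, x_train, y_train, x_test, y_test,
                       epochs=10, batch_size=64, verbose=2):
    """Compile with Adam + categorical cross-entropy, train, then evaluate on the test set."""
    model.compile(
        optimizer=keras.optimizers.Adam(),   # Adam with its default learning rate
        loss="categorical_crossentropy",     # expects one-hot encoded labels
        metrics=["accuracy"],
    )
    history = model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
                        validation_split=0.1, verbose=verbose)  # hold out 10% for validation
    test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
    return history, test_loss, test_acc
```

With preprocessed CIFAR-10 arrays, `history, loss, acc = train_and_evaluate(build_cnn(), x_train, y_train, x_test, y_test)` trains the model and reports test loss and accuracy. `history.history` holds per-epoch training and validation metrics that can be plotted to spot overfitting.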
Step 5: Visualizing Feature Maps
Build a model that outputs the activations of each convolutional layer, then plot these feature maps to understand what the network is learning at each stage.
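One way to obtain feature maps in Keras is to build a second `keras.Model` that shares the classifier's input but returns the outputs of the convolutional layers. Because both models share the same layers, training the classifier also updates what the second model shows. The helper names, the layer names `conv1`/`conv2` and the random stand-in image are hypothetical. Plotting assumes matplotlib is installed.

```python
import numpy as np
from tensorflow import keras


def build_models(input_shape=(32, 32, 3), num_classes=10):
    """Return a classifier and a model exposing its convolutional activations."""
    inputs = keras.Input(shape=input_shape)
    conv1 = keras.layers.Conv2D(32, 3, activation="relu", name="conv1")(inputs)
    x = keras.layers.MaxPooling2D(2)(conv1)
    conv2 = keras.layers.Conv2D(64, 3, activation="relu", name="conv2")(x)
    x = keras.layers.MaxPooling2D(2)(conv2)
    x = keras.layers.Flatten()(x)
    x = keras.layers.Dense(64, activation="relu")(x)
    outputs = keras.layers.Dense(num_classes, activation="softmax")(x)
    classifier = keras.Model(inputs, outputs)
    # Shares layers (and weights) with the classifier: train one, inspect the other.
    activations = keras.Model(inputs, [conv1, conv2])
    return classifier, activations


def plot_feature_maps(fmap, title, n_cols=8):
    """Show every channel of one feature map (shape (1, H, W, C)) as a grayscale image."""
    import matplotlib.pyplot as plt  # only needed for plotting
    n_channels = fmap.shape[-1]
    n_rows = int(np.ceil(n_channels / n_cols))
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols, n_rows), squeeze=False)
    for i, ax in enumerate(axes.flat):
        ax.axis("off")
        if i < n_channels:
            ax.imshow(fmap[0, :, :, i], cmap="gray")
    fig.suptitle(title)
    plt.show()


classifier, activations = build_models()
image = np.random.default_rng(0).random((1, 32, 32, 3), dtype=np.float32)  # stand-in image
maps = activations.predict(image, verbose=0)
print([m.shape for m in maps])  # [(1, 30, 30, 32), (1, 13, 13, 64)]
```

After training `classifier`, passing a real test image and calling `plot_feature_maps(maps[0], "conv1")` and `plot_feature_maps(maps[1], "conv2")` lets the two stages be compared. Early layers tend to respond to edges and color changes, and deeper layers to larger, more abstract patterns.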
Assignment for Week 2:
Coding Assignment:
Modify the CNN built in class by experimenting with:
Different filter sizes (e.g., 5x5, 7x7).
Different numbers of filters (e.g., 16, 64).
Different pooling sizes for max pooling.
Different activation functions (ReLU vs. Leaky ReLU).
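A possible scaffold for the assignment is sketched below: a model builder whose filter size, filter count, pooling size and activation can be varied, plus a helper that records parameter count, training time and test accuracy. All names and default values are hypothetical. The convolutions use `padding="same"` so that large filters combined with large pooling windows never shrink the map below the filter size; this is a departure from the in-class model, chosen so that every configuration is valid.

```python
import time

import numpy as np
from tensorflow import keras

ACTIVATIONS = {"relu": keras.layers.ReLU, "leaky_relu": keras.layers.LeakyReLU}


def build_variant(filter_size=3, num_filters=32, pool_size=2, activation="relu", num_classes=10):
    """Two conv/activation/pool stages with configurable hyperparameters."""
    act = ACTIVATIONS[activation]  # KeyError for unknown names instead of a silent fallback
    model = keras.Sequential([keras.Input(shape=(32, 32, 3))])
    for filters in (num_filters, 2 * num_filters):
        model.add(keras.layers.Conv2D(filters, filter_size, padding="same"))
        model.add(act())  # LeakyReLU uses Keras's default negative slope
        model.add(keras.layers.MaxPooling2D(pool_size))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model


def run_experiment(config, x_train, y_train, x_test, y_test, epochs=5):
    """Train one configuration and report size, wall-clock training time and test accuracy."""
    model = build_variant(**config)
    start = time.perf_counter()
    model.fit(x_train, y_train, epochs=epochs, batch_size=64, verbose=0)
    train_time = time.perf_counter() - start
    _, test_acc = model.evaluate(x_test, y_test, verbose=0)
    return {"config": config, "params": model.count_params(),
            "train_time_s": train_time, "test_acc": test_acc}


# One change at a time relative to the defaults, mirroring the assignment list.
CONFIGS = [
    {"filter_size": 5}, {"filter_size": 7},
    {"num_filters": 16}, {"num_filters": 64},
    {"pool_size": 3},
    {"activation": "leaky_relu"},
]
```

With preprocessed, one-hot labeled CIFAR-10 arrays, each entry of `CONFIGS` (plus the default `{}`) can be passed to `run_experiment` and the results printed as a table. Measured training time includes one-off graph compilation, so compare variants trained for the same number of epochs on the same hardware.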
Analysis:
Observe how changing these parameters affects model performance (accuracy and training time).
Reading Assignment:
Read Chapter 3 of "Advanced Applied Deep Learning" by Umberto Michelucci.
Focus on understanding how CNNs handle spatial data and how each layer contributes to the learning process.
Summary of Key Concepts:
The convolution operation and how CNNs extract spatial hierarchies from images.
Pooling layers for dimensionality reduction and computational efficiency.
Importance of activation functions to introduce non-linearity.
Building a basic CNN using TensorFlow/Keras for image classification.
This week provides students with a deeper understanding of CNN architecture and hands-on experience building simple CNN models, setting the stage for more complex models and object detection tasks in subsequent weeks.