GOAL: Learn the PyTorch-specific data structure, the Tensor, and neural networks in PyTorch.
Learning experience: In this homework, I learned that PyTorch is a powerful tool in machine learning and deep learning. During the practical exercises, I also realized the importance of software versions. Simply installing the software does not guarantee everything will work perfectly. I encountered issues with versions and CUDA that took me almost an hour to resolve. This was a valuable experience. Additionally, I learned how to implement datasets myself and observed the differences in training time between CPU and GPU, with GPU training being significantly faster. This homework deepened my understanding of PyTorch and neural network applications and improved my skills in dealing with software versions and environment configurations.
20.0 Introduction
Just as NumPy is a foundational tool for data manipulation in the machine learning stack, PyTorch is a foundational tool for working with tensors in the deep learning stack. Before moving on to deep learning itself, we should familiarize ourselves with PyTorch tensors and perform many operations analogous to those performed with NumPy in Chapter 1.
Although PyTorch is just one of multiple deep learning libraries, it is significantly popular both within academia and industry. PyTorch tensors are very similar to NumPy arrays. However, they also allow us to perform tensor operations on GPUs (hardware specialized for deep learning). In this chapter, we’ll familiarize ourselves with the basics of PyTorch tensors and many common low-level operations.
What is PyTorch ? [2]
PyTorch is a machine learning library based on the Torch library, used for applications such as computer vision and natural language processing.
PyTorch provides two high-level features:
Tensor computing (like NumPy) with strong acceleration via graphics processing units (GPU).
Deep neural networks built on a tape-based automatic differentiation system.
What are Tensors?
Tensors are a specialized data structure that are very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters. [3]
Definition
Let F be a field such as the real numbers R or the complex numbers C. A tensor A is an I_1 × I_2 × ⋯ × I_C array over F.
Here, C and I_1, I_2, …, I_C are positive integers, and C is the number of dimensions, number of ways, or mode of the tensor.[5]
20.1 Creating a Tensor
This section will create a tensor.
In 4 uses the PyTorch library to create vectors.
First, it imports the PyTorch library with import torch.
Next, it creates a vector as a row named tensor_row.
This vector contains the numbers 1, 2, and 3, and is created using torch.tensor([1, 2, 3]).
Then, it creates a vector as a column named tensor_column.
This vector contains the numbers 1, 2, and 3, but each number is in its own row, and is created using torch.tensor([[1], [2], [3]]).
In 7 & 8 we can see the results of what we have created.
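The cells described above are presumably similar to the following minimal sketch (the variable names tensor_row and tensor_column come from the description; exact outputs may differ by PyTorch version):

    import torch

    # Create a vector as a row
    tensor_row = torch.tensor([1, 2, 3])

    # Create a vector as a column: each number sits in its own row
    tensor_column = torch.tensor([[1], [2], [3]])

    print(tensor_row)     # tensor([1, 2, 3])
    print(tensor_column)  # tensor([[1], [2], [3]])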
Note : pip install torch torchvision torchaudio
20.2 Creating a Tensor from NumPy
This section will create PyTorch tensors from NumPy arrays.
In 9 uses the NumPy and PyTorch libraries to create arrays and tensors.
First, it imports the NumPy and PyTorch libraries with import numpy as np and import torch.
Next, it creates a NumPy array named vector_row.
This array contains the numbers 1, 2, and 3, and is created using np.array([1, 2, 3]).
Then, it creates a tensor from this NumPy array, named tensor_row.
Using torch.from_numpy(vector_row), it converts vector_row into a PyTorch tensor.
In 10 & 11, we can see the difference: one is a NumPy array and the other is a PyTorch tensor.
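A minimal sketch of the conversion described above (using the names vector_row and tensor_row from the description):

    import numpy as np
    import torch

    # Create a NumPy array
    vector_row = np.array([1, 2, 3])

    # Convert the NumPy array into a PyTorch tensor (it shares memory with the array)
    tensor_row = torch.from_numpy(vector_row)

    print(type(vector_row))  # <class 'numpy.ndarray'>
    print(type(tensor_row))  # <class 'torch.Tensor'>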
20.3 Creating a Sparse Tensor
Given data with very few nonzero values, this section will show how to represent it efficiently with a sparse tensor.
In 12 uses the PyTorch library to create a tensor and a sparse tensor.
First, it imports the PyTorch library with import torch.
Next, it creates a tensor named tensor.
This tensor is a 3x2 matrix containing the numbers [0, 0], [0, 1], and [3, 0], created using torch.tensor([...]).
Then, it creates a sparse tensor from this regular tensor, named sparse_tensor.
Using tensor.to_sparse(), it converts tensor into PyTorch's sparse tensor format.
In 13 we can see the sparse tensor, which is represented with indices and values.
In 14 we can see they’re actually both of the same class.
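A minimal sketch of what In 12 ~ 14 likely contain:

    import torch

    # A dense 3x2 tensor with only two nonzero values
    tensor = torch.tensor([[0, 0], [0, 1], [3, 0]])

    # Convert it into PyTorch's sparse format, which stores only indices and values
    sparse_tensor = tensor.to_sparse()

    print(sparse_tensor)
    print(type(tensor), type(sparse_tensor))  # both are of class torch.Tensor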
20.4 Selecting Elements in a Tensor
This section will select specific elements of a tensor.
In 15 uses the PyTorch library to create a vector tensor and a matrix tensor, and selects a specific element from the vector.
First, it imports the PyTorch library with import torch.
Next, it creates a vector tensor named vector.
This vector contains the numbers 1, 2, 3, 4, 5, and 6, created using torch.tensor([1, 2, 3, 4, 5, 6]).
Then, it creates a matrix tensor named matrix.
This matrix is a 3x3 matrix containing the numbers [1, 2, 3], [4, 5, 6], and [7, 8, 9], created using torch.tensor([...]).
Finally, it selects the third element of the vector, which is the element at index 2, using vector[2].
In 23 we see another selection that returns tensor(6).
In 24 we can see a tensor selected from the matrix.
In 27 ~ 32, like NumPy arrays and most everything in Python, PyTorch tensors are zero-indexed. Both indexing and slicing are supported as well. One key difference is that indexing a PyTorch tensor to return a single element still returns a tensor as opposed to the value of the object itself (which would be in the form of an integer or float). Slicing syntax also has parity with NumPy and will return objects of type tensor in PyTorch.
Another key difference is that PyTorch tensors do not yet support negative steps when slicing. Therefore, attempting to reverse a tensor using slicing yields an error, as shown in 33.
Instead, if we wish to reverse a tensor we can use the flip method, as shown in 34.
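The selection, slicing, and flip operations described above can be sketched as follows (the specific elements selected here are illustrative):

    import torch

    vector = torch.tensor([1, 2, 3, 4, 5, 6])
    matrix = torch.tensor([[1, 2, 3],
                           [4, 5, 6],
                           [7, 8, 9]])

    print(vector[2])       # tensor(3): indexing returns a tensor, not a plain integer
    print(matrix[1, 1])    # tensor(5): row 1, column 1
    print(vector[0:4])     # tensor([1, 2, 3, 4]): slicing has parity with NumPy

    # vector[::-1] raises an error because negative slice steps are not supported;
    # use flip instead to reverse the tensor
    print(torch.flip(vector, dims=[0]))  # tensor([6, 5, 4, 3, 2, 1])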
20.5 Describing a Tensor
This section will describe the shape, data type, and format of a tensor along with the hardware it’s using.
In 35 ~ 38 we inspect the shape, dtype, layout, and device attributes of the tensor.
PyTorch tensors provide a number of helpful attributes for gathering information about a given tensor, including:
Shape : Returns the dimensions of the tensor
Dtype : Returns the data type of objects within the tensor
Layout : Returns the memory layout (most common is strided used for dense tensors)
Device : Returns the hardware the tensor is being stored on (CPU/GPU)
Again, the key differentiator between tensors and arrays is an attribute like device, because tensors provide us with hardware-accelerated options like GPUs.
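A minimal sketch of inspecting these attributes:

    import torch

    tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])

    print(tensor.shape)   # torch.Size([2, 3])
    print(tensor.dtype)   # torch.int64
    print(tensor.layout)  # torch.strided (the dense memory layout)
    print(tensor.device)  # cpu (or cuda:0 when the tensor is stored on a GPU)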
20.6 Applying Operations to Elements
This section will apply an operation to all elements in a tensor.
In 39 we take advantage of broadcasting with PyTorch.
Basic operations in PyTorch will take advantage of broadcasting to parallelize them using accelerated hardware such as GPUs. This is true for supported mathematical operators in Python (+, -, ×, /) and other functions inherent to PyTorch. Unlike NumPy, PyTorch doesn’t include a vectorize method for applying a function over all elements in a tensor. However, PyTorch comes equipped with all of the mathematical tools necessary to distribute and accelerate the usual operations required for deep learning workflows.
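A minimal broadcasting sketch (the values are illustrative):

    import torch

    tensor = torch.tensor([1, 2, 3])

    # Arithmetic operators are applied to every element at once
    print(tensor + 100)  # tensor([101, 102, 103])
    print(tensor * 2)    # tensor([2, 4, 6])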
20.7 Finding the Maximum and Minimum Values
This section will find the maximum or minimum value in a tensor.
In 40 we use the PyTorch max and min methods, which make this easy to find.
The max and min methods of a tensor help us find the largest or smallest values in that tensor. These methods work the same across multidimensional tensors as well, as shown in 42.
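A minimal sketch of max and min (values illustrative):

    import torch

    vector = torch.tensor([1, 2, 3, 4, 5, 6])
    matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])

    print(vector.max())  # tensor(6)
    print(vector.min())  # tensor(1)
    print(matrix.max())  # tensor(6): works the same on multidimensional tensors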
20.8 Reshaping Tensors
This section will change the shape (number of rows and columns) of a tensor without changing the element values.
In 43 we easily use the PyTorch reshape method.
Discussion
Manipulating the shape of a tensor can be common in the field of deep learning, as neurons in a neural network often require tensors of a very specific shape. Since the required shape of a tensor can change between neurons in a given neural network, it is good to have a low-level understanding of our inputs and outputs in deep learning.
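A minimal reshape sketch (the 4x3 to 2x6 shapes are illustrative):

    import torch

    tensor = torch.tensor([[ 1,  2,  3],
                           [ 4,  5,  6],
                           [ 7,  8,  9],
                           [10, 11, 12]])

    # Change the shape without changing the element values
    print(tensor.reshape(2, 6))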
20.9 Transposing a Tensor
This section will transpose a tensor.
In 44 we easily use the mT method.
Discussion
Transposing with PyTorch is slightly different from NumPy. The T method used for NumPy arrays is supported in PyTorch only with tensors of two dimensions and at the time of writing is deprecated for tensors of other shapes. The mT method used to transpose batches of tensors is preferred, as it scales to greater than two dimensions.
In 45, an additional way to transpose PyTorch tensors of any shape is to use the permute method. This method also works for one-dimensional tensors (for which the value of the transposed tensor is the same as the original tensor).
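A minimal sketch of both approaches:

    import torch

    tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])

    print(tensor.mT)             # transpose of the last two dimensions -> shape (3, 2)
    print(tensor.permute(1, 0))  # the same result expressed with permute

    vector = torch.tensor([1, 2, 3])
    print(vector.permute(0))     # a one-dimensional tensor is unchanged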
20.10 Flattening a Tensor
This section will transform a tensor into one dimension.
In 46 we use the flatten method to flatten the tensor.
Discussion
Flattening a tensor is a useful technique for reducing a multidimensional tensor into one dimension.
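A minimal flatten sketch:

    import torch

    matrix = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

    # Reduce the 3x3 tensor to one dimension
    print(matrix.flatten())  # tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])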
20.11 Calculating Dot Products
This section will calculate the dot product of two tensors.
In 47 we can use the dot method to calculate it.
Discussion
Calculating the dot product of two tensors is a common operation useful in the deep learning space as well as the information retrieval space. You may remember earlier in the book where we used the dot product of two vectors to perform a cosine similarity-based search. Doing this in PyTorch on GPU (instead of with NumPy or scikit-learn on CPU) can yield impressive performance benefits on information retrieval problems.
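A minimal dot-product sketch (tensor names and values are illustrative):

    import torch

    tensor_one = torch.tensor([1, 2, 3])
    tensor_two = torch.tensor([4, 5, 6])

    # 1*4 + 2*5 + 3*6 = 32
    print(tensor_one.dot(tensor_two))  # tensor(32)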
20.12 Multiplying Tensors
This section will multiply two tensors.
In 49, using basic Python arithmetic operators, we can multiply two tensors.
In 50~52 PyTorch supports basic arithmetic operators such as ×, +, - and /. Although multiplying tensors is probably one of the most common operations used in deep learning, it’s useful to know tensors can also be added, subtracted, and divided.
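A minimal sketch of element-wise arithmetic (names and values illustrative):

    import torch

    tensor_one = torch.tensor([1, 2, 3])
    tensor_two = torch.tensor([4, 5, 6])

    print(tensor_one * tensor_two)  # tensor([ 4, 10, 18])
    print(tensor_one + tensor_two)  # tensor([5, 7, 9])
    print(tensor_one - tensor_two)  # tensor([-3, -3, -3])
    print(tensor_one / tensor_two)  # tensor([0.2500, 0.4000, 0.5000])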
21.0 Introduction
At the heart of basic neural networks is the unit (also called a node or neuron). A unit takes in one or more inputs, multiplies each input by a parameter (also called a weight), sums the weighted input’s values along with some bias value (typically 0), and then feeds the value into an activation function. This output is then sent forward to the other neurons deeper in the neural network (if they exist).
Neural networks can be visualized as a series of connected layers that form a network connecting an observation’s feature values at one end and the target value (e.g., observation’s class) at the other end. Feedforward neural networks—also called multilayer perceptron—are the simplest artificial neural networks used in any real-world setting. The name “feedforward” comes from the fact that an observation’s feature values are fed “forward” through the network, with each layer successively transforming the feature values with the goal that the output is the same as (or close to) the target’s value.
Specifically, feedforward neural networks contain three types of layers. At the start of the neural network is an input layer, where each unit contains an observation’s value for a single feature. For example, if an observation has 100 features, the input layer has 100 units. At the end of the neural network is the output layer, which transforms the output of intermediate layers (called hidden layers) into values useful for the task at hand. For example, if our goal is binary classification, we can use an output layer with a single unit that uses a sigmoid function to scale its own output to between 0 and 1, representing a predicted class probability.
Between the input and output layers are the so-called hidden layers. These hidden layers successively transform the feature values from the input layer to something that, once processed by the output layer, resembles the target class. Neural networks with many hidden layers (e.g., 10, 100, 1,000) are considered “deep” networks. Training deep neural networks is a process known as deep learning.
Neural networks are typically created with all parameters initialized as small random values from a Gaussian or normal uniform distribution. Once an observation (or more often a set number of observations called a batch) is fed through the network, the outputted value is compared with the observation’s true value using a loss function. This is called forward propagation. Next an algorithm goes “backward” through the network identifying how much each parameter contributed to the error between the predicted and true values, a process called back propagation. At each parameter, the optimization algorithm determines how much each weight should be adjusted to improve the output.
Neural networks learn by repeating this process of forward propagation and back propagation for every observation in the training data multiple times (each time all observations have been sent through the network is called an epoch and training typically consists of multiple epochs), iteratively updating the values of the parameters utilizing a process called gradient descent to slowly optimize the values of the parameters for the given output.
In this chapter, we will use the same Python library used in the last chapter, PyTorch, to build, train, and evaluate a variety of neural networks. PyTorch is a popular tool within the deep learning space due to its well-written APIs and intuitive representation of the low-level tensor operations that power neural networks. One key feature of PyTorch is called autograd, which automatically computes and stores the gradients used to optimize the parameters of the network after undergoing forward propagation and back propagation.
Neural networks created using PyTorch code can be trained using both CPUs (i.e., on your laptop) and GPUs (i.e., on a specialized deep learning computer). In the real world with real data, it is often necessary to train neural networks using GPUs, as the training process on large data for complex networks runs orders of magnitude faster on GPUs than CPUs. However, all the neural networks in this book are small and simple enough to be trained on a CPU-only laptop in only a few minutes. Just be aware that when we have larger networks and more training data, training using CPUs is significantly slower than training using GPUs.
The figure above shows a simple neuron, which can be represented by the formula below. [5]

    t = f(a1*w1 + a2*w2 + ... + an*wn + b)

where a1~an are the components of the input vector,
w1~wn are the weight values (weights) of each synapse of the neuron,
b is the bias,
f is the transfer function, usually a nonlinear function; common choices include traingd(), tansig(), and hardlim(), and the following defaults to hardlim(),
t is the neuron output.
And the formula, written in vector form, is

    t = f(W′ ⋅ A + b)

where
the vector W represents the weight vector,
W′ represents the transpose of W,
A represents the input vector,
b represents the bias,
f represents the activation function.
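As a quick numeric illustration of this formula (the input, weight, and bias values below are made up), a single neuron with a hardlim activation can be computed as:

    import torch

    a = torch.tensor([0.5, -1.0, 2.0])  # input vector A (made-up values)
    w = torch.tensor([0.2, 0.4, 0.1])   # weight vector W (made-up values)
    b = 0.0                             # bias

    z = torch.dot(w, a) + b             # W'A + b = 0.1 - 0.4 + 0.2 = -0.1
    t = (z >= 0).float()                # hardlim: 1 if z >= 0, else 0
    print(z, t)                         # tensor(-0.1000) tensor(0.)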
21.1 Using Autograd with PyTorch
This section will use PyTorch’s autograd features to compute and store the gradients after undergoing forward propagation and back propagation.
In 1 demonstrates automatic differentiation of tensors in PyTorch.
First, we import the PyTorch library.
We create a tensor t containing values [1.0, 2.0, 3.0] and set requires_grad=True, indicating that we want to track gradients of this tensor.
Next, we perform a tensor operation simulating "forward propagation". We calculate the sum of t and store the result in tensor_sum.
Then, we execute "backward propagation". By calling the backward() method, PyTorch computes the gradient of tensor_sum with respect to t.
Finally, we view the gradients of t, which can be obtained using t.grad.
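A minimal sketch of this autograd example:

    import torch

    # Track gradients on this tensor
    t = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

    # "Forward propagation": a simple operation on t
    tensor_sum = t.sum()

    # "Backward propagation": compute d(tensor_sum)/dt
    tensor_sum.backward()

    print(t.grad)  # tensor([1., 1., 1.]): the derivative of a sum is 1 per element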
Discussion
Autograd is one of the core features of PyTorch and a big factor in its popularity as a deep learning library. The ability to easily compute, store, and visualize gradients makes PyTorch very intuitive for researchers and enthusiasts building neural networks from scratch.
PyTorch uses a directed acyclic graph (DAG) to keep a record of all data and computational operations being performed on that data. This is incredibly useful, but it also means we need to be careful with what operations we try to apply on our PyTorch data that requires gradients.
In 2, when working with autograd, we can't easily convert our tensors to NumPy arrays and back without “breaking the graph,” a phrase used to describe operations that don't support autograd.
In 4, to convert this tensor into a NumPy array, we need to call the detach() method on it, which will break the graph and thus our ability to automatically compute gradients. While this can definitely be useful, it's worth knowing that detaching the tensor will prevent PyTorch from automatically computing the gradient.
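Continuing the sketch above, the conversion would look roughly like this (t is the tensor created with requires_grad=True):

    # t.numpy() raises an error because t still requires gradients ("breaking the graph")
    array = t.detach().numpy()  # detach first, then convert to a NumPy array
    print(array)                # [1. 2. 3.]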
21.2 Preprocessing Data for Neural Networks
This section will preprocess data for use in a neural network.
In 5 demonstrates how to standardize features using the StandardScaler from the Scikit-learn library and then convert them into PyTorch tensors.
First, we import the necessary libraries, including the preprocessing module from Scikit-learn and numpy.
We create a numpy array named features containing some feature values. These features could represent a dummy dataset, where each row represents a sample and each column represents a feature.
Next, we create a scaler named scaler that will be used to standardize the features. Standardization aims to make the mean of the features 0 and the variance 1.
Then, we use the torch.from_numpy() method to convert the numpy array into a PyTorch tensor named features_standardized_tensor. This allows us to further process this data in PyTorch.
Finally, we display the transformed feature tensor features_standardized_tensor.
Discussion
While this recipe is very similar to Recipe 4.2, it is worth repeating because of how important it is for neural networks. Typically, a neural network’s parameters are initialized (i.e., created) as small random numbers. Neural networks often behave poorly when the feature values are much larger than the parameter values. Furthermore, since an observation’s feature values are combined as they pass through individual units, it is important that all features have the same scale.
For these reasons, it is best practice (although not always necessary; for example, when we have all binary features) to standardize each feature such that the feature’s values have the mean of 0 and the standard deviation of 1. This can be accomplished easily with scikit-learn’s StandardScaler.
In 6, however, if you need to perform this operation after having created tensors with requires_grad=True, you'll need to do this natively in PyTorch, so as not to break the graph. While you'll typically standardize features prior to starting to train the network, it's worth knowing how to accomplish the same thing in PyTorch.
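A minimal sketch of both approaches; the feature values below are dummy data, and the native PyTorch version uses the population standard deviation so that it matches StandardScaler:

    import numpy as np
    import torch
    from sklearn import preprocessing

    # A dummy feature matrix (values are illustrative only)
    features = np.array([[-100.1, 3240.1],
                         [-200.2, -234.1],
                         [5000.5,  150.1],
                         [9000.9, -673.1]])

    # Standardize to mean 0 and standard deviation 1, then convert to a tensor
    scaler = preprocessing.StandardScaler()
    features_standardized = scaler.fit_transform(features)
    features_standardized_tensor = torch.from_numpy(features_standardized)
    print(features_standardized_tensor)

    # The same standardization done natively in PyTorch (useful when requires_grad=True)
    t = torch.from_numpy(features).float()
    t_standardized = (t - t.mean(dim=0)) / t.std(dim=0, unbiased=False)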
21.3 Designing a Neural Network
This section will design a neural network.
In 7 defines a simple neural network and initializes its parameters, including the loss function and optimizer.
First, we import the torch and torch.nn modules from PyTorch.
Next, we define a class named SimpleNeuralNet that inherits from nn.Module, which is the base class for all neural network models in PyTorch. In the __init__() method of the SimpleNeuralNet class, we define three fully connected layers (nn.Linear), namely fc1, fc2, and fc3, specifying their input and output dimensions.
In the forward() method, we define the forward propagation process of the neural network. We use ReLU activation function (nn.functional.relu) to pass the input through each fully connected layer, and finally use the Sigmoid activation function (nn.functional.sigmoid) to transform the final output into probability values between 0 and 1.
Next, we initialize an instance of the neural network called network.
We define the loss function loss_criterion as Binary Cross Entropy loss (nn.BCELoss), which is used to compute the difference between the model's predictions and the actual values.
Finally, we define the optimizer optimizer as the RMSprop optimizer (torch.optim.RMSprop), which updates the parameters of the neural network based on the computed gradients.
Lastly, we display the defined neural network model network.
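A minimal sketch of the model described above. The input size of 10 and the 16-unit hidden layers follow the discussion below; the exact sizes in the notebook may differ:

    import torch
    import torch.nn as nn

    class SimpleNeuralNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(10, 16)  # input layer (10 features) -> first hidden layer
            self.fc2 = nn.Linear(16, 16)  # second hidden layer
            self.fc3 = nn.Linear(16, 1)   # output layer: one unit for binary classification

        def forward(self, x):
            x = nn.functional.relu(self.fc1(x))
            x = nn.functional.relu(self.fc2(x))
            x = torch.sigmoid(self.fc3(x))  # probability between 0 and 1
            return x

    network = SimpleNeuralNet()
    loss_criterion = nn.BCELoss()                          # binary cross-entropy loss
    optimizer = torch.optim.RMSprop(network.parameters())  # RMSprop optimizer
    print(network)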
Discussion
Neural networks consist of layers of units. However, there’s incredible variety in the types of layers and how they are combined to form the network’s architecture. While there are commonly used architecture patterns (which we’ll cover in this chapter), the truth is that selecting the right architecture is mostly an art and the topic of much research.
To construct a feedforward neural network in PyTorch, we need to make a number of choices about both the network architecture and training process. Remember that each unit in the hidden layers:
Receives a number of inputs.
Weights each input by a parameter value.
Sums together all weighted inputs along with some bias (typically 0).
Most often then applies some function (called an activation function).
Sends the output on to units in the next layer.
First, for each layer in the hidden and output layers we must define the number of units to include in the layer and the activation function. Overall, the more units we have in a layer, the more complex patterns our network is able to learn. However, more units might make our network overfit the training data in a way detrimental to the performance on the test data.
For hidden layers, a popular activation function is the rectified linear unit (ReLU):
F(z) = max ( 0 , z )
where z is the sum of the weighted inputs and bias. As we can see, if z is greater than 0, the activation function returns z; otherwise, the function returns 0. This simple activation function has a number of desirable properties (a discussion of which is beyond the scope of this book), and this has made it a popular choice in neural networks. We should be aware, however, that many dozens of activation functions exist.
Second, we need to define the number of hidden layers to use in the network. More layers allow the network to learn more complex relationships, but with a computational cost.
Third, we have to define the structure of the activation function (if any) of the output layer. The nature of the output function is often determined by the goal of the network. Here are some common output layer patterns:
Binary classification : One unit with a sigmoid activation function
Multiclass classification : k units (where k is the number of target classes) and a softmax activation function
Regression : One unit with no activation function
Fourth, we need to define a loss function (the function that measures how well a predicted value matches the true value); again, this is often determined by the problem type:
Binary classification : Binary cross-entropy
Multiclass classification : Categorical cross-entropy
Regression : Mean square error
Fifth, we have to define an optimizer, which intuitively can be thought of as our strategy “walking around” the loss function to find the parameter values that produce the lowest error. Common choices for optimizers are stochastic gradient descent, stochastic gradient descent with momentum, root mean square propagation, and adaptive moment estimation.
Sixth, we can select one or more metrics to use to evaluate the performance, such as accuracy.
In 8 , we use the torch.nn.Module namespace to compose a simple, sequential neural network that can make binary classifications. The standard PyTorch approach for this is to create a child class that inherits from the torch.nn.Module class, instantiating a network architecture in the __init__ method, and defining the mathematical operations we want to perform upon each forward pass in the forward method of the class. There are many ways to define networks in PyTorch, and although in this case we use functional methods for our activation functions (such as nn.functional.relu) we can also define these activation functions as layers. If we wanted to compose everything in the network as a layer, we could use the Sequential class.
In both cases, the network itself is a two-layer neural network (when counting layers we don’t include the input layer because it does not have any parameters to learn) defined using PyTorch’s sequential model. Each layer is “dense” (also called “fully connected”), meaning that all the units in the previous layer are connected to all the units in the next layer.
In the first hidden layer we set out_features=16, meaning that layer contains 16 units. These units have ReLU activation functions as defined in the forward method of our class: x = nn.functional.relu(self.fc1(x)). The first layer of our network has the size (10, 16), which tells the first layer to expect each observation from our input data to have 10 feature values. This network is designed for binary classification so the output layer contains only one unit with a sigmoid activation function, which constrains the output to between 0 and 1 (representing the probability an observation is class 1).
21.4 Training a Binary Classifier
This section will train a binary classifier neural network.
In 11 demonstrates how to create, train, and evaluate a simple neural network model using PyTorch.
We start by importing the necessary libraries, including torch, torch.nn, torch.utils.data, torch.optim from PyTorch, and make_classification and train_test_split from Scikit-learn.
We generate a synthetic binary classification dataset using Scikit-learn's make_classification function and split it into training and testing sets.
Setting random seeds ensures reproducibility of results.
The data is converted into PyTorch tensors and split into training and testing sets.
We define a simple neural network model named SimpleNeuralNet, using the Sequential container to stack multiple layers, including two fully connected layers and two ReLU activation layers, with a Sigmoid activation layer for the final output.
The neural network model is initialized, and the loss function criterion and optimizer optimizer are defined.
A data loader train_loader is defined to iterate over batches of the training data during training.
Using a new feature introduced in PyTorch 2.0, the model is compiled using torch.compile().
The model is trained over multiple epochs, with each epoch iterating over mini-batches of the training data, computing and updating the model parameters.
Finally, the trained model is evaluated on the testing set, calculating the loss and accuracy on the test data.
Discussion
In Recipe 21.3, we discussed how to construct a neural network using PyTorch's sequential model. In this recipe we train that neural network using 10 features and 1,000 observations of fake classification data generated with scikit-learn's make_classification function.
The neural network we are using is the same as the one in Recipe 21.3 (see that recipe for a detailed explanation); the difference is that there we only created the neural network, we didn't train it.
At the end, we use with torch.no_grad() to evaluate the network. This says that we should not compute gradients for any tensor operations conducted in this section of code. Since we use gradients only during the model training process, we don’t want to store new gradients for operations that occur outside of it (such as prediction or evaluation).
The epochs variable defines how many epochs to use when training the data. batch_size sets the number of observations to propagate through the network before updating the parameters.
We then iterate over the number of epochs, making forward passes through the network using the forward method, and then backward passes to update the gradients. The result is a trained model.
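A compact sketch of the whole recipe. The epoch count, batch size, and layer sizes are assumptions, and the torch.compile() step mentioned above is omitted for brevity:

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    torch.manual_seed(0)

    # 1,000 observations with 10 features and a binary target
    features, target = make_classification(n_samples=1000, n_features=10,
                                           n_classes=2, random_state=1)
    x_train, x_test, y_train, y_test = train_test_split(
        features, target, test_size=0.1, random_state=1)

    x_train = torch.from_numpy(x_train).float()
    y_train = torch.from_numpy(y_train).float().view(-1, 1)
    x_test = torch.from_numpy(x_test).float()
    y_test = torch.from_numpy(y_test).float().view(-1, 1)

    network = nn.Sequential(
        nn.Linear(10, 16), nn.ReLU(),
        nn.Linear(16, 16), nn.ReLU(),
        nn.Linear(16, 1), nn.Sigmoid(),
    )
    criterion = nn.BCELoss()
    optimizer = torch.optim.RMSprop(network.parameters())

    train_loader = DataLoader(TensorDataset(x_train, y_train),
                              batch_size=100, shuffle=True)

    epochs = 3
    for epoch in range(epochs):
        for batch_x, batch_y in train_loader:
            optimizer.zero_grad()             # reset gradients from the previous step
            output = network(batch_x)         # forward pass
            loss = criterion(output, batch_y)
            loss.backward()                   # backward pass: compute gradients
            optimizer.step()                  # update parameters

    # Evaluate without tracking gradients
    with torch.no_grad():
        predictions = network(x_test)
        print("Test loss:", criterion(predictions, y_test).item())
        print("Test accuracy:", (predictions.round() == y_test).float().mean().item())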
21.5 Training a Multiclass Classifier
This section will train a multiclass classifier neural network.
In 16 demonstrates the process of creating, training, and evaluating a simple neural network model for multi-class classification using PyTorch.
We start by importing the required libraries, including torch, torch.nn, torch.utils.data, torch.optim, make_classification, and train_test_split from Scikit-learn.
We generate a synthetic multi-class classification dataset using Scikit-learn's make_classification function and split it into training and testing sets.
Setting random seeds ensures reproducibility of results.
The data is converted into PyTorch tensors, and for multi-class classification, the target variable is converted into one-hot encoded format using torch.nn.functional.one_hot.
We define a simple neural network model named SimpleNeuralNet, using the Sequential container to stack multiple layers, including two fully connected layers and two ReLU activation layers, with a Softmax activation layer for the final output.
The neural network model is initialized, and the loss function criterion and optimizer optimizer are defined.
A data loader train_loader is defined to iterate over batches of the training data during training.
Using a new feature introduced in PyTorch 2.0, the model is compiled using torch.compile().
The model is trained over multiple epochs, with each epoch iterating over mini-batches of the training data, computing and updating the model parameters.
Finally, the trained model is evaluated on the testing set, calculating the loss and accuracy on the test data.
Discussion
In this solution we created a similar neural network to the binary classifier from the last recipe, but with some notable changes. In the classification data we generated, we set N_CLASSES=3. To handle multiclass classification, we also use nn.CrossEntropyLoss(), which expects the target to be one-hot encoded. To accomplish this, we use the torch.nn.functional.one_hot function and end up with a one-hot encoded array where the position of 1. indicates the class for a given observation.
In 17 since this is a multiclass classification problem, we used an output layer of size 3 (one per class) containing a softmax activation function. The softmax activation function will return an array of 3 values summing to 1. These 3 values represent an observation’s probability of being a member of each of the 3 classes.
As mentioned in this recipe, we used a loss function suited to multiclass classification, the categorical cross-entropy loss function: nn.CrossEntropyLoss().
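A minimal sketch of the pieces that change for multiclass classification (the hidden layer size of 16 is an assumption):

    import torch
    import torch.nn as nn

    N_CLASSES = 3

    # One-hot encode integer class labels, as described above
    target = torch.tensor([0, 2, 1])
    target_one_hot = nn.functional.one_hot(target, num_classes=N_CLASSES).float()
    print(target_one_hot)
    # tensor([[1., 0., 0.],
    #         [0., 0., 1.],
    #         [0., 1., 0.]])

    # Output layer: one unit per class with a softmax activation,
    # paired with the categorical cross-entropy loss
    output_layer = nn.Sequential(nn.Linear(16, N_CLASSES), nn.Softmax(dim=1))
    criterion = nn.CrossEntropyLoss()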
21.6 Training a Regressor
This section will train a neural network for regression.
In 18 demonstrates the process of creating, training, and evaluating a simple neural network model for regression tasks using PyTorch.
We start by importing the necessary libraries, including torch, torch.nn, torch.utils.data, torch.optim, make_regression, and train_test_split from Scikit-learn.
A synthetic regression dataset is generated using Scikit-learn's make_regression function, and it is split into training and testing sets using train_test_split.
Random seeds are set to ensure reproducibility of results.
The data is converted into PyTorch tensors, and it is split into training and testing sets.
We define a simple neural network model named SimpleNeuralNet using the Sequential container to stack multiple layers, including two fully connected layers and two ReLU activation layers. The final layer is a linear layer with a single output.
The neural network model is initialized, and the loss function criterion and optimizer optimizer are defined.
A data loader train_loader is defined to iterate over batches of the training data during training.
Using a new feature introduced in PyTorch 2.0, the model is compiled using torch.compile().
The model is trained over multiple epochs, with each epoch iterating over mini-batches of the training data, computing and updating the model parameters.
Finally, the trained model is evaluated on the testing set, and the mean squared error (MSE) is calculated as the evaluation metric.
Discussion
It’s completely possible to create a neural network to predict continuous values instead of class probabilities. In the case of our binary classifier (Recipe 21.4) we used an output layer with a single unit and a sigmoid activation function to produce a probability that an observation was class 1. Importantly, the sigmoid activation function constrained the outputted value to between 0 and 1. If we remove that constraint by having no activation function, we allow the output to be a continuous value.
Furthermore, because we are training a regression, we should use an appropriate loss function and evaluation metric, in our case the mean square error:

    MSE = (1/n) * Σ_{i=1}^{n} (ŷ_i − y_i)²

where n is the number of observations; y_i is the true value of the target we are trying to predict, y, for observation i; and ŷ_i is the model's predicted value for y_i.
Finally, because we are using simulated data using scikit-learn make_regression, we didn’t have to standardize the features. It should be noted, however, that in almost all real-world cases, standardization would be necessary.
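A minimal sketch of how the regression network differs from the classifiers (layer sizes assumed):

    import torch.nn as nn

    network = nn.Sequential(
        nn.Linear(10, 16), nn.ReLU(),
        nn.Linear(16, 16), nn.ReLU(),
        nn.Linear(16, 1),   # no sigmoid: the output is an unconstrained continuous value
    )
    criterion = nn.MSELoss()  # mean squared error loss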
21.7 Making Predictions
This section will use a neural network to make predictions.
In 19 demonstrates the process of creating, training, and evaluating a simple neural network model for binary classification tasks using PyTorch.
We start by importing the necessary libraries, including torch, torch.nn, torch.utils.data, torch.optim, make_classification, and train_test_split from Scikit-learn.
A synthetic binary classification dataset is generated using Scikit-learn's make_classification function, and it is split into training and testing sets using train_test_split.
Random seeds are set to ensure reproducibility of results.
The data is converted into PyTorch tensors, and it is split into training and testing sets.
We define a simple neural network model named SimpleNeuralNet using the Sequential container to stack multiple layers, including two fully connected layers and two ReLU activation layers. The final layer is a linear layer with a sigmoid activation function.
The neural network model is initialized, and the loss function criterion and optimizer optimizer are defined.
A data loader train_loader is defined to iterate over batches of the training data during training.
Using a new feature introduced in PyTorch 2.0, the model is compiled using torch.compile().
The model is trained over multiple epochs, with each epoch iterating over mini-batches of the training data, computing and updating the model parameters.
Finally, we evaluate the trained model on the testing set. We use a context manager torch.no_grad() to disable gradient calculation during inference, compute the predicted class labels for the training data, and print the predicted class label for the first sample.
Discussion
Making predictions is easy in PyTorch. Once we have trained our neural network we can use the forward method (already used as part of the training process), which takes as input a set of features and does a forward pass through the network. In our solution the neural network is set up for binary classification, so the predicted output is the probability of being class 1. Observations with predicted values very close to 1 are highly likely to be class 1, while observations with predicted values very close to 0 are highly likely to be class 0. Hence, we use the round method to convert these values to 1s and 0s for our binary classifier.
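A minimal prediction sketch, assuming network is the trained binary classifier and x_test holds the test features:

    import torch

    with torch.no_grad():                                     # no gradients needed for inference
        predicted_probabilities = network(x_test)             # forward pass -> probabilities
        predicted_classes = predicted_probabilities.round()   # round to 0 or 1

    print(predicted_classes[0])  # predicted class for the first observation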
First, we introduce the Olivetti faces dataset.
This dataset contains a set of face images taken between April 1992 and April 1994 at AT&T Laboratories Cambridge. The sklearn.datasets.fetch_olivetti_faces function is the data fetching / caching function that downloads the data archive from AT&T.
As described on the original website:
There are ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).
Data Set Characteristics:
Classes : 40
Samples total : 400
Dimensionality : 4096
Features : real, between 0 and 1
The image is quantized to 256 grey levels and stored as unsigned 8-bit integers; the loader will convert these to floating point values on the interval [0, 1], which are easier to work with for many algorithms.
The “target” for this database is an integer from 0 to 39 indicating the identity of the person pictured; however, with only 10 examples per class, this relatively small dataset is more interesting from an unsupervised or semi-supervised perspective.
The original dataset consisted of 92 x 112 images, while the version available here consists of 64 x 64 images.
When using these images, please give credit to AT&T Laboratories Cambridge. [12]
The figure below shows the faces of the 40 people in the dataset.
Next we plot the first 10 images for the first two persons (face id=0 and face id=1) from the Olivetti Faces dataset.
In 6 demonstrates building, training, and evaluating a simple neural network model using PyTorch for classifying the Olivetti faces dataset.
Import necessary libraries, including matplotlib, fetch_olivetti_faces for loading the Olivetti faces dataset from Scikit-learn, and PyTorch modules such as torch, torch.nn, torch.optim, torch.utils.data including TensorDataset and DataLoader.
Check for the availability of GPU and set the device to CUDA (if GPU available) or CPU.
Load the Olivetti faces dataset and split it into training and testing sets using Scikit-learn's train_test_split function.
Convert the data into PyTorch tensors and move them to the defined device (GPU or CPU).
Create training and testing data loaders using TensorDataset and DataLoader to iterate over batches of data during training.
Define a simple neural network model OlivettiClassifier consisting of three fully connected layers (also known as dense layers) with ReLU activation function.
Instantiate the model and move it to the device.
Define the loss function (here using cross-entropy loss) and optimizer (using the Adam optimizer).
Train the model. In each training epoch, pass the training data through the model for forward propagation, compute the loss, perform backpropagation, and update the parameters while calculating and storing the loss value for each epoch.
Evaluate the model's performance on the testing set. Use torch.no_grad() context manager to disable gradient computation, then make predictions on the testing data through the model, and calculate the accuracy of the model.
Finally, plot the loss curve during training using Matplotlib.
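A sketch of the classifier and its training setup. The hidden layer sizes (512 and 256) are assumptions; the input size of 4096 (64 x 64 pixels) and the 40 output classes follow from the dataset description:

    import torch
    import torch.nn as nn

    class OlivettiClassifier(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(4096, 512)
            self.fc2 = nn.Linear(512, 256)
            self.fc3 = nn.Linear(256, 40)

        def forward(self, x):
            x = nn.functional.relu(self.fc1(x))
            x = nn.functional.relu(self.fc2(x))
            return self.fc3(x)  # raw scores; nn.CrossEntropyLoss applies softmax internally

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = OlivettiClassifier().to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)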
Note :
TP (True Positive) is the number of positive samples that are correctly predicted as positive.
TN (True Negative) is the number of negative samples that are correctly predicted as negative.
FP (False Positive) is the number of negative samples that are incorrectly predicted as positive.
FN (False Negative) is the number of positive samples that are incorrectly predicted as negative.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
The output shows the loss for all 100 epochs; at epoch = 100 our loss is 0.0097,
and we get a test accuracy of 0.9 = 90%.
The figure below plots the loss over this training run; we can see that it converges.
Next we try to improve the result. First we try more epochs: with epoch = 200 our loss is 0.0021, but the accuracy is still 90%. I think that is because this dataset is too small.
Then we try to increase the model complexity. With epoch = 200, we get an accuracy of 73.75%, which is not better. There is also a peak in the loss figure; I think that is most likely due to an improper learning rate setting.
After we change the learning rate to 0.0001, we get a smoother curve, but the accuracy still does not increase.
At the end we try adding a dropout layer. We tried many learning rates [0.1, 0.01, 0.001, 0.0001, 0.00005, 0.00001] and epoch counts [200, 500, 1000, 1500, 2000], although I forgot to record most of the results; the last setting we tried, lr = 0.00005 with epoch = 2000, gives an accuracy of 88.75%. We also discovered that even when we keep the model, parameters, etc. the same, we get a different accuracy on each run; this may be due to the optimizer settling in different local rather than global optima. Furthermore, we reached a loss of almost 0.0001, but its accuracy was so bad that I did not record it; I think this is overfitting (the middle figure).
The last two figures, with lr = 0.001 and epoch = 200, give accuracies of 91.25% and 93.75%.
The 20 newsgroups dataset comprises around 18000 newsgroups posts on 20 topics split in two subsets: one for training (or development) and the other one for testing (or for performance evaluation). The split between the train and test set is based upon messages posted before and after a specific date.
This module contains two loaders. The first one, sklearn.datasets.fetch_20newsgroups, returns a list of the raw texts that can be fed to text feature extractors such as CountVectorizer with custom parameters so as to extract feature vectors. The second one, sklearn.datasets.fetch_20newsgroups_vectorized, returns ready-to-use features, i.e., it is not necessary to use a feature extractor. [13]
Data Set Characteristics:
Classes : 20
Samples total : 18846
Dimensionality : 1
Features : text
Data Considerations
The Cleveland Indians is a major league baseball team based in Cleveland, Ohio, USA. In December 2020, it was reported that “After several months of discussion sparked by the death of George Floyd and a national reckoning over race and colonialism, the Cleveland Indians have decided to change their name.” Team owner Paul Dolan “did make it clear that the team will not make its informal nickname – the Tribe – its new team name.” “It’s not going to be a half-step away from the Indians,” Dolan said.”We will not have a Native American-themed name.”
In 105 implements a text classification task using the 20 Newsgroups dataset with PyTorch for training and testing a neural network. The data preprocessing includes converting text data into numerical features, splitting the data into training and test sets, converting the data into PyTorch tensors, and then training and testing the neural network.
First, the code checks for the availability of a GPU and sets the appropriate device. Next, it loads the 20 Newsgroups dataset and uses CountVectorizer to convert the text data into numerical features. The number of features is limited to 4096 to simplify the model. The data is then split into training and test sets.
After converting the data into PyTorch tensors, the code creates PyTorch DataLoaders to facilitate batch loading of the data. Each dataset (training and testing) has its own DataLoader.
Next, a simple neural network NewsClassifier is defined. This network has three fully connected layers (fc1, fc2, fc3) with input size 4096 (matching the number of features), hidden layers of 512 and 256 units, and an output layer of 20 units (corresponding to the 20 newsgroup categories). During the forward pass, the data goes through these layers sequentially with ReLU activation functions applied.
During training, cross-entropy loss and the Adam optimizer are used. In each epoch, the model performs a forward pass through the training set, computes the loss, performs a backward pass, and updates the weights. The loss value for each epoch is recorded and printed every two epochs.
After training, the model is evaluated on the test set to calculate accuracy. Finally, the loss values are plotted as a curve to show the change in loss during the training process.
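A sketch of the NewsClassifier architecture described above (the layer sizes come from the description):

    import torch.nn as nn

    class NewsClassifier(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(4096, 512)  # input size matches the 4096 CountVectorizer features
            self.fc2 = nn.Linear(512, 256)
            self.fc3 = nn.Linear(256, 20)    # one output unit per newsgroup category

        def forward(self, x):
            x = nn.functional.relu(self.fc1(x))
            x = nn.functional.relu(self.fc2(x))
            return self.fc3(x)               # raw scores for nn.CrossEntropyLoss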
The table below shows the parameters we have tried; we can also observe overfitting at lr = 0.0001, and we get the best accuracy of 86.84%.
The neural network fits this dataset quickly and with high accuracy, which shows how powerful it is.
[1] Chapter 20. Tensors with PyTorch, Machine Learning with Python - Theory and Implementation.
[2] PyTorch, Wikipedia.
[3] Tensors, PyTorch documentation, 2024.
[4] Chapter 21. Neural Networks, Machine Learning with Python - Theory and Implementation.
[5] 類神經網路 (Artificial neural network), Wikipedia.
[6] PyTorch installation instructions.