In the field of image processing, especially when employing Generative Adversarial Networks (GANs), the choice of an appropriate dataset is fundamental. For our project, we selected the COCO 2017 dataset, a comprehensive and diverse collection of real-world images. This dataset, encompassing roughly 18GB of data, offers a wide range of scenes and objects, making it highly suitable for training a model focused on noise reduction in real scene images. The diversity of the COCO dataset is a key attribute, as it includes various image types, from urban landscapes to natural scenes, providing a robust foundation for the model to learn from. This range is essential for the model to generalize well across different real-world scenarios. The dataset's size and variety are crucial in training a model to identify and remove noise effectively while preserving the essential details in a wide array of images. By choosing the COCO 2017 dataset, we ensure that our model is exposed to a representative sample of the real world, which is vital for its success in practical applications.
Gathering data from the COCO 2017 dataset required careful planning, especially regarding how much of the dataset to use. While the original research referenced for this project utilized approximately 350,000 images, our initial dataset comprised 800 images, a limit imposed by the available GPU resources. Although much smaller, this subset was carefully curated to include a diverse set of images, so that the model still learns from a broad cross-section of the data.
The images were selected to represent various scenes and noise levels, providing a broad range of challenges for the model to tackle. COCO 2017 is organized into roughly 80 object categories, so 10 images were randomly sampled from each category to build a training pool of 800 images. This variety is key to training a robust model capable of handling different types of noise and image quality. The sampled subset amounts to only about 2% of the original dataset; though smaller than ideal, it still offers a valuable opportunity to explore the capabilities of GANs in noise reduction, with the potential to scale up as more resources become available. The selection reflects a balance between resource availability and the need for a varied, challenging dataset.
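Below is a minimal sketch of this per-category sampling using pycocotools; the annotation path, the random seed, and the handling of images that belong to more than one category are assumptions for illustration, not necessarily the exact procedure used in the project.

```python
# Hypothetical sketch: sample 10 images from each of the ~80 COCO categories.
import random
from pycocotools.coco import COCO

ANN_FILE = "annotations/instances_train2017.json"  # assumed local path to COCO annotations
IMAGES_PER_CATEGORY = 10

random.seed(42)
coco = COCO(ANN_FILE)
selected_files = set()

for cat_id in coco.getCatIds():                    # ~80 object categories
    img_ids = coco.getImgIds(catIds=[cat_id])      # every image containing this category
    sampled = random.sample(img_ids, min(IMAGES_PER_CATEGORY, len(img_ids)))
    for info in coco.loadImgs(sampled):
        selected_files.add(info["file_name"])      # e.g. "000000391895.jpg"

print(f"Selected {len(selected_files)} unique images")
```

Because a single image can contain objects from several categories, the number of unique files can come out slightly below 800; such duplicates would need to be topped up or simply tolerated.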
The COCO 2017 dataset can be found here.
Preprocessing the images from the COCO 2017 dataset was a critical step in preparing the data for effective training. Most images in the dataset have a resolution of around 640*480, while a few reach 1080*1080. Since the resolution is not uniform, the following preprocessing is applied to produce the high-resolution images.
This involves resizing the images to uniform dimensions, a necessary measure for consistency and efficiency in training. We standardized the image size to 512 pixels in both width and height, which we determined to be a good fit for our model's architecture.
High-resolution image with a resolution of 640*480.
On the left is the high-resolution image at 640*480, before any preprocessing. On the right is the same image after being standardized to 512*512 pixels.
This size ensures that the images are large enough to retain significant details for the model to learn from, yet small enough to be manageable given our computational constraints.
By standardizing the dimensions, we also ensure that each image is treated equally by the model, which is crucial for the model's ability to learn generalized noise reduction patterns.
High-resolution image after resizing to 512*512.
The consistency in image size also simplifies various stages of the modeling process, from input handling to the application of convolutional layers. This standardization is a key step in preparing the dataset for the subsequent stages of the model training process.
This is the only pre-processing step involved in creating high-resolution images.
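As a rough sketch, the resizing step could look like the following; the folder layout and the choice of Pillow's bicubic resampling are assumptions for illustration rather than the exact implementation.

```python
# Hypothetical sketch: standardize every sampled image to 512x512 pixels.
from pathlib import Path
from PIL import Image

SRC_DIR = Path("data/raw")      # assumed location of the sampled COCO images
HR_DIR = Path("data/hr_512")    # output folder for the 512x512 HR images
HR_DIR.mkdir(parents=True, exist_ok=True)

for img_path in SRC_DIR.glob("*.jpg"):
    img = Image.open(img_path).convert("RGB")
    hr = img.resize((512, 512), Image.BICUBIC)  # uniform 512x512 dimensions
    hr.save(HR_DIR / img_path.name)
```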
The creation of low-resolution (LR) images from the high-resolution (HR) originals was a pivotal part of our data preparation. This step was achieved by downsampling the HR images by a factor of 2, resulting in LR images with dimensions of 256*256 pixels. This downsampling process is integral to the functioning of our GAN model, as it sets up a scenario where the model learns to enhance the resolution from LR to HR. The LR images, although reduced in quality, still retain essential structural features necessary for the GAN to learn the upscaling process. This step mirrors real-world scenarios where high-quality images are often not available, and enhancement from lower quality is required. By training the model with these pairs of LR and HR images, we enable it to learn the complex task of noise reduction and detail enhancement, crucial for practical applications. The downsampling process, while simple, plays a crucial role in defining the challenge that the model aims to solve.
High-resolution image (512*512) before downsampling.
On the left is the high-resolution image and on the right is the low-resolution image.
Low-resolution image after downsampling to 256*256.
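A minimal sketch of the LR generation is given below, reusing the folder names from the previous snippet; the bicubic filter is an assumption, as the downsampling kernel is not specified here.

```python
# Hypothetical sketch: downsample each 512x512 HR image by a factor of 2 to 256x256.
from pathlib import Path
from PIL import Image

HR_DIR = Path("data/hr_512")
LR_DIR = Path("data/lr_256")
LR_DIR.mkdir(parents=True, exist_ok=True)

for hr_path in HR_DIR.glob("*.jpg"):
    hr = Image.open(hr_path).convert("RGB")
    lr = hr.resize((hr.width // 2, hr.height // 2), Image.BICUBIC)  # 512 -> 256
    lr.save(LR_DIR / hr_path.name)
```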
After creating the high- and low-resolution image pairs, the dataset is split into training and test sets using an 80/20 ratio: 640 images for training and 160 for testing.
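The split itself can be sketched as follows; scikit-learn's train_test_split and the fixed random seed are assumptions for illustration.

```python
# Hypothetical sketch: 80/20 split of the 800 HR/LR pairs.
from pathlib import Path
from sklearn.model_selection import train_test_split

pairs = sorted(Path("data/hr_512").glob("*.jpg"))   # one entry per HR/LR pair
train_files, test_files = train_test_split(
    pairs, train_size=0.8, random_state=42          # 640 train / 160 test for 800 images
)
print(len(train_files), len(test_files))
```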
Normalizing the image data was the final, yet crucial, step in our preprocessing pipeline. Normalization involves scaling the pixel values of images to a standard range, which in our case was [-1,1] for both LR and HR images. This normalization is important as it helps stabilize and accelerate the training process by ensuring that the network inputs are within a range that is optimal for the neural network models. In contrast to the original research that used different scales for LR and HR images, our approach maintains consistency across both image types. This uniform scaling is beneficial as it simplifies the model's learning process, ensuring that it treats both LR and HR images under the same scale of reference. Furthermore, normalization aids in preventing issues related to model convergence and helps in achieving faster and more stable training. This step, while technical, is fundamental in preparing the data for efficient and effective learning by the GAN model.
The above image is the unnormalized tensor of the input image.
On the left is the tensor before normalization and on the right is the tensor after normalization.
The above image is the normalized tensor of the input image.
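A minimal sketch of this normalization, assuming a PyTorch/torchvision pipeline (the framework is not stated in this section), is shown below; it maps pixel values from [0, 255] to [-1, 1] for both LR and HR images.

```python
# Hypothetical sketch: scale image tensors to the [-1, 1] range.
import torchvision.transforms as T

to_minus_one_one = T.Compose([
    T.ToTensor(),                          # uint8 [0, 255] -> float [0, 1]
    T.Normalize(mean=[0.5, 0.5, 0.5],      # (x - 0.5) / 0.5 -> [-1, 1]
                std=[0.5, 0.5, 0.5]),
])

# Usage on a PIL image `img` (LR or HR alike):
# tensor = to_minus_one_one(img)           # shape (3, H, W), values in [-1, 1]
```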
Code Access:
The code is available here.