Figures accompanying the text illustrate the dramatic transformations:
Before and After: Compare the low-resolution input with our super-resolved output, noting the remarkable preservation of edges and details.
The Benchmark: As shown below, the leftmost image is the noisy low-resolution input, the rightmost image is the high-resolution target, and the middle image is the super-resolved output produced from the low-resolution input. Edges in the super-resolved image are visibly sharper than in the low-resolution input, though the result could still be improved by training on more images and for more epochs. The underlying simulation is of size 128 × 128, while the generated output has a resolution of 512 × 512. Even though the training data did not contain any obstacles, the network produced a realistic high-resolution image, highlighting the model's capability to enhance and clarify.
From left to right: the low-resolution input, the output of the deep residual generative adversarial network optimized for a loss more sensitive to human perception, and the original high-resolution image.
In many machine learning projects, a confusion matrix serves as a fundamental tool to evaluate the predictive accuracy of a model, distinguishing between classes with precision. However, our endeavor with SRGAN takes us into the realm of image generation, a creative process where the goal is not prediction but the recreation of high-resolution images from their low-resolution counterparts. As such, the traditional confusion matrix is not the metric of choice for our purposes. Instead, we turn to alternative indicators of performance that are better suited to the task at hand: Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM). These metrics allow us to quantitatively assess the quality and fidelity of the images generated by our model, providing a clear measure of how our super-resolution images compare to the original high-definition targets.
PSNR: PSNR stands for peak signal-to-noise ratio, calculated in decibels between two images. It quantifies how closely the super-resolved image reproduces the original, pixel for pixel; in simpler terms, it is a measure of reconstruction quality. The higher the PSNR, the better the quality of the tested image.
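For context, PSNR is derived from the mean squared error between the reference and the reconstructed image. A minimal sketch, assuming 8-bit images stored as NumPy arrays (the function name and defaults below are illustrative, not taken from our project code):

```python
import numpy as np

def psnr(reference, test, max_val=255.0):
    # Mean squared error between the two images (assumed same shape, values in [0, max_val]).
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    # PSNR in decibels: higher means the reconstruction is closer to the reference.
    return 10.0 * np.log10((max_val ** 2) / mse)
```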
SSIM: The structural similarity index (SSIM) measures how similar two images are in terms of structure, luminance, and contrast. It is used as a quality measure of one image compared against an ideal reference image. SSIM values range from −1 to 1, where 1 indicates perfect similarity and lower values indicate increasing dissimilarity.
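In practice, both metrics can be computed with reference implementations rather than hand-rolled code. The sketch below uses scikit-image's PSNR and SSIM functions; the helper name and the assumption of uint8 RGB image pairs are ours, not part of the original code:

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pairs(hr_images, sr_images):
    # hr_images / sr_images: sequences of uint8 RGB arrays with identical shapes.
    psnr_scores, ssim_scores = [], []
    for hr, sr in zip(hr_images, sr_images):
        psnr_scores.append(peak_signal_noise_ratio(hr, sr, data_range=255))
        ssim_scores.append(structural_similarity(hr, sr, channel_axis=-1, data_range=255))
    # Return the per-image averages used for reporting.
    return sum(psnr_scores) / len(psnr_scores), sum(ssim_scores) / len(ssim_scores)
```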
The table indicates that most of the images achieve SSIM scores above 0.6–0.7. With more training data and longer training, these scores could plausibly approach 0.9. The values in the table are comparable with those reported in related work.
Our model has proven capable of generating high-resolution images that retain the intricate details necessary for a realistic look.
The PSNR box plot reveals that most of our super-resolved images have a PSNR value above 27 dB, which is indicative of a high-quality reconstruction. The spread of the data points within the box plot signifies that our model performs reliably, delivering a high degree of clarity in the super-resolved images.
The box plot for SSIM illustrates how similar our super-resolved images are to the original high-resolution images; a score of 1 indicates a perfect match. Our box plot shows that the majority of values are concentrated above 0.6, with the central box representing the middle 50% of our data. This high level of similarity across a range of images indicates that our model consistently maintains the structure and texture of the originals, ensuring that the essence of each image is preserved.
Both box plots are a testament to our model's effectiveness in enhancing image resolution while maintaining the integrity of the original images. These visual tools not only summarize the performance but also reassure us of the consistent quality that our SRGAN model achieves.
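For readers who want to reproduce this kind of summary, the box plots can be generated directly from the per-image scores. A small matplotlib sketch (function and argument names are illustrative, not from our actual plotting code):

```python
import matplotlib.pyplot as plt

def plot_metric_boxes(psnr_scores, ssim_scores):
    # One box plot per metric, one value per test image.
    fig, (ax_psnr, ax_ssim) = plt.subplots(1, 2, figsize=(8, 4))
    ax_psnr.boxplot(psnr_scores)
    ax_psnr.set_title("PSNR (dB)")
    ax_ssim.boxplot(ssim_scores)
    ax_ssim.set_title("SSIM")
    fig.tight_layout()
    plt.show()
```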
The table above is from the following paper: Puri, J. S. and Kotze, A.: Evaluation of SRGAN Algorithm for Superresolution of Satellite Imagery on Different Sensors, AGILE GIScience Ser., 3, 57, https://doi.org/10.5194/agile-giss-3-57-2022, 2022.
As we delve into the outcomes of our SRGAN model, we find it enlightening to benchmark our results against those found in contemporary research. The table from this study showcases the performance of an SRGAN model used to enhance SPOT 7 satellite imagery at 2.4 m resolution toward the 0.6 m resolution of Pléiades imagery. That study set a high bar for image quality, with PSNR values reaching up to 37.45 and SSIM scores peaking at 0.8931, achieved over up to 100K iterations, with the best results often occurring around the 40K mark. The full reference is given above.
In the context of our project, we've trained our SRGAN model for only 100 epochs, a modest number in comparison. Yet, our results resonate with the promise of SRGAN technology. Our model has achieved PSNR scores ranging around the mid to high 20s and SSIM scores that show substantial structural similarity, with scores reaching up to 0.87. Although these figures are nascent compared to the referenced study, they indicate a positive trajectory toward achieving high-fidelity image super-resolution, even with a significantly smaller number of training iterations.
What's truly remarkable is not just the numbers themselves but what they represent—a testament to the potential encapsulated within SRGANs, even at an early stage in the training process. Our results are a harbinger of the heights that could be reached with continued training and refinement. They validate that even with limited epochs, our model is on the right path, capturing the essence of super-resolution as it begins to close the gap with more extensively trained counterparts.
Beyond the impressive metrics, the results tell a story of a model that learned to interpret and recreate the essence of an image. From noisy, low-resolution inputs to clear, detailed outputs, the SRGAN has charted a path of progress in image processing.
Generator loss using binary cross-entropy
A key aspect of our SRGAN's development is visualized in the training loss graph, which captures the Generator's performance over time. This graph charts the Generator loss, calculated using binary cross entropy, against the number of training epochs. As we can see, the loss starts relatively high, indicating the initial disparity between the generated images and the target high-resolution images. However, as the epochs progress, a clear downward trend emerges, reflecting the Generator's learning and improvement. The steep decline and subsequent leveling off suggest that the Generator quickly learns the essential features necessary for image super-resolution before refining its understanding to a more nuanced level. This visual representation not only demonstrates the effectiveness of our training process but also reaffirms the capacity of our Generator to evolve toward producing high-fidelity images with remarkable clarity.
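For the adversarial part of that loss, binary cross entropy compares the discriminator's verdict on generated images against the "real" label. A minimal Keras-style sketch, assuming the discriminator outputs probabilities; the content (e.g. VGG/MSE) term of the full SRGAN objective and the exact weighting used in our training run are omitted here:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=False)

def generator_adversarial_loss(fake_predictions):
    # The generator is rewarded when the discriminator labels its outputs as real (1).
    return bce(tf.ones_like(fake_predictions), fake_predictions)
```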
Figure showing the input image
Feature maps are like snapshots of our SRGAN model's thought process. As it learns to enhance images, each map captures a different aspect of the image data it's processing. In one of our model's layers, we've captured a series of feature maps that offer a colorful and intricate view of the transformation of images from blurry to sharp.
Feature map of layer 1, given the input image
These feature maps represent the various features the model has detected in the image. Each map is a unique response to the patterns, textures, and contours present in the input data. Some may highlight edges and shapes, while others might capture more abstract qualities like texture and contrast. Together, they form a complex tapestry of visual cues that the model uses to reconstruct the image in higher resolution.
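One way to obtain such maps is to probe the trained generator with a truncated model that stops at the layer of interest. A rough Keras sketch, assuming `generator` is the trained model and `lr_image` is a single low-resolution array; the layer index and names are illustrative, not taken from our notebook:

```python
import tensorflow as tf

def get_feature_maps(generator, layer_index, lr_image):
    # Build a probe model that outputs the activations of one intermediate layer.
    probe = tf.keras.Model(inputs=generator.input,
                           outputs=generator.layers[layer_index].output)
    # Add a batch dimension and run the image through the truncated network.
    batch = tf.expand_dims(tf.convert_to_tensor(lr_image, dtype=tf.float32), 0)
    return probe(batch)[0]  # shape: (height, width, num_filters)
```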
To the untrained eye, these feature maps are reminiscent of abstract art—vivid, vibrant, and teeming with variety. They're not just technical tools; they're a testament to the model's ability to dissect and reassemble visual information in a way that mimics the intricacies of human vision.
By showcasing these feature maps, we're not just presenting a behind-the-scenes look at our model's inner workings. We're also celebrating the depth of learning and analysis that goes into creating a super-resolution image. Each map is a piece of a larger puzzle that the model solves as it learns to improve image clarity and detail.
While these feature maps may not be 'results' in the traditional sense, they're crucial in illustrating the complexity and capability of our model. They provide a compelling narrative about the model's layer-by-layer approach to understanding and enhancing images, which is an essential part of the SRGAN's success story.
Code Access:
The code is available here.