After progress stalled with our other artifact-removal techniques, we went to office hours, where dictionary learning was suggested to us. At a high level, this process takes a training data set without the artifact present, extracts patches from it, and feeds those patches to an optimization algorithm that alternates between optimizing a sparse coefficient matrix and updating a dictionary. The algorithm's output is a "trained" dictionary, which is then used to process a test image and remove the artifact from it.
Dictionaries are commonly used to represent natural images for applications like image compression, inpainting, and de-noising, and a multitude of existing dictionaries have been designed whose atoms represent natural images on earth well. For our application to astronomical photos, however, these existing dictionaries would not be helpful [24]. This motivated us to find an iterative, mathematical way to create our own dictionary whose atoms would represent patches of astronomical images.
The other methods we attempted during this project used classical signal processing techniques we learned about in lecture, and we also wanted to explore more complex approaches that weren't covered. Our previous methods often introduced errors in parts of the image other than the artifact spots, in the form of blurring, distortion, or excessive smoothing. Our hope with dictionary learning was that it could train on images that didn't have the JWST artifact and, in turn, process an affected image by removing the artifact.
Below is a high-level visualization of the process we developed. The inputs to the code are two images: a training image, which does not contain the artifact, and a test image, which does. The user-adjustable parameters are the patch dimension and a verbose-output flag for debugging; the other main parameters are calculated from the patch size or were determined through trial and error while adjusting the program.
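The patch-extraction step that feeds the training stage can be sketched as follows. This is an illustrative Python/NumPy version rather than our MATLAB code; the function name, the stride parameter, and the raster-scan layout are assumptions:

```python
import numpy as np

def extract_patches(img, patch_size, stride):
    """Collect overlapping square patches of a grayscale image as
    columns of a matrix X, the layout assumed by the training step
    (X is approximated by D @ Z)."""
    h, w = img.shape
    patches = []
    for r in range(0, h - patch_size + 1, stride):
        for c in range(0, w - patch_size + 1, stride):
            # Flatten each patch_size x patch_size window into one column.
            patches.append(img[r:r + patch_size, c:c + patch_size].ravel())
    return np.array(patches).T  # shape: (patch_size**2, num_patches)
```

A smaller stride gives more overlap between patches, at the cost of a larger training matrix.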
Inside the dictionary learning block of the flowchart, there is a block labeled FISTA. FISTA (Fast Iterative Shrinkage-Thresholding Algorithm) is how we adjust the sparse coefficient matrix before updating the dictionary accordingly. This is where the majority of the math in this project came from.
To train our dictionary, we want to solve the following problem:
min over (D, Z) of  (1/2)·||X − D·Z||_F^2 + λ·||Z||_{1,1}
Because of the complex nature of this optimization problem, it is more feasible to test a starting point, take a step to a different point in the function, and see if it was an improvement, repeating the process to get closer and closer to the optimal result. This is an iterative solving method, which reduces the complexity of some pieces of an otherwise very difficult problem. However, it presents its own set of challenges, especially as there are many parameters to consider and adjust for a problem like this.
Our version of FISTA, built from a generalized starting point given to us by ChatGPT, runs for iter_SC iterations with the following process:
Calculate gradient:
grad = D' * (D * Y) - DtX;   % gradient of (1/2)*||X - D*Y||_F^2, with DtX = D'*X precomputed
Perform the soft-thresholding operation, which shrinks every entry toward zero and sets entries with magnitude below lambda / L to zero:
Z = soft_thresholding(Y - grad / L, lambda / L);
Update the momentum parameter t (the gradient step size itself is 1 / L) and the auxiliary matrix Y used to calculate the gradient on the next iteration:
t_new = (1 + sqrt(1 + 4 * t^2)) / 2;
Y = Z + ((t - 1) / t_new) * (Z - Z_old);
t = t_new;
Finally, check for convergence by measuring the relative change in Z:
if norm(Z - Z_old, 'fro') / norm(Z_old, 'fro') < tol
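Putting the steps above together, a minimal Python/NumPy sketch of this sparse-coding stage might look like the following. Our actual implementation is in MATLAB; the zero initialization and the Lipschitz-based step size 1/L are assumptions:

```python
import numpy as np

def soft_threshold(V, tau):
    # Entrywise soft-thresholding: shrink toward zero, zeroing |v| < tau.
    return np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)

def fista(D, X, lam, num_iters=100, tol=1e-6):
    """Sketch of FISTA for min_Z (1/2)*||X - D Z||_F^2 + lam*||Z||_{1,1}."""
    L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant of the gradient
    DtX = D.T @ X                           # precomputed, as in the MATLAB snippet
    Z = np.zeros((D.shape[1], X.shape[1]))
    Y, t = Z.copy(), 1.0
    for _ in range(num_iters):
        Z_old = Z
        grad = D.T @ (D @ Y) - DtX          # gradient of the smooth term at Y
        Z = soft_threshold(Y - grad / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        Y = Z + ((t - 1) / t_new) * (Z - Z_old)
        t = t_new
        # Relative-change stopping rule, as above.
        if np.linalg.norm(Z_old) > 0 and \
           np.linalg.norm(Z - Z_old) / np.linalg.norm(Z_old) < tol:
            break
    return Z
```

With D equal to the identity, the problem decouples and the minimizer is exactly the soft-thresholded data, which makes a convenient sanity check.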
The next piece of optimizing the dictionary is updating the dictionary itself, using the newly recalculated sparse matrix. This is part of the outer blue box in the flowchart above, and it alternates with the fista() function over iter_DL iterations. The pinv() function computes the Moore-Penrose pseudoinverse, which gives the least-squares solution to the system D·Z = X; it is useful here because the system may not have a unique solution. The second line normalizes each column of the dictionary matrix D to unit norm:
D = X*pinv(Z);                % least-squares dictionary update
D = D ./ sqrt(sum(D.^2, 1));  % normalize each atom (column) to unit norm
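The same dictionary-update step can be sketched in Python/NumPy, mirroring the two MATLAB lines above; the epsilon guard against all-zero atoms is an assumption we add for safety:

```python
import numpy as np

def update_dictionary(X, Z):
    """One dictionary update: least-squares fit via the pseudoinverse,
    then normalization of each atom (column) to unit L2 norm."""
    D = X @ np.linalg.pinv(Z)
    norms = np.sqrt((D ** 2).sum(axis=0))
    return D / np.maximum(norms, 1e-12)   # avoid dividing by zero
```

In the outer loop this alternates with the sparse-coding step: recompute Z for the current D, then call update_dictionary(X, Z), repeating for iter_DL iterations.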
Notes:
Success was determined qualitatively, focused on removal of the artifact from the image and the quality of the rest of the image.
The plots below are of the objective function and data fit. As mentioned above, the optimization problem is to minimize the following equation:
min over (D, Z) of  (1/2)·||X − D·Z||_F^2 + λ·||Z||_{1,1}
Objective function: This is the entire quantity to be minimized, (1/2)·||X − D·Z||_F^2 + λ·||Z||_{1,1}. The goal of our algorithm is to find the D and Z which minimize the objective function.
Data fit: This is the first term, (1/2)·||X − D·Z||_F^2. We plotted this part of the equation separately to see its trend without the effect of the second term. If the data fit term reached zero, then even with a nonzero lambda term, we would know the reconstruction itself was accurate.
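The two plotted quantities can be computed directly from D and Z at each iteration; a small Python/NumPy helper (name and signature are illustrative) makes the distinction concrete:

```python
import numpy as np

def objective_terms(X, D, Z, lam):
    """Return (data_fit, objective) for the cost tracked in the plots:
    data_fit = (1/2)*||X - D Z||_F^2, and the objective adds the
    lam*||Z||_{1,1} sparsity penalty (sum of absolute entries)."""
    data_fit = 0.5 * np.linalg.norm(X - D @ Z, 'fro') ** 2
    return data_fit, data_fit + lam * np.abs(Z).sum()
```

When the reconstruction is exact, data_fit is zero and the objective reduces to the sparsity penalty alone.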
If the objective function and data fit are going down smoothly across iterations, it is a good sign for success of the algorithm. Spiky patterns indicate difficulty optimizing the dictionary and/or the sparse matrix. This can be seen with the very first set of results as well as the "altered" Gabor results.
The initial dictionary was a primary point of exploration. It is an n x 2n matrix formed by concatenating two square matrices: the left block was always kept as a square DCT matrix, while the right block was varied between a randomly generated sparse matrix, a Haar matrix, and a matrix of Gabor wavelets. Our results include processing using all three of these dictionaries.
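A Python/NumPy sketch of how such an n x 2n starting dictionary can be assembled is shown below. Here the right block is a normalized random matrix standing in for the Haar or Gabor variants, which are omitted for brevity; the function names are illustrative:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis: row k is the k-th cosine atom.
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] *= 1 / np.sqrt(2)
    return C * np.sqrt(2.0 / n)

def initial_dictionary(n, seed=None):
    """n x 2n starting dictionary: square DCT block on the left,
    unit-norm random block on the right (a Haar or Gabor matrix
    would replace the right block)."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((n, n))
    R /= np.linalg.norm(R, axis=0)          # unit-norm columns
    return np.hstack([dct_matrix(n).T, R])  # columns are atoms
```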
The first set of results is with a .jpg image and the second is with a .png image. We discuss this further in the Analysis section.
All images for Haar and Gabor dictionaries used the same .png image for the training data.
All three runs used the same training image. The first set of results used the standard set of parameters. The second set had double the sigma parameter, which is the standard deviation of the Gaussian envelope for the Gabor wavelets. The third had double the gamma parameter, which creates longer, thinner wavelet lines in the matrix. Neither adjustment improved anything; both degraded the quality of the output significantly. They were included in this results section to show how the choice of Gabor parameters can affect results, and that although our results are lackluster, they could be worse.
Nothing about this approach as we have developed it so far was much better at reducing the effects of the JWST artifact compared to the other methods, despite all the time that we invested into trying to make this work well. In theory, dictionary learning is a powerful tool that can adjust images in a variety of ways. Ultimately, in the scope of this project, we were limited by time and specific knowledge about dictionary learning. We will discuss what we observed, what went wrong, why it turned out how it did, and what could be done better with more work in the following section.
PNG training data appears to be far better than JPEG format. After running each type of dictionary with what we called training_image1.jpg, there was a significant amount of washing out and loss of detail from that image specifically. With the other two training images we tried, despite their having very different astronomical features, the convergence plots looked smoother and the results looked clearer. It may be the difference between lossy and lossless image formats, but the cropping we do to 512 x 512 pixels should account for that difference. There may be artifacts or blocking that already existed in the JPEG image that weren't fixed by the crop. It could also be none of these reasons, and something else we don't understand.
Haar and Gabor initial dictionaries with "default" parameters yielded the best results. We defined this as the least excess noise or unwanted patterns. We saw a slight increase in the bright areas around stars, indicating that the dictionary may be boosting the stars' brightness. We didn't see any evidence of meaningful reduction in the artifact pattern, however.
lambda = 0.05 seemed to give the smoothest and lowest convergence plots. The plots above don't show this, but from our testing, if lambda was much higher or lower, we saw an increase in the roughness of our objective function plot and more washing out in the processed images.
Even the "better" processing results don't do much to improve the artifact and still introduce noise into the image. This tells us that our efforts to use dictionary learning as a superior processing tool to remove the artifact were generally unsuccessful.
When testing convergence, we weren't able to get the objective function anywhere near the threshold we had set to stop fista() iterations. This could be an issue with step size. With more time, we would explore this part of the algorithm in more depth.
Dictionary learning, from what we understand, is more likely than the other methods we tried to generate new artifacts that weren't present in the original image, as we can see with the results from the adjusted Gabor wavelets. This changes the scientific integrity of our output, so we have to be very careful that any final method errs on the side of caution. Under-processing and only slightly reducing the effect of the artifact is preferred to over-processing and adding in noise, smoothing, or other unwanted distortion.
A logical next step would be to target the parts of the image that contained the artifact specifically, perhaps by combining edge detection and dictionary learning.
Adjusting the step size, or creating a schedule that decreases the step size over iterations, may allow us to iterate faster at the beginning of the algorithm, when we are far from the solution to the minimization problem, and get more accurate results as we get closer. This idea of adjusting step size is similar to how a binary search cuts through large chunks of the search area before narrowing in.
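One simple form this could take is a geometric decay schedule; the sketch below is a hypothetical illustration (the function name and the decay rule are assumptions, not code we tested):

```python
import numpy as np

def step_size_schedule(initial_step, decay, num_iters):
    """Hypothetical geometrically decaying step sizes for FISTA:
    larger steps early, when far from the minimum, smaller later."""
    return initial_step * decay ** np.arange(num_iters)
```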
Changing where the patches are selected from on the image, perhaps with increased overlap and a larger number of patches, could fix some of the grid effect we see in the adjusted Gabor results.
Note: ChatGPT & UMGPT were used sparingly to generate initial code for fista() and dictionary_learning() functions for this section.