CAIR - Content Aware Image Resizer
NOTE: Most of this needs a update. For latest news and releases please go here: http://sourceforge.net/projects/c-a-i-r/
For a good (slightly outdated) GUI showing off CAIR, go here: http://code.google.com/p/seam-carving-gui/
What is CAIR? It's a little project I started based on the works of Shai Avidan and Ariel Shair. Their paper, described here, shows a rather easy way one can do non-linear image resizing. This allows the image to be resized while changing its aspect ratio and keeping important features untouched. Essentially, it removes (or adds) parts to an image that would be least noticed. The doctors' presentation can be seen in this nice YouTube video.
Lets take this image from the Space Station (courtesy NASA):
For edge detection, each pixel is converted to grayscale. CAIR has several options for the convolution kernel, including Prewitt (seen below), Sobel, Laplacian, V1, and V^2. Each offers different advantages and generates different artifacts with heavy resizing. The V-variety can generate some of the best quality. Certainly, other types of functions could be used besides edge detection. However, edge detection is relatively fast and fairly accurate in determining objects of interest.
The next part is to determine the energy cost of a getting to a pixel from the top row. This is described in the paper, but essentially we take the value of the edge detection of the pixel plus the minimum energy of the three immediate pixels above it. For example:
energy_map[x][y] = minimum( energy_map[x-1][y-1],
energy_map[x+1][y-1] ) + edge[x][y]
When you scale the values down to grayscale, you get an image something like this:
From here, you can then find the least energy path by starting at the pixel with the smallest energy in the bottom row and working your way up. Like before, you look for the pixel with the least energy of the three above you, and you go in that direction. By making an energy image for both the vertical and horizontal, I found the two paths with least energy. They are the red lines on the below image. I placed them into the edge image for clarity.
The next step is to either remove those paths, or to add next to those paths. Adding to the image is easier, as the added pixel values will just be the averages of the pixels next to them. Removing a path can be just blending the nearby pixels. However, this can lead to some artifacts if the image is heavily resized. The doctors' paper talk about some filtering in the gradient domain of the image, which is what I plan to do someday. The order of removal is described in the paper and makes use of a transport map. This method seems computationally heavy, since all order combinations need to be explored before an optimal order can be determined. The transport map can be simulated with less complexity if we calculate both vertical and horizontal paths and then remove the path that has the least energy of the two. This is implemented in the CAIR_HD() function.
For now, I implemented an averaging blender for downsizing the image. A target resolution of 800x600 was giving to CAIR. The image it generated is on the left. The image on the right was scaled with bicubic interpolation to 800x600.
This example surprisingly shows the major limitations of the algorithm (and my implementation). Notice the cables running from the astronauts to the bottom right. The resizing has greatly distorted them. This is a limitation of the algorithm. Weighting the cable so it is not touched would not cause this glitch. Also, notice the cloud above the astronaut on the left. If you look closely there is a faint line between the lighter and darker sides of the cloud. This is a limitation of the averaging technique, and would be corrected with gradient domain removal.
As a crude example, lets try applying some weights to the image. If we add a large negative weight to a particular area, we can cause the paths to travel through that area first. Lets say the astronaut on the left ran over my dog, and as such I do not want him in the picture. By biasing the area around him with a -10,000 weight, we force the paths to go through this area first during a resize. Likewise, applying a 10,000 weight to another region will keep it unchanged in the final image. The areas outlined in red show the removed portions, and green the kept portions.
The latest version of the code can load a separate bitmap that has the weights encoded in it. Obliviously, a better solution would be a GUI. My code allows for variable amounts of weight to be applied to a pixel besides just add/remove. Also, the weight matrix is maintained along with the image.
Now, resizing the image back down to 800x600 gives us this result:
It's interesting to see his hand still exists gripping onto the rail. When removing an area, its important to NOT do the averaging/blending technique. If this is done, then the astronaut will be blending back into the image, leaving a small sliver of him after the resize.
The paper appears to gloss over the procedure of adding paths back to the image. Taken literally, the algorithm would have you choose in order the least energy seams through the source image. See, for example, this example image where all the seams from the bottom row are shown in red:
It appears that many paths merge back together as they go up, which seems logical considering the energy of the image. The paper describes adding paths next to theses paths in order from least energy to most. Since so many merge when the flow upwards, this produces horrible stretching. It appears something is missing from the paper's description.
I managed to get it working, thanks to a few tips from Ramin Sabet, creator of Liquid Resize.
Here's an example, where I used the weights to remove the astronaut, then add paths back into the image to get it back to its original size.
The primary difference between the paper and CAIR's method is the addition of artificial weight to the least energy path and the new path that was added. Then, the energy map has to be recalculated to redetermine the least energy path. This "add weight" is a variable that can be changed to determine how CAIR will find the next least-energy path. A large weight may cause the paths to move through an area marked for protection, while a small weight will cause CAIR to pick the same area, causing stretching. In CAIR, I keep the artificial weights separate from the user-supplied weights to avoid polluting the user's weights. After each new seam, the artifical weight matrix and the user weight matrix have to be added back together for form a third matrix that is actually used in the energy calculations.
Throughout this project it has become apparent that the most performance limiting factor is how the data is handled. Originally I used the EasyBMP objects to hold my images. They work fine for that, but EasyBMP was never intended for such performance goals. So, I created my own class, called the CAIR Matrix Library. The class is templated, so all of my data types (edge, weights, grayscale, and color images) use the same methods. This has worked out surprisingly well for me, considering I'm far more at home bit shifting than objects oriented generics.
There are several ways to extract further performance from the standard algorithm. For starters, everything doesn't have to be recalculated after every path removal. Grayscale values can be recalcuated only for the changed pixels. With extreme care, the edge values can be recalculated for only the changed regions. Also, with lots of thinking, portions of the energy map can be saved.
Also, large parts of the algorithm can be multi-threaded. Grayscale, edge detection, and path adding/removing can all very easily be multi-threaded into several threads. The energy map is quite a different beast since one thread relies on data provided by the other thread, so that code is currently limited to just two threads.
To resize the original image of 1024x678 down to 800x600 with CAIR() takes about 1.6 seconds with this system. For CAIR_HD(), the same resize takes about 8.8 seconds.
CAIR now implements several different convolution kernels. Below is a comparison between the Prewitt and V1 kernel. Also available are Sobel, Laplacian, and V_SQUARED.
Later work by the doctors developed a new energy algorithm, called forward energy. Read about it here. The traditional energy algorithm does not account for situations where a seam removal will cause two edges to be placed next to each other, which would increase the energy of the image. This is often seen as some sort of jaggies that develop during the resize. The forward energy algorithm does a look-ahead of edge values when the energy map is being computed.
CAIR is written in C++ and uses the EasyBMP image library for testing purposes, and the pthreads-win32 library for the multi threaded version. CAIR should be mulit-platform, requiring only pthreads-win32 to be replaced with the pthread implementation on your platform (my threading is simple, so there should be no problems). The testing application is limited to only bitmaps. Run the executable with no options to get a list of possible paramaters. A simple example is this:
cair -I test.bmp -X 800 -Y 600 -C 1 -R 6 -E 1 -O resized.bmp
That command will use CAIR_HD() with the V1 kernel and forward energy to resize test.bmp down to 800x600 and store the result in resized.bmp.
Here is the single threaded version (depreciated): CAIR.zip
Latest Multi-threaded versions: http://sourceforge.net/projects/c-a-i-r/
The single-threaded version will no longer be maintained unless there is high demand.
CAIR is released under the LGPLv2.1. EasyBMP uses a modified BSD license. pthreads-win32 falls under the LGPL license. Included is a command-line version compiled with Visual Studio 2008 and should work under any Windows machine. The multi threaded version requires pthreadVSE2.dll to be in the same directory as the executable.
Please email me at firstname.lastname@example.org if you have any questions.
Copyright ME, FOREVER!