Imaging Pipelines

Digital imaging goes far beyond counting the photons that land on each pixel. Photon counting is only the first of many steps an image takes before it appears on a display or is ready to print. Along the way, the image passes through a series of processing stages that improve its visual quality.

Many stages of this pipeline are things one might otherwise do by hand in Photoshop. One of the best-known imaging artifacts is “red-eye,” where the flash from the camera reflects off the retina and returns to the camera as a red glow over the center of the eyes. An image-editing program might offer a tool that lets you draw a rectangle around the red-eye and then suppresses the red glow. These tools have evolved: early versions required a bounding box accurately centered on each eye, later ones needed only a rough outline around the face, and current ones require little or no supervision at all. This level of automation is the goal of the imaging pipeline in a digital camera: the camera searches for imaging problems and corrects them before the photographer ever has a chance to see them.
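The core of automatic red-eye suppression can be sketched in a few lines. This is a minimal illustration, not any camera's actual algorithm: it assumes a bounding box is already known (the hard part in practice is finding the eyes), and it uses a crude "red strongly dominates green and blue" test to pick out the glow.

```python
import numpy as np

def suppress_red_eye(image, box):
    """Suppress red-eye inside a bounding box (illustrative sketch).

    image: HxWx3 float array in [0, 1]; box: (y0, y1, x0, x1).
    Pixels whose red channel strongly dominates green and blue are
    treated as retinal reflections; their red value is replaced by the
    mean of green and blue, darkening the glow toward a natural pupil.
    Modifies the image in place and returns it.
    """
    y0, y1, x0, x1 = box
    region = image[y0:y1, x0:x1]
    r, g, b = region[..., 0], region[..., 1], region[..., 2]
    mask = r > 1.5 * np.maximum(g, b)    # crude "red dominates" test
    r[mask] = 0.5 * (g[mask] + b[mask])  # pull red down to its neighbors
    return image
```

The 1.5 threshold is an arbitrary choice for the sketch; a real tool would also restrict the mask to roughly circular, pupil-sized blobs to avoid touching red lips or clothing inside the box.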

Red-eye reduction is only one of the many functions of a digital imaging pipeline. Noise reduction, color saturation, and gamma correction are all functions expected of even the simplest pipelines.
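Of these, gamma correction is the simplest to show concretely. Displays expect gamma-encoded values rather than the sensor's linear-light values, and a power law of about 1/2.2 approximates the standard sRGB transfer curve. A minimal sketch:

```python
import numpy as np

def gamma_correct(image, gamma=2.2):
    """Apply display gamma encoding: out = in ** (1 / gamma).

    image: float array of linear-light values in [0, 1]. A gamma of
    2.2 approximates the sRGB curve used by most displays; values are
    clipped to [0, 1] first so the power is always well defined.
    """
    return np.clip(image, 0.0, 1.0) ** (1.0 / gamma)
```

Note how the curve lifts mid-tones: a linear value of 0.5 encodes to roughly 0.73, which is why un-gamma-corrected images look far too dark on screen.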

Image tuning

Much of the Photoshop experience consists of adjusting filters and tools until the image looks the way you want. Nearly every Photoshop tool has at least one, and possibly half a dozen, sliders that adjust the strength of the operation. For example, consider a grainy image. Applying a tiny amount of blur can remove the graininess, but it also softens edges and removes fine detail. Choosing the right tradeoff between detail and graininess is a subjective judgment call, and Photoshop’s preview modes help you tune the image to your preference. Another example is color enhancement. In a portrait against a floral background, one might want to enhance the flower colors, which usually means saturating the background colors in the image. Unfortunately, it can be difficult at times to fully separate the foreground face from the background, and saturating flesh tones is to be avoided at all costs.
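The blur-versus-detail slider can be sketched directly. This is a deliberately simple stand-in for a real denoiser, assuming a 2D grayscale image: a 3x3 box blur blended with the original, where the blend weight plays the role of the slider.

```python
import numpy as np

def denoise(image, strength=0.5):
    """Blend a 3x3 box blur with the original image.

    image: 2D float array (grayscale). strength in [0, 1] acts as the
    slider: 0 keeps all detail (and all grain), 1 applies the full
    blur. Edges are handled by replicating border pixels.
    """
    h, w = image.shape
    padded = np.pad(image, 1, mode="edge")
    # sum the nine shifted copies of the image, i.e. a 3x3 box filter
    blurred = sum(padded[dy:dy + h, dx:dx + w]
                  for dy in range(3) for dx in range(3)) / 9.0
    return (1 - strength) * image + strength * blurred
```

A camera pipeline faces exactly this tradeoff, but with no human watching the preview, so the strength must be chosen automatically, typically from an estimate of the noise level at the current ISO setting.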

Shooting in raw

Some cameras are capable of “shooting in raw.” The raw image is the image captured by the sensor before it has passed through the pipeline. It is typically a large file, as it has not undergone any compression. It is also very difficult to view: a raw image is not a “color image” but a mosaic of red, green, and blue color samples. Thus, even showing the image on screen in a recognizable state requires interpolating color, a process known as demosaicing. Why offer this output if it is so cumbersome? It allows an advanced user, or one with access to advanced tools, to get the very most out of the image. In the process of making a final JPEG, a tremendous amount of data is lost; that is why a raw image might be 10 megabytes while the JPEG is 1 megabyte. Raw data allows the advanced user to run their own manual image-processing pipeline while keeping all the original data available.
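The simplest form of demosaicing, bilinear interpolation, can be sketched as follows. It assumes an RGGB Bayer layout (red and green alternating on even rows, green and blue on odd rows); real raw converters use far more sophisticated, edge-aware interpolation.

```python
import numpy as np

def demosaic_bilinear(raw):
    """Bilinear demosaic of an RGGB Bayer mosaic (simplified sketch).

    raw: HxW array with one color sample per pixel, laid out as
        R G R G ...
        G B G B ...
    Each output channel is filled by averaging whatever samples of
    that color fall inside each pixel's 3x3 neighborhood.
    """
    h, w = raw.shape
    rgb = np.zeros((h, w, 3))
    masks = np.zeros((h, w, 3), dtype=bool)
    masks[0::2, 0::2, 0] = True   # red sites
    masks[0::2, 1::2, 1] = True   # green sites on red rows
    masks[1::2, 0::2, 1] = True   # green sites on blue rows
    masks[1::2, 1::2, 2] = True   # blue sites
    for c in range(3):
        plane = np.where(masks[..., c], raw, 0.0)
        count = masks[..., c].astype(float)
        p, n = np.pad(plane, 1), np.pad(count, 1)
        # sum samples and sample counts over each 3x3 neighborhood
        acc = sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))
        norm = sum(n[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))
        rgb[..., c] = acc / np.maximum(norm, 1.0)
    return rgb
```

In an RGGB mosaic, every pixel has at least one sample of each color within its 3x3 neighborhood, so the division is always meaningful. Half of all sites are green, which matches the eye's greater sensitivity to green.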

Why is all this needed?

Compare an image taken with a 35mm disposable film camera (with no digital processing at all) with a low-end cell-phone image today. With all of the possibilities of digital processing, why is it that the disposable camera could take a better picture under many low-light conditions? The short answer is that film cameras never got really small. Space was needed to store the roll of film, and this dictated the minimum size of the camera. With light, bigger is better: a larger lens captures more light and suffers less from diffraction, and for a similar resolution a bigger sensor captures more photons per “pixel.” The current trend in portable electronics is to minimize size while increasing the number of functions. High-end cell phones now contain two cameras (one on each side) and, along with the phone itself, occupy a volume smaller than the most compact film cameras. Add to this that the new cameras tend to have ever-increasing resolution, and one can infer that pixel sizes are getting very small while expectations keep rising. With little room remaining to shrink the sensor and lenses, there remains immense room to expand the signal processing.

State of the art pipelines

Modern image-processing pipelines can perform advanced processing based on the content of the image. Cameras now recognize certain types of scenes and respond accordingly: they know whether they are looking at a portrait, a landscape, or a backlit scene. They can detect, and in some cases recognize, faces. They can use a series of rapidly taken images to enhance resolution, catch eye blinks, remove blur, and more. Much of this new field is known as computational photography, and it is quite exciting.
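One of the simplest multi-frame tricks can be shown directly: averaging an aligned burst of frames. Independent noise shrinks roughly by the square root of the number of frames averaged, which is the statistical basis of modern low-light modes. This sketch assumes the frames are already aligned; real pipelines must first register the frames and reject moving objects.

```python
import numpy as np

def merge_burst(frames):
    """Average an aligned burst of frames of the same scene.

    frames: list of HxW float arrays. Averaging N frames with
    independent noise reduces the noise standard deviation by a
    factor of sqrt(N). Alignment and motion rejection, essential in
    practice, are omitted from this sketch.
    """
    stack = np.stack(frames, axis=0)
    return stack.mean(axis=0)
```

With 16 frames, the noise should drop by about a factor of four, so a burst taken in dim light can approach the quality of a single longer exposure without the motion blur a long exposure would incur.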