
Polydioptric Camera Design


Conventional photo and video cameras were built to capture a view of the world similar to the one our own eyes give us. It turns out, however, that such pinhole cameras are not necessarily the optimal cameras for processing visual information with a machine. Inspired by nature's task-specific eye designs, we define a framework for camera design with regard to 3D motion estimation.

1. A Framework for Camera Design

When we think about vision, we usually think of interpreting the images taken by (two) eyes such as our own, that is, images acquired by planar eyes. But these are clearly not the only eyes that exist; the biological world reveals a large variety of designs. An eye or camera is a mechanism that forms images by focusing light onto a light-sensitive surface (retina, film, CCD array, etc.). Different eyes or cameras are obtained by controlling three elements:

  1. the geometry of the surface
  2. the geometric distribution and optical properties of photoreceptors, and
  3. the way light is collected and projected onto the surface (single or multiple lenses, tubes as in compound eyes)

Evolutionary considerations tell us that the design of a system's eye is related to the visual tasks the system has to solve. The way images are acquired determines how difficult it is to perform a task and since systems have to cope with limited resources, their eyes should be designed to optimize subsequent image processing as it relates to particular tasks.

We can model a generalized camera as a combination of a filter and a sampling pattern in the space of light rays. The filter models the effects of the optical system, and the sampling pattern is determined by the geometric properties of the camera. Such a model allows us to phrase the problem of camera design as finding the filter and sampling pattern in light-ray space that optimally facilitate the task at hand.
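As a rough illustration of this model, the sketch below (in Python) represents a generalized camera as a set of ray samples, each a position x and direction r, together with a small Gaussian ray bundle standing in for the optical filter. The scene function, the camera layouts, and all names (`plenoptic`, `GeneralizedCamera`) are illustrative placeholders, not definitions taken from the papers.

```python
import numpy as np

def plenoptic(x, r):
    """Toy plenoptic function L(x, r): radiance along the ray through point x
    with unit direction r.  Here the 'scene' is a smoothly textured plane at
    z = 5 plus a mild directional term (purely illustrative)."""
    t = (5.0 - x[2]) / r[2]            # ray parameter at the plane z = 5
    hit = x + t * r
    return np.sin(hit[0]) * np.cos(hit[1]) + 0.2 * r[0]

class GeneralizedCamera:
    """A camera as a sampling pattern in light-ray space plus an optical filter:
    `samples` is a list of (x, r) ray samples, and the filter is modeled as a
    small Gaussian bundle of rays averaged around each sample direction."""
    def __init__(self, samples, blur=0.01, rays_per_sample=20, seed=0):
        self.samples = samples
        self.blur = blur
        self.rays_per_sample = rays_per_sample
        self.rng = np.random.default_rng(seed)

    def capture(self):
        values = []
        for x, r in self.samples:
            dirs = r + self.blur * self.rng.standard_normal((self.rays_per_sample, 3))
            dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
            values.append(np.mean([plenoptic(x, d) for d in dirs]))
        return np.array(values)

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# A pinhole camera samples rays through a single point (only directions vary) ...
pinhole = GeneralizedCamera([(np.zeros(3), unit([u, v, 1.0]))
                             for u in np.linspace(-0.2, 0.2, 5)
                             for v in np.linspace(-0.2, 0.2, 5)])
# ... while a polydioptric camera also varies the ray origins over a surface.
poly = GeneralizedCamera([(np.array([cx, cy, 0.0]), np.array([0.0, 0.0, 1.0]))
                          for cx in np.linspace(-1.0, 1.0, 5)
                          for cy in np.linspace(-1.0, 1.0, 5)])
print(pinhole.capture().shape, poly.capture().shape)
```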

2. A Metric for Eye Design

To evaluate and compare different eye designs in mathematical terms, we chose the recovery of space-time descriptions from image sequences as our benchmark problem. More specifically, we want to determine how we ought to collect images of a (dynamic) scene to best recover the scene's shapes and actions from video. This problem has wide implications for a variety of applications, not only in vision and recognition, but also in navigation, virtual reality, tele-immersion, and graphics. At the core of this capability is the celebrated structure from motion module, and so our question becomes: what eye should we use for collecting video so that we can subsequently facilitate the structure from motion problem in the best possible way?

By examining the differential structure of the space of time varying light rays (as described in the framework of plenoptic video geometry), we relate different known and new camera models to the spatio-temporal structure of the observed scene.


The field of view of a camera determines how robustly we can estimate the camera's 3D motion. For example, the estimation for a small field of view (FOV) camera is ill-posed, which manifests itself in the ambiguities explained and demonstrated below. These ambiguities disappear when we increase the field of view.

Accurate 3D Motion Estimation is Necessary to Build Accurate 3D Models.

Accurate ego-motion estimation is essential if one wants to build accurate models of the world from video, as can be seen in the following movie (AVI, 3.2 MB):

Effect of Motion Estimation Error on Reconstruction Accuracy

The movie demonstrates how small changes in the localization of the feature points and in the camera positions and orientations can have dramatic effects on the accuracy of the reconstruction. The maximum localization error in this movie was 5 pixels for the correspondences and a two percent relative error (relative to the object distance) for the camera positions.

Stability of 3D motion estimation depends on the field of view.

It is well known that the stability of 3D motion estimation for a pinhole camera strongly depends on the size of the field of view. To demonstrate the effect, please take a look at the following movie (AVI, 5.2 MB), which illustrates the confusion of rotation and translation for a small field of view camera.

Confusion of Rotation and Translation for Small Field of View

On the left the camera is undergoing translational motion, on the right rotational motion. While the movie is playing, examine the top and side views of the cameras and try to decide, based only on the image information, which views are from the translating camera and which are from the rotating camera.

We see that if we only have access to the top view, the estimation is very ambiguous. In contrast, if we also have access to a side view, the confusion between rotation and translation disappears.
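The confusion can also be reproduced numerically. The sketch below uses the classical instantaneous motion-field (Longuet-Higgins/Prazdny) equations for a pinhole camera, a standard textbook model rather than the exact setup of the movie, and compares the image motion produced by a sideways translation with that produced by a rotation about the vertical axis; the focal length, depth, and motion magnitudes are arbitrary. For a small field of view the two flow fields are nearly parallel, while for a large field of view they differ clearly.

```python
import numpy as np

def motion_field(x, y, f, t, w, Z):
    """Instantaneous image motion (u, v) at image point (x, y) of a pinhole
    camera with focal length f, translation t = (U, V, W), rotation
    w = (A, B, C) and scene depth Z (Longuet-Higgins/Prazdny equations)."""
    U, V, W = t
    A, B, C = w
    u = (x * W - f * U) / Z + A * x * y / f - B * (f + x**2 / f) + C * y
    v = (y * W - f * V) / Z + A * (f + y**2 / f) - B * x * y / f - C * x
    return np.stack([u, v])

f, Z = 1.0, 10.0
for half_fov_deg in (5.0, 60.0):                      # small vs. large field of view
    lim = f * np.tan(np.deg2rad(half_fov_deg))
    x, y = np.meshgrid(np.linspace(-lim, lim, 30), np.linspace(-lim, lim, 30))
    flow_trans = motion_field(x, y, f, t=(1.0, 0.0, 0.0), w=(0.0, 0.0, 0.0), Z=Z)
    flow_rot   = motion_field(x, y, f, t=(0.0, 0.0, 0.0), w=(0.0, 0.1, 0.0), Z=Z)
    # Cosine similarity between the two flow fields, treated as long vectors:
    # close to 1 means the fields are nearly indistinguishable.
    a, b = flow_trans.ravel(), flow_rot.ravel()
    cos_sim = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    print(f"half FOV {half_fov_deg:4.1f} deg: cosine similarity {cos_sim:+.3f}")
```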

The motion estimation problem for a small field of view is under-constrained.

The reason for this sensitivity is easily explained. As the following two illustrations show, the measurements in the images (here we show image gradients, but they could just as well be optical flow vectors or feature tracks) can only be made in the plane perpendicular to the image location vector r (left illustration below).

Image Measurements

Measurements are only made in the image plane

Ambiguity in recovery of motion parameters

Usually the rigid motion parameters are determined by fitting a parameterized instantaneous motion model to the observed measurements. That means we are trying to find the motion parameters that explain the image measurements most accurately according to some error criterion. In the illustration on the right above, we see that we cannot determine the component of the motion parameter vectors that is parallel to r. If we have a small field of view, that is, if the vectors r span only a small part of the sphere of directions, then the motion estimation will be subject to the so-called line ambiguity.
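A deliberately schematic computation illustrates this loss of observability: if every measurement constrains the unknown parameter vector only within the plane perpendicular to its viewing direction r, then for directions confined to a narrow cone the component along the mean direction is nearly unobservable, which shows up as a small singular value of the stacked constraint matrix. The cone sampling and the plain projectors below are simplifications for illustration, not the estimator used in the papers.

```python
import numpy as np

def direction_samples(half_angle, n=500, seed=1):
    """Unit viewing directions r drawn uniformly from a cone of the given
    half-angle around the z axis (half_angle = pi/2 gives the hemisphere)."""
    rng = np.random.default_rng(seed)
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    cos_t = rng.uniform(np.cos(half_angle), 1.0, n)
    sin_t = np.sqrt(1.0 - cos_t**2)
    return np.stack([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t], axis=1)

def observability(dirs):
    """Each measurement constrains the parameter vector only within the plane
    perpendicular to its direction r, i.e. through the projector I - r r^T.
    Stack these projectors and return the singular values of the result."""
    M = np.vstack([np.eye(3) - np.outer(r, r) for r in dirs])
    return np.linalg.svd(M, compute_uv=False)

for deg in (5, 30, 90):
    s = observability(direction_samples(np.deg2rad(deg)))
    print(f"cone half-angle {deg:2d} deg: singular values "
          f"{s[0]:6.1f} {s[1]:6.1f} {s[2]:6.1f}   (condition {s[0] / s[2]:.1f})")
```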

Comparison of small FOV vs Spherical FOV Cameras.

The line ambiguity can be seen in the following example, where we compare the motion estimation for the individual cameras of the Argus Eye with an estimation that uses information from all the cameras simultaneously and thus effectively has a large field of view.

Error Surface for Individual Cameras

Residuals over all translation directions for single cameras.

Error Surface for Argus Eye

Residual over all translation directions when combining information from all six cameras.

There is a noticeable valley in the error surface for the individual cameras due to the line ambiguity, while the ambiguity vanishes when we use all the cameras.
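Such residual surfaces can be simulated for a single pinhole camera. The sketch below uses a depth-eliminated least-squares residual over candidate translation directions, which is one standard way to compute such error surfaces (not necessarily the exact procedure behind the figures above); for a small field of view the residual rises only slowly along the line of candidate directions connecting the true translation with the optical axis (the valley), whereas for a large field of view it rises sharply away from the true direction. The scene depths, motions, and noise levels are invented for illustration.

```python
import numpy as np

def AB(x, y, f):
    """Translational and rotational parts of the pinhole motion field at
    image point (x, y): flow = (1/Z) * A @ t + B @ w."""
    A = np.array([[-f, 0.0, x], [0.0, -f, y]])
    B = np.array([[x * y / f, -(f + x**2 / f), y],
                  [f + y**2 / f, -x * y / f, -x]])
    return A, B

def residual(t_dir, pts, flows, f):
    """Depth-eliminated least-squares residual for a candidate translation
    direction: depth only scales A @ t, so only the flow component
    perpendicular to A @ t constrains the rotation w."""
    rows, rhs = [], []
    for (x, y), u in zip(pts, flows):
        A, B = AB(x, y, f)
        a = A @ t_dir
        n = np.array([-a[1], a[0]]) / (np.linalg.norm(a) + 1e-12)  # unit normal to A @ t
        rows.append(n @ B)
        rhs.append(n @ u)
    rows, rhs = np.array(rows), np.array(rhs)
    w = np.linalg.lstsq(rows, rhs, rcond=None)[0]
    return float(np.sum((rows @ w - rhs) ** 2))

rng = np.random.default_rng(2)
f = 1.0
t_true, w_true = np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.02, 0.0])
for half_fov_deg in (5.0, 60.0):
    lim = f * np.tan(np.deg2rad(half_fov_deg))
    pts = rng.uniform(-lim, lim, (200, 2))
    flows = [AB(x, y, f)[0] @ t_true / rng.uniform(5.0, 15.0)   # translational part
             + AB(x, y, f)[1] @ w_true                          # rotational part
             + 5e-3 * rng.standard_normal(2)                    # measurement noise
             for x, y in pts]
    # Candidate translation directions in the x-z plane, from the true
    # direction (0 deg) towards the optical axis (75 deg).
    angles = np.deg2rad(np.arange(0, 76, 15))
    res = [residual(np.array([np.cos(a), 0.0, np.sin(a)]), pts, flows, f)
           for a in angles]
    print(f"half FOV {half_fov_deg:4.1f} deg:", " ".join(f"{r:7.1e}" for r in res))
```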

More information:

For more detailed information please read the accompanying paper about the Argus Eye and the papers about camera hierarchies in the publications section.

3. A Hierarchy of Cameras

This framework allows us to define a hierarchy of camera designs, where the order is determined by the stability and the complexity of the computations necessary to estimate structure and motion. The dioptric axis (number and spacing of viewpoints) determines whether the 3D motion estimation is scene-dependent or scene-independent, and the field of view axis determines the noise sensitivity of the estimation. At the low end of this hierarchy is the standard planar pinhole camera, for which the structure from motion problem is scene-dependent and ill-posed. At the high end is a camera, which we call the full field of view polydioptric camera, for which the problem is scene-independent and stable. In between are large field of view multiple-view cameras, such as the Argus Eye we have built, as well as catadioptric panoramic sensors and other omnidirectional cameras. This classification is summarized in the following two figures:

A hierarchy of camera designs.

Hierarchy for Moving Camera - Static World

Hierarchy for Moving Object - Static Cameras

Small Field of View Pinhole Cameras.

Small Field of View Pinhole Camera

Small Field of View Stereo Camera

At the bottom of the camera hierarchy is the standard small field of view pinhole camera. Pinhole cameras capture only rays through a single point in space, which makes it impossible to apply the plenoptic motion constraint unless the motion consists of pure rotation. We need to estimate both the camera motion and the scene structure, which leads to a non-linear problem. Since the stability of the problem depends on the field of view and conventional cameras have a rather small field of view, this estimation is also very sensitive to noise.

Spherical Pinhole Camera and its Argus Eye Implementation

Spherical Pinhole Camera (Omnidirectional Camera)

Multiple Small Field of View Pinhole Cameras (Argus Eye)

Higher in the hierarchy, along the field of view axis, we find the large field of view pinhole cameras. Under the term omnidirectional vision, these cameras have recently been the subject of intense study. For example research groups that study these cameras, see the Page of Omnidirectional Vision. If we disregard the problems of the often non-planar signal processing, scene structure and 3D motion estimation for these large field of view cameras become well-posed and stable. Nevertheless, due to the single viewpoint, the scene and motion parameters are still coupled, causing the estimation to be nonlinear.

Polydioptric Cameras

Small Field of View Polydioptric Camera

Plenoptic Manifold Camera

Spherical Polydioptric Camera

A polydioptric camera is a generalized camera that captures a multi-perspective subset of the space of light rays. An example implementation would be a regular array of many closely spaced conventional pinhole cameras. Such a camera allows us to apply the plenoptic motion constraints, which decouple the estimation of camera motion and scene structure, thus greatly simplifying the estimation of either one. The stability of the 3D motion estimation still depends on the field of view of the camera, suggesting that the optimal camera for 3D motion estimation is a spherical field of view polydioptric camera.
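To make the decoupling concrete, the sketch below assumes a differential plenoptic brightness constraint of the form L_t + ∇_x L · (v + ω × x) + ∇_r L · (ω × r) = 0 for a ray with origin x and direction r; the parameterization and sign conventions of the published derivation may differ, and the light-field derivatives used here are synthetic placeholders. The point is only that each sampled ray then contributes one equation that is linear in the six motion parameters (v, ω), with no scene depth appearing anywhere.

```python
import numpy as np

def estimate_motion(samples):
    """Each sample holds the light-field derivatives measured at one ray
    (x, r): temporal derivative Lt, positional gradient Lx and directional
    gradient Lr.  Assuming a constraint of the form
        Lt + Lx . (v + w x x) + Lr . (w x r) = 0,
    every ray yields one equation that is linear in the translation v and
    rotation w of the camera -- no scene depth is involved."""
    rows, rhs = [], []
    for x, r, Lt, Lx, Lr in samples:
        # Lx.(w x x) = w.(x x Lx) and Lr.(w x r) = w.(r x Lr)
        rows.append(np.concatenate([Lx, np.cross(x, Lx) + np.cross(r, Lr)]))
        rhs.append(-Lt)
    sol = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    return sol[:3], sol[3:]          # estimated v and w

# Synthetic check: fabricate derivatives consistent with a known motion and
# verify that the linear solve recovers it.
rng = np.random.default_rng(3)
v_true, w_true = np.array([0.1, -0.05, 0.2]), np.array([0.01, 0.03, -0.02])
samples = []
for _ in range(50):
    x = rng.standard_normal(3)
    r = rng.standard_normal(3); r /= np.linalg.norm(r)
    Lx, Lr = rng.standard_normal(3), rng.standard_normal(3)
    Lt = -(Lx @ (v_true + np.cross(w_true, x)) + Lr @ np.cross(w_true, r))
    samples.append((x, r, Lt, Lx, Lr))
v_est, w_est = estimate_motion(samples)
print(v_est, w_est)     # should reproduce v_true and w_true
```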

Multi View Stereo and Polydioptric Domes

Multi View Stereo Dome

Plenoptic Dome

Multi View Plenoptic Dome

The presented classification of cameras also applies to camera arrangements where the cameras surround the object of interest. If the object is surrounded on all sides by conventional pinhole cameras, then the estimation of the object's shape and motion is well-posed, but non-linear (for an example click here). If the pinhole cameras are replaced with polydioptric cameras, then the shape and motion estimation becomes linear.

4. Polydioptric Cameras

We will use the term polydioptric camera to denote a generalized camera that captures a multi-perspective subset of the space of light rays. The name is a combination of dioptric ("assisting vision by refracting and focusing light") and poly ("in a multitude of ways"). For more information see Merriam-Webster.

A regular array of pinhole cameras forms a polydioptric camera.

A theoretical model for a camera that captures the plenoptic function in some part of the space is a surface S that has a pinhole camera at every point. We call this camera a polydioptric camera. A "plenoptic camera" was described by Adelson and Wang in 1992, but since no physical device can capture the true time-varying plenoptic function, we prefer the term polydioptric to emphasize the difference between the theoretical concept and the implementation. With a polydioptric camera we observe every point of the scene in view from many different viewpoints (theoretically, from every point on S) and thus we capture many rays emanating from that point. A parameterization for these general cameras was introduced by Grossberg and Nayar in 2001. A polydioptric camera can be obtained if we arrange ordinary cameras very close to each other (Figs. 1 and 2). This camera has an additional property arising from the proximity of the individual cameras: it can form a very large number of orthographic images, in addition to the perspective ones. Indeed, consider a direction r in space and then consider in each individual camera the captured ray parallel to r. All these rays together, one from each camera, form an image whose rays are parallel. Furthermore, for different directions r a different orthographic image can be formed. For example, Fig. 2 shows that we can select one appropriate pixel in each camera to form an orthographic image that looks to one side (blue rays) or another (red rays). Fig. 3 shows all the captured rays, thus illustrating that each individual camera collects conventional pinhole images.

Fig. 1: Design of a Polydioptric Camera

Fig. 2: Capturing Parallel Rays

Fig. 3: ... and simultaneously capturing Pencils of Rays
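The construction of the orthographic images in Figs. 2 and 3 can be sketched in a few lines: in every camera of the array we pick the pixel whose ray is closest to parallel to a chosen direction r, and these pixels, taken over the whole array, form a parallel-projection image in direction r. The array spacing, image resolution, and scene function below are invented for illustration.

```python
import numpy as np

def scene(p):
    """Hypothetical textured plane at z = 10 serving as the scene."""
    return np.sin(2.0 * p[0]) * np.cos(3.0 * p[1])

def pinhole_rays(res=21, fov_deg=60.0):
    """Unit ray directions of a single pinhole camera on a res x res grid."""
    lim = np.tan(np.deg2rad(fov_deg) / 2.0)
    u, v = np.meshgrid(np.linspace(-lim, lim, res), np.linspace(-lim, lim, res))
    d = np.stack([u, v, np.ones_like(u)], axis=-1)
    return d / np.linalg.norm(d, axis=-1, keepdims=True)

def orthographic_image(centers, rays, r):
    """From each camera of the array, take the pixel whose ray is most nearly
    parallel to r; together these samples form a parallel-projection
    (orthographic) image in direction r."""
    r = r / np.linalg.norm(r)
    idx = np.unravel_index(np.argmax(rays @ r), rays.shape[:2])   # best pixel
    d = rays[idx]
    img = np.empty(len(centers))
    for k, c in enumerate(centers):
        t = (10.0 - c[2]) / d[2]         # intersect the chosen ray with z = 10
        img[k] = scene(c + t * d)
    return img

# A 9 x 9 planar array of closely spaced pinhole cameras in the plane z = 0.
centers = [np.array([cx, cy, 0.0]) for cx in np.linspace(-2.0, 2.0, 9)
                                   for cy in np.linspace(-2.0, 2.0, 9)]
rays = pinhole_rays()
left_ortho  = orthographic_image(centers, rays, np.array([-0.3, 0.0, 1.0]))  # "blue" rays
right_ortho = orthographic_image(centers, rays, np.array([ 0.3, 0.0, 1.0]))  # "red" rays
print(left_ortho.shape, right_ortho.shape)    # one sample per camera in each image
```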

Thus, a polydioptric camera has the unique property that it simultaneously captures a large number of perspective and affine images (projections). We will demonstrate that it also makes the structure from motion problem linear. A polydioptric spherical camera is therefore the ultimate camera, since it combines the stability of full field of view motion estimation with the linearity of the problem, as well as the ability to reconstruct scene models with minimal reconstruction error, because we can choose the viewpoints on the viewpoint manifold so as to minimize the reconstruction uncertainty.