
Plenoptic Video Geometry


Plenoptic Video Geometry is the study of the space of light rays as observed by a moving imaging sensor. The space of light rays is the most complete representation of visual information possible. All the information about the world that can be captured using visual sensors is encoded in the intensity function defined on the space of rays in 3D. This function is called the plenoptic function. Thus, if we understand the intrinsic structure of the plenoptic function, we can design optimal image acquisition devices and image processing algorithms to recover the information about the world that we seek.

1. Definitions

The plenoptic function.

At each location x in free space, the radiance, that is, the light intensity or color observed at x from a given direction r at time t, can be measured by the plenoptic function

$$L : \mathbb{R}^3 \times S^2 \times \mathbb{R}^+ \to \mathbb{R}^d, \qquad (x; r, t) \mapsto L(x; r, t),$$

where d=1 for intensity, d=3 for color images, and $S^2$ is the unit sphere of directions in $\mathbb{R}^3$ (Adelson and Bergen 91).
Since the image irradiance that is recorded by an imaging device is proportional to the scene radiance, we assume that the intensity recorded by an imaging sensor at position x and time t pointing in direction r is equal to the plenoptic function L(x;r,t). A transparent medium such as air does not change the color of the light, so the radiance is constant along the view direction r:

$$L(x; r, t) = L(x + \lambda r; r, t) \quad \text{for all } \lambda \text{ such that } x + \lambda r \text{ lies in free space.}$$

Therefore, the plenoptic function in free space reduces to five dimensions -- the time-varying space of directed lines, for which many representations have been presented (for an overview see Camahort and Fussell).
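
The constancy of radiance along a ray can be checked numerically on a toy scene. The following is a minimal sketch, not part of the original work: the scene (a single textured plane z = 0) and the names `plane_texture` and `plenoptic` are assumptions made purely for illustration.

```python
import numpy as np

def plane_texture(p):
    """Hypothetical texture on the plane z = 0 (illustrative only)."""
    return np.sin(3.0 * p[0]) * np.cos(2.0 * p[1])

def plenoptic(x, r):
    """Radiance observed at position x in direction r for a static toy scene
    made of one textured plane z = 0. Only valid for rays that hit the plane."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    r = r / np.linalg.norm(r)
    s = -x[2] / r[2]                 # distance along the ray to the plane
    if s <= 0:
        raise ValueError("ray does not hit the plane")
    return plane_texture(x + s * r)

x = np.array([0.2, -0.1, 2.0])       # a point in free space above the plane
r = np.array([0.3, 0.1, -1.0])       # viewing direction towards the plane
r_unit = r / np.linalg.norm(r)

# Sliding the observation point along its own ray does not change the radiance.
values = [plenoptic(x + lam * r_unit, r) for lam in (0.0, 0.5, 1.0)]
```

All three entries of `values` agree up to floating-point error, which is why one of the six coordinates of (x; r) is redundant in free space.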

2. Plenoptic Brightness Constancy

If the world and illumination are static, the space of light rays is invariant over time. Therefore, we can estimate the 3D motion of a rigidly moving image sensor by matching the sets of light rays captured at different time instants. Since we match the set of rays to itself, this estimation is independent of the scene and of the surface reflection properties of the scene objects.

Discrete Plenoptic Motion Constraint.

Let us assume that the albedo of the scene surfaces is constant over time and that we observe a static world under constant illumination. In this case, the radiance of a light ray in the world does not change over time, which implies that the total time derivative of this light ray vanishes: d/dt L(x;r,t) = 0. Consequently, if we transform the space of light rays by a rigid transformation, for example parameterized by a rotation matrix R and a translation vector $\mathbf{t}$ (written in bold to distinguish it from the time t), then we have the exact identity, which we term the discrete plenoptic motion constraint,

$$L(Rx + \mathbf{t}; Rr, t + 1) = L(x; r, t),$$

since the rigid motion maps the time-invariant space of light rays onto itself. Thus, the problem of estimating the rigid motion of a sensor has become an image registration problem that is independent of the scene!
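
As a rough illustration of this registration view, the sketch below builds a static toy scene, expresses its rays in a moved sensor frame, and compares the ray-matching residual for the true motion with the residual for a wrong guess. The scene, the function names (`rotation`, `world_radiance`, `sensor_radiance`, `residual`) and all numeric values are assumptions for illustration only, and `trans` stands for the translation vector to avoid a clash with time.

```python
import numpy as np

def rotation(axis, angle):
    """Rotation matrix from an axis and angle (Rodrigues' formula)."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    K = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def world_radiance(x, r):
    """Static toy scene: textured plane z = 0 viewed from z > 0."""
    p = x + (-x[2] / r[2]) * r
    return np.sin(3.0 * p[0]) * np.cos(2.0 * p[1])

def sensor_radiance(x_s, r_s, R, trans):
    """Radiance of the ray with sensor coordinates (x_s, r_s) at the second
    time instant, when a world ray (x, r) appears at (R x + trans, R r)."""
    return world_radiance(R.T @ (x_s - trans), R.T @ r_s)

rng = np.random.default_rng(0)
R_true = rotation([0.2, 1.0, 0.1], 0.05)
trans_true = np.array([0.03, -0.02, 0.01])

# Rays captured at the first time instant (sensor frame = world frame).
xs = rng.uniform(-0.5, 0.5, (200, 3)) + np.array([0.0, 0.0, 2.0])
rs = 0.2 * rng.normal(size=(200, 3)) + np.array([0.0, 0.0, -1.0])
rs /= np.linalg.norm(rs, axis=1, keepdims=True)

def residual(R_c, trans_c):
    """Mean squared radiance mismatch under a candidate motion (R_c, trans_c)."""
    err = [world_radiance(x, r)
           - sensor_radiance(R_c @ x + trans_c, R_c @ r, R_true, trans_true)
           for x, r in zip(xs, rs)]
    return float(np.mean(np.square(err)))

res_true = residual(R_true, trans_true)        # constraint holds exactly
res_wrong = residual(np.eye(3), np.zeros(3))   # wrong motion: rays mismatch
```

The residual vanishes (up to rounding) only for the true motion, and no depth or surface model of the plane enters the comparison itself.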

Illustration of plenoptic brightness constancy using subsets of an epipolar volume

We can illustrate the basic idea by examining how the image motion depends on the scene when we look at subsets of an epipolar volume that correspond either to an image sequence captured by a conventional perspective camera or to an image sequence captured by a linear push-broom camera.

We can form an epipolar volume by translating a camera parallel to the horizontal image axis and stacking the frames of the image sequence to form a volume:

Image Sequence by a Translating Camera

Epipolar Volume
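
The stacking step can be sketched in a few lines. The renderer below is a hypothetical stand-in for real frames (two point features at different depths, one per image row), and `WIDTH`, `HEIGHT`, `FOCAL`, `render` and `trace_shift` are names assumed for illustration.

```python
import numpy as np

WIDTH, HEIGHT, FOCAL = 128, 4, 100.0

def render(cam_x, points):
    """Toy pinhole image of point features (X, Z, row) for a camera at (cam_x, 0, 0)."""
    frame = np.zeros((HEIGHT, WIDTH))
    for X, Z, row in points:
        u = int(round(FOCAL * (X - cam_x) / Z + WIDTH / 2))
        if 0 <= u < WIDTH:
            frame[row, u] = 1.0
    return frame

points = [(0.5, 2.0, 1), (0.5, 10.0, 2)]      # a near and a far point
cam_positions = np.linspace(0.0, 1.0, 21)     # horizontal camera translation
frames = [render(c, points) for c in cam_positions]

# Stack the frames along a new first axis: volume[time, row, column].
volume = np.stack(frames, axis=0)

# An epipolar-plane image is a slice of the volume at a fixed image row.
epi_near = volume[:, 1, :]
epi_far = volume[:, 2, :]

def trace_shift(epi):
    """Horizontal displacement of the feature between first and last frame."""
    cols = epi.argmax(axis=1)
    return int(cols[-1] - cols[0])

shift_near = trace_shift(epi_near)   # -50 pixels for the point at depth 2
shift_far = trace_shift(epi_far)     # -10 pixels for the point at depth 10
```

Each feature traces a line in its epipolar-plane image whose slope is inversely proportional to depth, which is the motion parallax discussed in the following paragraphs.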

Every pixel in an epipolar volume corresponds to a unique ray in space. If a camera is undergoing a rigid motion constrained to a horizontal plane, then we can illustrate the subset of light rays that a camera will capture during its motion by sweeping a plane through the epipolar volume. The top half of each movie shows the image sequence and the bottom half the sweep through an epipolar image. By sweeping through the epipolar volume we can simulate the following four rigidly moving cameras:


Translating Perspective Camera: image sequence (1.8 MB AVI)

Rotating Linear Push-Broom Camera: image sequence (3.8 MB AVI)

We can see (top half) that for a rotating push-broom camera and for a translating perspective camera the image motion depends on the depth of the scene. This is the well-known effect of motion parallax. We also notice (bottom half) that in each frame the cameras capture different light rays. Thus, to estimate the camera motion on the basis of the image sequences, we need to estimate the scene structure so that we can put the pixels (light rays) into correspondence with each other.

In contrast, we see (top half) that for a translating push-broom sequence and a rotating perspective image sequence the optical flow in the images is independent of the scene structure. For a perspective camera this is well known and has been used to generate panoramic images; the mapping between the frames is given by a homography. This is because most of the rays that form an image of the sequence at any given time are also part of the preceding and following frames. Only the image boundaries contain new information. Thus we can estimate the rotation (for the perspective camera) or the translation (for the push-broom camera) by globally matching images to images without having to compute any scene parameters! The idea of polydioptric motion estimation is that, by matching light rays across viewpoints and view directions, we can estimate the full 3D motion of a polydioptric camera, similar to how we can estimate the motion of a pinhole camera that is rotating around its optical center.
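
The contrast between rotation and translation for a perspective camera can be checked numerically. This is a sketch under assumed values: the intrinsic matrix `K`, the 3 degree rotation, the 0.1 unit translation and the helper `project` are illustrative choices, not taken from the original experiments.

```python
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])       # assumed pinhole intrinsics

def project(P, R=np.eye(3), c=np.zeros(3)):
    """Pinhole projection of world points P (N, 3); camera coords are R (P - c)."""
    p = (K @ (R @ (P - c).T)).T
    return p[:, :2] / p[:, 2:3]

angle = np.deg2rad(3.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])

# Random scene points at widely varying depths.
rng = np.random.default_rng(1)
dirs = rng.uniform(-0.3, 0.3, (100, 2))
depths = rng.uniform(1.0, 20.0, 100)
P = np.column_stack([dirs[:, 0] * depths, dirs[:, 1] * depths, depths])

u0 = project(P)                        # reference frame
u_rot = project(P, R=R)                # camera rotated about its optical center

# Pure rotation: pixels are related by the homography H = K R K^-1, with no depth.
H = K @ R @ np.linalg.inv(K)
pred = (H @ np.column_stack([u0, np.ones(len(u0))]).T).T
pred = pred[:, :2] / pred[:, 2:3]
rot_error = np.abs(pred - u_rot).max()

# Pure translation: the horizontal flow is -f * baseline / Z, so it needs the depth.
u_tr = project(P, c=np.array([0.1, 0.0, 0.0]))
flow_x = u_tr[:, 0] - u0[:, 0]
```

Here `rot_error` is at rounding level for every depth, whereas `flow_x` equals `-50 / depths` and so cannot be predicted without knowing the scene structure.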

3. Differential Plenoptic Motion Estimation

If the plenoptic function is smooth in a local neighbourhood, we can define a plenoptic brightness constancy constraint that relates the differential changes in the position and orientation of a ray over time to the derivatives of the plenoptic function. This leads to a differential plenoptic motion constraint that enables us to find the six rigid motion parameters by solving a highly over-determined linear system of equations.

Differential Plenoptic Brightness Constancy.

Assuming that the plenoptic function varies smoothly in the neighbourhood of the ray parameterized by the origin x and direction r, we can expand the plenoptic function L in the neighbourhood of (x;r,t) into a Taylor series

$$L(x + dx; r + dr, t + dt) = L(x; r, t) + L_t\,dt + \nabla_x L \cdot dx + \nabla_r L \cdot dr + O(\|(dx, dr, dt)\|^2),$$

where $L_t = \partial L / \partial t$, and $\nabla_x L$ and $\nabla_r L$ denote the partial derivatives of L with respect to position and direction.

Disregarding the higher-order terms, we have a linear function that relates a local change in view ray position and direction to the differential brightness structure of the plenoptic function. This allows us to use the spatio-temporal brightness derivatives of the light rays captured by an imaging surface to constrain the plenoptic ray flow, that is, the change in position and orientation between rays captured by the same imaging element at consecutive time instants, by generalizing the well-known Image Brightness Constancy Constraint to the Plenoptic Brightness Constancy Constraint:

$$\frac{d}{dt} L(x; r, t) = L_t + \nabla_x L \cdot \frac{dx}{dt} + \nabla_r L \cdot \frac{dr}{dt} = 0.$$

Differential Plenoptic Motion Constraint.

Assuming that the imaging sensor undergoes a rigid motion with instantaneous translation $\mathbf{t}$ and rotation $\omega$ around the origin of the fiducial coordinate system, we can define the plenoptic ray flow for the ray captured by the imaging element located at position x and looking in direction r as

$$\frac{dx}{dt} = \omega \times x + \mathbf{t}, \qquad \frac{dr}{dt} = \omega \times r.$$

Combining the last two equations leads to the differential plenoptic motion constraint

$$-L_t = \nabla_x L \cdot \mathbf{t} + \nabla_x L \cdot (\omega \times x) + \nabla_r L \cdot (\omega \times r) = \nabla_x L \cdot \mathbf{t} + (x \times \nabla_x L + r \times \nabla_r L) \cdot \omega,$$

which is a linear constraint in the motion parameters and relates them to all the differential image information that a sensor can capture. To our knowledge, this is the first time that the temporal properties of the plenoptic function have been related to the structure from motion problem. In previous work, the plenoptic function has mostly been studied in the context of image-based rendering in computer graphics under the names light field (Levoy and Hanrahan 96) and lumigraph (Gortler et al. 96), and only the 4D subspace of the static plenoptic function corresponding to the light rays in free space was examined. The advantages of multiple centers of projection with regard to the stereo estimation problem had been studied before, for example in (Shum et al. 99).
Note that this formalism can also be applied if we observe a rigidly moving object with a set of static cameras. In this case, we attach the world coordinate system to the moving object, and we can relate the relative motion of the image sensors with respect to the object to the spatio-temporal derivatives of the light rays that leave the object.
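
Because the constraint is linear in the six motion parameters, one equation per ray can be stacked into an over-determined system and solved by least squares. The sketch below uses random synthetic gradients in place of measured plenoptic derivatives; the sample count, noise level and motion values are assumptions, and this is one plausible solver rather than the specific estimator used in the original work.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic rays: positions x_i on a small sensor and unit directions r_i.
x = rng.uniform(-0.1, 0.1, (n, 3))
r = rng.normal(size=(n, 3))
r /= np.linalg.norm(r, axis=1, keepdims=True)

# Synthetic stand-ins for the measured plenoptic gradients.
grad_x = rng.normal(size=(n, 3))
# Radiance is constant along each ray in free space, so the positional
# gradient has no component along the ray direction.
grad_x -= np.sum(grad_x * r, axis=1, keepdims=True) * r
grad_r = rng.normal(size=(n, 3))

t_true = np.array([0.02, -0.01, 0.03])      # assumed translation
w_true = np.array([0.01, 0.02, -0.015])     # assumed rotation (omega)

# Temporal derivative implied by
#   L_t + grad_x . (w cross x + t) + grad_r . (w cross r) = 0
L_t = -(np.sum(grad_x * (np.cross(w_true, x) + t_true), axis=1)
        + np.sum(grad_r * np.cross(w_true, r), axis=1))
L_t += 1e-4 * rng.normal(size=n)            # small measurement noise

# One row per ray: [grad_x, x cross grad_x + r cross grad_r] . [t; w] = -L_t
A = np.hstack([grad_x, np.cross(x, grad_x) + np.cross(r, grad_r)])
params, *_ = np.linalg.lstsq(A, -L_t, rcond=None)
t_est, w_est = params[:3], params[3:]
```

With 500 rays and six unknowns, `t_est` and `w_est` recover the assumed motion closely despite the added noise. In practice the gradients would come from finite differences of captured rays rather than from a random generator.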

Plenoptic motion estimation using polydioptric cameras.

It is also important to realize that the derivatives $\nabla_r L$ and $\nabla_x L$ can be obtained from the image information captured by a polydioptric camera. Recall that a polydioptric camera can be envisioned as a surface where every point corresponds to a pinhole camera. The plenoptic derivative with respect to direction, $\nabla_r L$, is the derivative with respect to the image coordinates that one finds in a traditional pinhole camera: one keeps the position and time constant and changes the direction. The second plenoptic derivative, $\nabla_x L$, is obtained by keeping the direction of the ray constant and changing the position along the surface. Thus, one captures the change of intensity between parallel rays. This is similar to computing the derivatives in an affine or orthographic camera. The ability to compute all the plenoptic derivatives depends on the ability to capture light at multiple viewpoints coming from multiple directions. This corresponds to the ability to incorporate stereo information into motion estimation, since multiple rays observe the same part of the world. For single-viewpoint cameras this is inherently impossible, and motion estimation therefore requires nonlinear estimation over both structure and motion to compensate for the lack of multi-view (or, equivalently, depth) information.
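
The two kinds of derivative can be illustrated on a sampled light field. The sketch below works in a 2D "flatland" setting with one position coordinate and one direction angle; the scene (a textured line at depth `Z`), the sampling grids and the name `texture` are assumptions made for illustration.

```python
import numpy as np

Z = 4.0                                    # assumed depth of the toy scene
s = np.linspace(-0.05, 0.05, 41)           # pinhole positions on the camera surface
theta = np.linspace(-0.2, 0.2, 201)        # viewing directions in radians
S, TH = np.meshgrid(s, theta, indexing="ij")

def texture(p):
    """Hypothetical texture on the scene line at depth Z."""
    return np.sin(2.0 * p)

# lf[i, j] is the radiance of the ray leaving pinhole i in direction j.
lf = texture(S + Z * np.tan(TH))

# Positional derivative: fixed direction, neighbouring pinholes (parallel rays).
dL_ds = np.gradient(lf, s, axis=0)

# Directional derivative: fixed pinhole, neighbouring pixels, as in an
# ordinary pinhole image.
dL_dtheta = np.gradient(lf, theta, axis=1)
```

A single pinhole corresponds to one row of `lf`, so it yields `dL_dtheta` but not `dL_ds`. In this toy scene the ratio of the two derivatives equals `Z / cos(theta)**2`, which shows how the positional derivative carries the depth information that a single-viewpoint camera lacks.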