Summary: Plenoptic Video Geometry
is the study of the space of light rays as observed by a moving imaging
sensor. The space of light rays is the most complete representation of
visual information possible. All the information about the world that
can be captured using visual sensors is encoded in the intensity
function defined on the space of rays in 3D. This function is called
the plenoptic function. Thus, if we understand the intrinsic structure
of the plenoptic function, we can design optimal image acquisition
devices and image processing algorithms to recover the information
about the world that we seek.

1. Definitions

The plenoptic function. At each location x in free space, the radiance, that is, the light intensity or color observed at x from a given direction r at time t, can be measured by the plenoptic function

    L : R^3 × S^2 × R → R^d,   (x; r, t) → L(x; r, t),

where d = 1 for intensity images, d = 3 for color images, and S^2 is the unit sphere of directions in R^3 (Adelson and Bergen 91). Since the radiance is constant along a ray in free space, the plenoptic function in free space reduces to five dimensions -- the time-varying space of directed lines, for which many representations have been presented (for an overview see Camahort and Fussell).

2. Plenoptic Brightness Constancy

If the world and the illumination are static, the space of light rays is invariant over time. Therefore, we can estimate the 3D motion of a rigidly moving image sensor by matching the sets of light rays captured at different time instants. Since we match the set of rays to itself, this estimation is independent of the scene structure and of the surface reflectance properties of the scene objects.

Discrete Plenoptic Motion Constraint. Let us assume that the albedo of the scene surfaces is constant over time and that we observe a static world under constant illumination. In this case the radiance of a light ray in the world does not change over time, which implies that the total time derivative of the plenoptic function vanishes: d/dt L(x; r, t) = 0. If we now transform the space of light rays by a rigid transformation, parameterized for example by a rotation matrix R and a translation vector t, then we have the exact identity

    L(R x + t; R r, t1) = L(x; r, t0),

which we term the discrete plenoptic motion constraint,
since the rigid motion maps the time-invariant space of light rays onto itself. Thus, the problem of estimating the rigid motion of a sensor has become an image registration problem that is independent of the scene!

Illustration of plenoptic brightness constancy using subsets of an epipolar volume

We can illustrate the basic idea by examining how the image motion flow depends on the scene when we look at subsets of an epipolar volume that correspond either to an image sequence captured by a conventional perspective camera or to one captured by a linear pushbroom camera. We form an epipolar volume by translating a camera parallel to the horizontal image axis and stacking the frames of the image sequence into a volume:
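The stacking step can be sketched in a few lines of numpy. The scene here is a synthetic vertical edge drifting across the image as the camera translates; the sizes and the one-pixel-per-frame shift are illustrative choices, not values from the text:

```python
import numpy as np

# Synthetic image sequence: a camera translating along the horizontal
# image axis sees a vertical edge drift sideways.
H, W, T = 48, 64, 32                      # image height, width, frame count
frames = [np.zeros((H, W)) for _ in range(T)]
for t, f in enumerate(frames):
    f[:, (10 + t) % W] = 1.0              # edge shifts one pixel per frame

# Stack the frames along a third (time) axis to form the epipolar volume.
volume = np.stack(frames, axis=2)         # shape (H, W, T)

# A horizontal slice at a fixed image row y is an epipolar-plane image (EPI):
# each scene point traces a straight line whose slope encodes its disparity.
y = H // 2
epi = volume[y, :, :]                     # shape (W, T)
print(volume.shape, epi.shape)
```

Every pixel of such a volume indexes one light ray, which is what the sweeps described next exploit.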
Every pixel in an epipolar volume corresponds to a unique ray in space. If a camera undergoes a rigid motion constrained to a horizontal plane, we can illustrate the subset of light rays that it captures during its motion by sweeping a plane through the epipolar volume. The top half of each movie shows the image sequence and the bottom half the sweep through an epipolar image. By sweeping through the epipolar volume we can simulate the following four rigidly moving cameras: a rotating push-broom camera, a translating perspective camera, a translating push-broom camera, and a rotating perspective camera.
We can see (top half) that for a rotating push-broom camera and for a translating perspective camera the image motion depends on the depth of the scene. This is the well-known effect of motion parallax. We also notice (bottom half) that in each frame the cameras capture different light rays. Thus, to estimate the camera motion from the image sequences, we need to estimate the scene structure so that we can put the pixels (light rays) into correspondence with each other.
In contrast, we see (top half) that for a translating push-broom sequence and a rotating perspective sequence the optical flow in the images is independent of the scene structure. For a perspective camera this is well known and has been used to generate panoramic images; the mapping between frames is given by a homography. The reason is that most of the rays that form an image of the sequence at any given time are also part of the preceding and following frames; only the image boundaries contain new information. Thus we are able to estimate the rotation (respectively, the translation) by globally matching images to images without having to compute any scene parameters! The idea of polydioptric motion estimation is that by matching light rays across viewpoints and view directions we can estimate the full 3D motion of a polydioptric camera, similar to how we can estimate the motion of a pinhole camera that rotates around its optical center.

3. Differential Plenoptic Motion Estimation

If the plenoptic function is smooth in a local neighbourhood, we can define a plenoptic brightness constancy constraint that relates the differential changes in position and orientation of a ray over time to the derivatives of the plenoptic function. This leads to a differential plenoptic motion constraint that enables us to find the six rigid motion parameters by solving a highly over-determined linear system of equations.

Differential Plenoptic Brightness Constancy. Assuming that the plenoptic function in the neighbourhood of the ray parameterized by the origin x and direction r is smoothly varying, we can develop the plenoptic function L in the neighbourhood of (x; r, t) into a Taylor series:

    L(x + dx; r + dr, t + dt) = L(x; r, t) + ∇_x L · dx + ∇_r L · dr + L_t dt + higher-order terms.
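The accuracy of such a first-order expansion can be checked numerically. The sketch below uses a synthetic smooth plenoptic function (an illustrative sinusoid, not from the text) and verifies that the first-order Taylor prediction matches the true value up to second-order error:

```python
import numpy as np

# A smooth synthetic plenoptic function L(x; r, t) -> intensity (illustrative).
def L(x, r, t):
    return np.sin(x @ np.array([1.0, 0.5, 0.2])
                  + r @ np.array([0.3, 0.7, 0.1]) + 0.4 * t)

x = np.array([0.1, 0.2, 0.3])             # ray origin
r = np.array([0.0, 0.0, 1.0])             # ray direction
t = 0.0
dx = np.array([1e-3, -2e-3, 1e-3])        # small perturbations
dr = np.array([2e-3, 1e-3, 0.0])
dt = 1e-3

# Numerical gradients by central differences.
def grad(f, v, eps=1e-6):
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v); e[i] = eps
        g[i] = (f(v + e) - f(v - e)) / (2 * eps)
    return g

Lx = grad(lambda xx: L(xx, r, t), x)      # ∇_x L
Lr = grad(lambda rr: L(x, rr, t), r)      # ∇_r L
Lt = (L(x, r, t + 1e-6) - L(x, r, t - 1e-6)) / 2e-6

# First-order Taylor prediction vs. true value: the error is second order.
taylor = L(x, r, t) + Lx @ dx + Lr @ dr + Lt * dt
true = L(x + dx, r + dr, t + dt)
print(abs(taylor - true))
```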
Disregarding the higher-order terms, we obtain a linear function that relates a local change in view-ray position and direction to the differential brightness structure of the plenoptic function. This allows us to use the spatio-temporal brightness derivatives of the light rays captured by an imaging surface to constrain the plenoptic ray flow, that is, the change in position and orientation between rays captured by the same imaging element at consecutive time instants, by generalizing the well-known Image Brightness Constancy Constraint to the Plenoptic Brightness Constancy Constraint:

    ∇_x L · ẋ + ∇_r L · ṙ + L_t = 0.
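Computing these spatio-temporal derivatives requires light rays sampled over viewpoint, direction, and time. A minimal sketch, assuming a two-plane discretization (an assumed parameterization, not prescribed by the text) in which (x, y) indexes the viewpoint on the camera surface and (u, v) the ray direction:

```python
import numpy as np

# Assumed discretization: rays on a two-plane grid, (x, y) = viewpoint on the
# camera surface, (u, v) = ray direction (pixel), sampled at two time instants.
xg, yg, ug, vg = np.meshgrid(np.arange(8), np.arange(8),
                             np.arange(16), np.arange(16), indexing='ij')
phase = 0.2 * xg + 0.1 * yg + 0.3 * ug + 0.15 * vg
L0 = np.sin(phase)            # light field at time t (synthetic, smooth)
L1 = np.sin(phase + 0.05)     # light field at time t + dt

# Finite-difference plenoptic derivatives:
Lt = L1 - L0                                   # temporal derivative
Lx, Ly = np.gradient(L0, axis=(0, 1))          # change across viewpoints (∇_x L)
Lu, Lv = np.gradient(L0, axis=(2, 3))          # change across directions (∇_r L)
print(Lt.shape, Lx.shape, Lu.shape)
```

The viewpoint derivatives (Lx, Ly) are exactly the quantity a single pinhole camera cannot measure, which is why a polydioptric sensor is needed below.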
Differential Plenoptic Motion Constraint. Assuming that the imaging sensor undergoes a rigid motion with instantaneous translation t and instantaneous rotation ω around the origin of the fiducial coordinate system, we can define the plenoptic ray flow for the ray captured by the imaging element located at position x and looking in direction r as

    ẋ = t + ω × x,   ṙ = ω × r.

Combining the last two equations leads to the differential plenoptic motion constraint

    ∇_x L · (t + ω × x) + ∇_r L · (ω × r) + L_t = 0,
which is a linear constraint in the motion parameters and relates them
to all the differential image information that a sensor can capture. To
our knowledge, this is the first time that the temporal properties of
the plenoptic function have been related to the structure from motion
problem. In previous work, the plenoptic function has mostly been
studied in the context of image-based rendering in computer graphics
under the names light field (Levoy and Hanrahan 96) and lumigraph (Gortler et al. 96),
and only the 4D subspace of the static plenoptic function corresponding
to the light rays in free space was examined. The advantages of
multiple centers of projection with regard to the stereo estimation
problem had been studied before, for example in (Shum et al. 99).

Plenoptic motion estimation using polydioptric cameras. It is also important to realize that the derivatives ∇_x L and ∇_r L can be obtained from the image information captured by a polydioptric camera. Recall that a polydioptric camera can be envisioned as a surface where every point corresponds to a pinhole camera. The plenoptic derivative with respect to direction, ∇_r L, is then the derivative with respect to the image coordinates that one finds in a traditional pinhole camera: one keeps the position and time constant and changes the direction. The second plenoptic derivative, ∇_x L, is obtained by keeping the direction of the ray constant and changing the position along the surface; thus one captures the change of intensity between parallel rays. This is similar to computing the derivatives in an affine or orthographic camera. The ability to compute all the plenoptic derivatives depends on the ability to capture light at multiple viewpoints coming from multiple directions. This corresponds to the ability to incorporate stereo information into motion estimation, since multiple rays observe the same part of the world. For single-viewpoint cameras this is inherently impossible, which necessitates a nonlinear estimation over both structure and motion to compensate for the lack of multi-view (or, equivalently, depth) information.
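Putting the pieces together: under the ray-flow convention ẋ = t + ω × x, ṙ = ω × r, the differential plenoptic motion constraint is linear in the six motion parameters (t, ω), so with many sampled rays it reduces to an over-determined least-squares problem. A minimal sketch with synthetic, noise-free derivative data (in practice the arrays Lx, Lr, Lt would come from a polydioptric sensor):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
x = rng.standard_normal((N, 3))           # ray origins on the sensor surface
r = rng.standard_normal((N, 3))
r /= np.linalg.norm(r, axis=1, keepdims=True)  # unit ray directions

t_true = np.array([0.1, -0.2, 0.05])      # ground-truth translation
w_true = np.array([0.02, 0.01, -0.03])    # ground-truth rotation

Lx = rng.standard_normal((N, 3))          # positional derivatives  (synthetic)
Lr = rng.standard_normal((N, 3))          # directional derivatives (synthetic)
# Temporal derivative implied by the constraint (noise-free simulation):
Lt = -(np.sum(Lx * (t_true + np.cross(w_true, x)), axis=1)
       + np.sum(Lr * np.cross(w_true, r), axis=1))

# Each ray gives one linear equation in the six unknowns (t, w); using the
# scalar triple product, Lx·(w×x) = w·(x×Lx) and Lr·(w×r) = w·(r×Lr):
#     Lx·t + (x×Lx + r×Lr)·w = -Lt
A = np.hstack([Lx, np.cross(x, Lx) + np.cross(r, Lr)])   # (N, 6)
b = -Lt
motion, *_ = np.linalg.lstsq(A, b, rcond=None)
print(motion[:3], motion[3:])             # estimated translation, rotation
```

With noise-free data the solve recovers the generating motion; with real derivatives the same system is simply solved in the least-squares sense over all captured rays.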