Christopher J. Hall
Lee Yerkes
Figure 1. Early chronophotography images (5)
Abstract
Videos provide a series of related images that can be manipulated to synthesize new and interesting images containing more information than any single frame in the video. Our work explores ways that common videos can be used to synthesize new images such as panoramas, activity synopsis syntheses (sometimes called chronophotography, which is related to stroboscopy), and images with occlusions removed.
Introduction
Videos provide a series of related images that can be manipulated to synthesize new and interesting images containing more information than any single frame in the video. Our work explores ways that common videos can be used to synthesize new images such as panoramas, activity synopsis syntheses (sometimes called chronophotography, which is related to stroboscopy), and images with occlusions removed. This work has largely been inspired by chronophotography and by images created with the stroboscope. It builds directly on the ideas of Gleicher and Liu: creating panoramas from video, selecting the best frames using frame quality measures, and using optical flow to remove dynamic objects, which can later be added back in to create activity synopsis syntheses (1). We give these chronopanoramas new twists inspired by chronophotography and stroboscopic images, such as variable blending and recoloring of objects. We also explore a new idea: removing static objects by moving the camera relative to the scene.
Literature Review
Chronophotography is generally thought of as a precursor to motion pictures, which makes it seem natural to use a motion picture as the source for images used in chronophotography. Chronophotography’s original purpose was to help scientists study objects in motion, primarily humans and animals. Capturing this motion meaningfully required that discrete images be captured rapidly. As early as 1832, Joseph Plateau of Belgium invented the stroboscope, a device in which the user views images on a rotating wheel through a slit, giving the appearance of motion. In 1917, French engineer Etienne Oehmichen built a camera capable of shooting 1,000 frames per second and patented the first electric stroboscope. In 1931, Harold Eugene Edgerton, an electrical engineer at MIT, employed a flashing lamp to study machine parts in motion. This led to several images that have become famous pieces of art, such as the milk drop coronet, a bullet frozen in air, hummingbirds in flight, and dancers in motion. With several independent images, the negatives could be overlapped in the darkroom to create a single still image that captures the motion (2; 3). Harold Edgerton applied this technique in numerous situations and popularized stroboscopy (4). These early images were in black and white, see Figures 1 and 2.
Figure 2. A crane landing (5)
The stroboscope has been used effectively to study the motion of objects in many systems. Images created with stroboscopes are often featured in physics texts (3). Such images can be used to measure velocity and calculate acceleration. More modern color images created with stroboscopes are shown in Figures 3 through 5. Today digital photography has made such techniques even easier. Individuals have produced chronophotographs by taking several shots with a camera and post-processing them in Photoshop. Chronophotography has found a niche in capturing short moments in sports, such as wakeboarding and skateboarding tricks (e.g., Figure 6). Special effects have also been employed, such as fading the object in and out of the photograph at different moments in the motion for an artistic flavor, see Figure 7 (6).
Figure 3. The path of a bouncing basketball is captured Figure 4. A dancer’s motion is captured (4)
Figure 5. Chronophotography illustrates a swing (4) Figure 6. Skateboarding trick (6)
Figure 7. Running leap with fadeout (6)
More recently, Gleicher and Liu have published a methodology for creating panoramas from web videos, see Figures 8 and 9. Their work searches videos on the web for panorama candidates, using criteria such as video quality (not blurry or blocky) and coverage of a large field of view (1). Many web videos contain moving objects, and Gleicher and Liu exploited this fact to create chronophotographic panoramas, which they call activity synopsis syntheses. Under the assumption of a large static background with smaller moving foreground objects, homographies between pairs of images are computed first. After the homographies are used to align the images, optical flow is used to detect moving objects, see Figure 10. Dynamic objects are removed to create a static panorama, see Figure 11. The dynamic objects can then be selectively inserted into the panorama to create an “activity synopsis synthesis,” see Figure 12. More of their results are shown in Figure 13 (1).
Figure 8. Panoramas created from videos (1) Figure 9. More panoramas created from videos (1)
(a) Optical flow thresholded
(b) Detected dynamic objects
Figure 10. A skier was identified using optical flow by Gleicher and Liu (1)
Figure 11. Panorama of skier’s background (1)
Figure 12. Activity synopsis synthesis of a skier (1)
Figure 13. Activity synopsis synthesis examples (1)
Panoramas
The work for this project began with panorama-making code developed in Matlab for Homework 4, which we applied to videos. Frames grabbed from the video readily produced panoramas, see Figure 14. Unfortunately this is not commercial-grade software, and the reader will notice that the images at the edges often do not look well aligned with their neighbors. In fact, each image is aligned to its neighbor as well as the others are, but error accumulates as the homographies are chained together, and the resulting drift is most visible at the edges. Large panoramas therefore do not look as good. Figure 15 is an example made with high-resolution, high-quality images; notice that the leftmost image does not look well aligned. With lower-resolution frames from a video camera such errors accumulate more rapidly, see Figure 16. Figure 17 shows the source images used to create Figure 16. Some of the points used to compute the homographies had to be entered manually, because the scene contains large areas of sky and ocean in which SIFT was unable to find good keypoints. Giving these images to Autostitch produced Figure 18, which is a nice-looking panorama, but it also missed many of the keypoints and produced a reduced panorama (7). Reducing the number of images used in a panorama made with our software also gives better results, see Figure 19. The panorama of the Golden Gate Bridge was made from a video filmed a couple of years ago with no intent of using it to make a panorama; this mirrors Gleicher and Liu’s idea of creating panoramas from web videos (1). Successfully creating a panorama from such a source suggests that our method generalizes to videos not recorded with panoramas in mind.
Figure 14. Panorama of a painting at the Chazen art center
Figure 15. Panorama using still images
Figure 16. Scene of the Golden Gate Bridge, video not made for this project
Figure 17. Input images for the scene of the Golden Gate Bridge
Figure 18. Autostitch San Francisco Figure 19. Golden Gate Bridge, video not made for this project
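The drift described in this section follows from how pairwise homographies are chained. The following is a minimal sketch in Python (the project code was written in Matlab and is not reproduced here); the function names and the convention that each pairwise homography maps frame i+1 into frame i are assumptions made for illustration. It shows that a small error in every pairwise estimate adds up for frames far from the reference frame.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def chain_homographies(pairwise):
    """Compose pairwise homographies into frame-to-reference homographies.

    pairwise[i] is assumed to map points in frame i+1 into frame i.
    The result has one entry per frame; entry k maps frame k into frame 0.
    """
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    to_reference = [identity]
    for h in pairwise:
        to_reference.append(matmul3(to_reference[-1], h))
    return to_reference


def apply_homography(h, x, y):
    """Map the point (x, y) through h using homogeneous coordinates."""
    xs = h[0][0] * x + h[0][1] * y + h[0][2]
    ys = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return xs / w, ys / w


# Ten frames, each truly shifted by 100 pixels, each estimate off by 0.5 pixel.
estimated = [[[1.0, 0.0, 100.5], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]] * 10
x_last, _ = apply_homography(chain_homographies(estimated)[-1], 0.0, 0.0)
drift = x_last - 10 * 100.0  # half a pixel per pair adds up across the chain
```

In this toy case every neighboring pair is misaligned by only half a pixel, yet the last frame lands five pixels away from its true position in the reference frame, which is why the outermost images look worst.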
Quality Measure
Gleicher and Liu found that their panorama-making software needed high-quality videos. Using YouTube, they could search through many videos and, from the high-quality videos, use only the highest-quality frames to create a panorama (1). They used the work of Wang and Tong to detect frames that were very blurry or very blocky and ignored such frames (8; 9). The videos we worked with were of lower quality and suffered mostly from motion blur, so we used the work of Narvekar to select the highest-quality frames from our low-quality videos (10). As an example of how this works, consider the video ‘traffic.avi’ from Matlab’s Image Processing Toolbox (11). Figure 20 is a plot of frame quality as a function of frame number; the minimum corresponds to the lowest-quality frame (see Figure 21(a)) and the maximum corresponds to the highest-quality frame (see Figure 21(b)). Working with quality measures allowed us to choose the best frame out of a set to use in the panorama.
Figure 20. Quality of frames, higher values correspond to higher quality
(a) Lowest quality frame (b) Highest quality frame
Figure 21. Quality of frames for ‘traffic’ video
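Choosing the best frame from a set can be sketched as follows. Note that this example does not implement the metric of Narvekar and Karam (10); as a simple stand-in it scores each frame by the variance of a 4-neighbor Laplacian, a common sharpness proxy that also drops when a frame is blurred. Frames are assumed to be grayscale images stored as lists of rows, and all function names are hypothetical.

```python
from statistics import pvariance


def sharpness(gray):
    """Stand-in sharpness score: variance of the Laplacian over interior pixels."""
    rows, cols = len(gray), len(gray[0])
    laplacian = [
        gray[r - 1][c] + gray[r + 1][c] + gray[r][c - 1] + gray[r][c + 1]
        - 4 * gray[r][c]
        for r in range(1, rows - 1)
        for c in range(1, cols - 1)
    ]
    return pvariance(laplacian)


def best_frame_index(frames):
    """Return the index of the frame with the highest sharpness score."""
    scores = [sharpness(frame) for frame in frames]
    return max(range(len(frames)), key=scores.__getitem__)


# A smooth ramp (blurry edge) versus a hard step (sharp edge).
blurry = [[0, 85, 170, 255] for _ in range(4)]
sharp = [[0, 0, 255, 255] for _ in range(4)]
chosen = best_frame_index([blurry, sharp])
```

Any per-frame score where higher means better could be dropped into `best_frame_index` in place of `sharpness`, which matches the way Figure 20 is read.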
Optical Flow
Next we wanted to find moving objects in the video. For this task we followed Gleicher and Liu and used optical flow (1). We used Freeman’s code to calculate the optical flow between two images (12). The optical flow consists of floating-point x and y components at each pixel. We computed the L1 or L2 norm of the flow at each pixel and compared it to a user-set threshold. Pixels whose flow magnitude is above the threshold correspond to moving objects, and pixels below the threshold correspond to static objects. Figures 22 and 23 are representative images from the ‘traffic’ video which show the steps of our algorithm for detecting moving objects. When the camera is moving, all pixels have nonzero flow, but pixels that move more rapidly across the image have larger flow magnitudes. We found that this can be used to remove static foreground objects when the video camera is in motion: although the background pixels also move, the foreground pixels, being closer to the camera, move faster across the image.
(a) Original video frame (b) Optical flow
(c) Detected dynamic objects (d) Detected static objects
Figure 22. Representative images from traffic video
(a) Original video frame (b) Optical flow
(c) Detected dynamic objects (d) Detected static objects
Figure 23. More representative images from traffic video
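The thresholding step described in this section can be sketched in a few lines. This is an illustration only, assuming the flow has already been computed and is available as two lists of rows holding the x and y components; the function and parameter names are hypothetical.

```python
import math


def moving_mask(flow_x, flow_y, threshold, norm="l2"):
    """Mark pixels whose flow magnitude exceeds a user-set threshold.

    flow_x and flow_y hold the per-pixel x and y flow components.
    Returns a mask of booleans: True for dynamic pixels, False for static ones.
    """
    mask = []
    for row_x, row_y in zip(flow_x, flow_y):
        if norm == "l1":
            magnitudes = [abs(u) + abs(v) for u, v in zip(row_x, row_y)]
        else:
            magnitudes = [math.hypot(u, v) for u, v in zip(row_x, row_y)]
        mask.append([m > threshold for m in magnitudes])
    return mask


flow_x = [[0.1, 3.0], [0.0, 0.0]]
flow_y = [[0.0, 4.0], [0.2, 0.0]]
dynamic = moving_mask(flow_x, flow_y, threshold=1.0)
```

The static mask is simply the logical complement of the dynamic mask. The right threshold depends on the frame rate and on how fast objects move, which is why it is left to the user.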
Applications
Beyond creating panoramas from videos of still scenes, several applications were explored: static background extraction, activity synopsis synthesis (chronopanoramas), and static object removal. Each produced usable results, with the limitations noted in the subsections below.
A. Static Background Extraction
With moving objects detected, the static background can be separated from the dynamic foreground. Taking the median (to reject outliers) of all static pixels at a given location over a series of frames produces one static image, see Figure 24. If a moving object covers part of the image in every frame considered, that region has no static samples and appears as a black hole in the image. The same approach can be used to create still panoramas when moving foreground objects have been detected, as shown in the results.
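A minimal sketch of this median step follows, assuming grayscale frames stored as lists of rows and one boolean mask per frame marking dynamic pixels (True means moving). The names are hypothetical, and the frames are assumed to be aligned already.

```python
from statistics import median


def static_background(frames, moving_masks):
    """Per-pixel median over the frames in which the pixel was static.

    Pixels that were never static keep the value 0, which shows up
    as a black hole in the result.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    background = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            samples = [
                frame[r][c]
                for frame, mask in zip(frames, moving_masks)
                if not mask[r][c]
            ]
            if samples:
                background[r][c] = median(samples)
    return background


# Three 1x3 frames: a bright object passes over the middle pixel in frame 2,
# and the right pixel is flagged as moving in every frame.
frames = [[[10, 50, 90]], [[12, 200, 95]], [[11, 52, 91]]]
masks = [[[False, False, True]], [[False, True, True]], [[False, False, True]]]
background = static_background(frames, masks)
```

The median is a natural choice here because a pixel that is misclassified as static in a few frames contributes only outliers, which the median ignores as long as the true background dominates the samples.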
B. Activity Synopsis Synthesis
With a static image available, the algorithm can find non-overlapping dynamic objects and use a cut-and-paste approach to put them back into the image. Alternatively, for more predictable results, a user can choose the dynamic frames to add back in, see Figure 25. When two or more dynamic objects are in a frame, all of them are added in. For this reason there are artistic circumstances in which the user would wish to select one set of dynamic objects to be removed from the static background and a different set of dynamic objects to add into the image, see Figure 26.
Figure 24. Static background of the ‘traffic’ video
Figure 25. The ‘traffic’ video with a car’s path added back in
Figure 26. The ‘traffic’ video with a car’s path added back in and a different car selectively removed from one frame
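The automatic cut-and-paste variant can be sketched as a greedy selection followed by a masked copy. This is an illustrative sketch, not the project's implementation: it assumes each frame's dynamic object has been summarized by a bounding box (top, left, bottom, right) with exclusive bottom and right edges, and all names are hypothetical.

```python
def boxes_overlap(a, b):
    """True if two (top, left, bottom, right) boxes share any pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select_non_overlapping(boxes):
    """Greedily keep frames whose object box overlaps no box kept so far."""
    chosen = []
    for index, box in enumerate(boxes):
        if all(not boxes_overlap(box, boxes[kept]) for kept in chosen):
            chosen.append(index)
    return chosen


def paste(background, frame, mask):
    """Copy the masked (dynamic) pixels of a frame onto the background."""
    return [
        [f if m else b for b, f, m in zip(b_row, f_row, m_row)]
        for b_row, f_row, m_row in zip(background, frame, mask)
    ]


# An object moving right by 5 pixels per frame with a 10-pixel-wide box.
boxes = [(0, 0, 10, 10), (0, 5, 10, 15), (0, 10, 10, 20)]
keep = select_non_overlapping(boxes)
composite = paste([[1, 1]], [[9, 9]], [[False, True]])
```

Greedy selection in frame order keeps the first appearance of the object and then every later appearance that has fully cleared the previous ones; a user-driven selection simply replaces `keep` with hand-picked frame indices.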
An alternative to using optical flow to find and select dynamic objects is to have the user select the objects with a mouse. Both methods were implemented and used for this project. Once the dynamic objects have been found, many ideas can be applied to the images. Dynamic objects can be faded in and out against the static background, creating interesting illusions. The dynamic objects can even be blended together to create images like those produced by the stroboscope, and the colors of the objects can be changed (using RGB or L*a*b*, for example) at different points in their motion. In practice neither blending nor recoloring works well for us, because optical flow does not find tight boundaries for our dynamic objects. One could also use the static background as the background for a game and treat the moving objects as sprites.
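Fading a dynamic object in or out amounts to alpha blending the masked pixels with the static background. The sketch below is a generic illustration under the assumption of grayscale images stored as lists of rows; the function name and the alpha parameter are hypothetical.

```python
def blend(background, frame, mask, alpha):
    """Blend masked frame pixels into the background.

    alpha = 1.0 pastes the object fully; alpha = 0.0 leaves the background
    unchanged; values in between make the object partially transparent.
    """
    return [
        [alpha * f + (1.0 - alpha) * b if m else b
         for b, f, m in zip(b_row, f_row, m_row)]
        for b_row, f_row, m_row in zip(background, frame, mask)
    ]


background = [[100, 100]]
frame = [[200, 200]]
mask = [[True, False]]
faded = blend(background, frame, mask, alpha=0.25)
```

Applying `blend` once per selected frame with a different alpha for each frame gives the fade-in and fade-out effect. Because the mask decides which pixels are blended, a loose mask also blends in a halo of each frame's own background, which is the boundary problem noted above.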
C. Stroboscope Effect
One may wish to create a special kind of activity synopsis synthesis that mimics Harold Edgerton’s stroboscope effect. Note that such images have dark backgrounds and a dynamic foreground object whose successive positions may overlap. This requires a good algorithm for separating the foreground from the background; one could even use a “blue screen” for easy background subtraction. Since optical flow does not give a tight enough bound on our dynamic foreground object, and since we had no “blue screen,” we selected the dynamic foreground objects by hand. The results for a tennis ball are shown in the image below at three different frame sampling rates. Since our methodology differs from that of a typical stroboscope, we can also revert these results to more familiar-looking activity synopsis syntheses, see below. The most important difference between these results and our previous results is that a tighter bound on the object’s boundaries is required, so that the overlap between objects does not include the static background.
D. Static Object Removal
Videos are often recorded in which part of the scene is occluded. No single photograph can capture the whole scene, but a series of photographs can capture the scene as though the occlusion were not there. Specifically, we consider the case where the object being photographed is planar, an occlusion prevents an unobstructed view, and the video camera is moved with respect to the scene. Figures 27 and 28 illustrate a simple case of this scenario with two images of a map; the entire map cannot be seen in either image. In such cases panorama stitching software may be able to combine the images into a panorama, but the obstructions remain ghosted into the scene, see Figure 29. By using optical flow to find and remove the static foreground objects, which move faster relative to the camera than the scene in the background, a panorama can be created with the static foreground object removed, see Figure 30.
Figure 27. Map with occlusion Figure 28. Map again with occlusion
Figure 29. Autostitch panorama result
Figure 30. Occlusion removal result
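With a moving camera, a fixed threshold has to be chosen relative to the background's own motion. One way to sketch this, offered here as an assumption rather than the method used in the project, is to take the median flow magnitude as an estimate of the background speed (valid only if the background fills most of the frame) and flag pixels that move noticeably faster. The names and the margin parameter are hypothetical.

```python
from statistics import median


def fast_foreground_mask(magnitudes, margin):
    """Flag pixels moving faster than the typical (background) flow.

    magnitudes holds the per-pixel flow magnitude for one frame.
    The median magnitude stands in for the background speed, which
    assumes the background occupies most of the image.
    """
    background_speed = median(m for row in magnitudes for m in row)
    return [[m > background_speed + margin for m in row] for row in magnitudes]


# Background drifts at about 2 pixels per frame; the occluder at about 6.
magnitudes = [[2.0, 2.1, 6.0], [1.9, 2.0, 5.5]]
occluder = fast_foreground_mask(magnitudes, margin=1.5)
```

Once the occluder is masked in every aligned frame, the remaining pixels can be combined in the same way as in static background extraction, so that each location is filled from the frames in which it was visible.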
E. Removing Large Static Occlusions - "Peeking"
In some special cases, one may want to capture a scene through a crack in a door or a gap in a picket fence. Assuming the scene to be captured is mostly planar, or very far away, this can be achieved in much the same way that static scenes are rebuilt and static occlusions are removed. Foreground objects (the door or fence) can be found using optical flow, but we find that in this case mostly only the edges of the occlusion are detected (Figure 31).
Figure 31. Optical flow of bathroom stall
To find the occluding regions, we create a simple binary mask of all pixels below a user-specified intensity. This gives somewhat satisfactory results when most of the pixels in the background region are brighter than the occluder; see Figures 32 and 33, where the image was captured through the (rather large) crack of a bathroom stall. See Figures 33 and 34 for an example of what happens when a scene is too dark: notice the dropped pixels around the edges of the blue garbage cans. This image of a hallway was taken through a crack approximately 2 to 3 mm wide between two doors. In an attempt to remove artifacts, optical flow was also considered when finding the regions of interest in Figures 35 and 36. This reduced the "stripiness" of the result but unfortunately also dropped many pixels from the background region.
Figure 32. Selected pixels from chosen frames of the source video Figure 33. Resulting image
Figure 33. Selected pixels from chosen frames of the source video Figure 34. Resulting image
Figure 35. Selected pixels from chosen frames of the source video Figure 36. Resulting image
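The intensity-mask approach can be sketched as follows. This is an illustration under stated assumptions: the frames are grayscale, already aligned to a common reference, and stored as lists of rows; the visible samples at each location are combined with a median, which is one reasonable choice and not necessarily what the project code does. All names are hypothetical.

```python
from statistics import median


def peek_composite(aligned_frames, min_intensity):
    """Rebuild a scene seen through a narrow gap.

    Pixels darker than min_intensity are treated as the occluder (door or
    fence) and ignored. Locations never seen above the threshold stay 0,
    which corresponds to the dropped pixels seen in dark scenes.
    """
    rows, cols = len(aligned_frames[0]), len(aligned_frames[0][0])
    result = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            visible = [
                frame[r][c] for frame in aligned_frames
                if frame[r][c] >= min_intensity
            ]
            if visible:
                result[r][c] = median(visible)
    return result


# Two 1x3 frames: the gap exposes the middle pixel first, then the left one.
frames = [[[10, 200, 15]], [[180, 12, 14]]]
scene = peek_composite(frames, min_intensity=100)
```

The rightmost pixel is dark in both frames, so it is dropped. The same thing happens to genuinely dark parts of the scene, which is the failure mode visible around the garbage cans.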
Experimental Results
Several example results were created using the methods developed. Some of them follow:
Near Camp Randall
Messages such as YMCA can be spelled out by a person
Dynamic objects can be recolored, which would be visually more impressive if optical flow gave tighter boundaries on our foreground objects
A still near the stadium
A cartwheel’s motion is caught
More of the motion of the cartwheel is caught
Using a higher resolution camera gives better looking results
Objects are removed from a gym
A shot is made, at some points the ball is a little blurry
A shot is made, a user may choose which points to keep
Another shot is made, however many frames capturing the motion were not good quality
For artistic value the ball is selectively faded in and out; the ball is sharpest when leaving the hands of the shooter and when entering the net, drawing the eyes of the viewer to these points
References
[1] F. Liu, Y. Hu, and M. L. Gleicher, “Discovering panoramas in web videos,” in Proceedings of the 16th ACM International Conference on Multimedia, p. 329–338, 2008.
[2] “Chronophotography - wikipedia, the free encyclopedia.” http://en.wikipedia.org/wiki/Chronophotography.
[3] “Stroboscope - wikipedia, the free encyclopedia.” http://en.wikipedia.org/wiki/Stroboscope.
[4] “Intro to digital stroboscopic motion photography.” http://people.rit.edu/andpph/text-digitalstroboscopy.html.
[5] “Freeze frames: The moving history of chronophotography | gadgets, science & technology.” http://gajitz.com/freeze-framesthe-moving-history-of-chronophotography/.
[6] “A beginners guide to capturing motion in your photography.” http://www.digital-photographyschool.com/a-beginners-to-capturing-motion-inyour-photography.
[7] M. Brown and D. Lowe, “Automatic panoramic image stitching using invariant features,” International Journal of Computer Vision, vol. 74, no. 1, p. 59–73, 2007.
[8] H. Tong, M. Li, H. Zhang, and C. Zhang, “Blur detection for digital images using wavelet transform,” in Multimedia and Expo, 2004. ICME’04. 2004 IEEE International Conference on, vol. 1, p. 17–20, 2004.
[9] Z. Wang, H. R. Sheikh, and A. C. Bovik, “No-reference perceptual quality assessment of JPEG compressed images,” in Image Processing. 2002. Proceedings. 2002 International Conference on, vol. 1, p. I–477, 2002.
[10] N. Narvekar and L. J. Karam, “An improved no-reference sharpness metric based on the probability of blur detection,” Wkshp. on Video Proc. and Quality Metrics, 2010.
[11] MATLAB User’s Guide, The MathWorks, Inc., Natick, MA, vol. 5, 1998.
[12] W. T. Freeman, E. H. Adelson, C. Liu, et al., Beyond pixels: exploring new representations and applications for motion analysis. PhD thesis, Massachusetts Institute of Technology, 2009.