Tsukuba Multiview Image Database
- TSUKUBA STEREO -
February 17, 1995
Multiview Image Database for Computer Vision Research
Yuichi Ohta and Yuichi Nakamura
Computer Vision and Image Media Laboratory
University of Tsukuba, Tsukuba, 305-8577, Ibaraki, JAPAN
1 Purpose
Computer vision technologies will be essential for realizing high-performance 3D image media. For this purpose, however, we need faster and more reliable algorithms that can produce more precise 3D information. Using multiview images is likely the most practical way to achieve such algorithms. However, acquiring images observed from multiple spatial and temporal viewpoints requires special and expensive devices. This difficulty in data acquisition may hinder active research in this direction.
We have developed this "Multiview Image Database" as a common, standard database that many researchers can use, in order to promote computer vision research using multiple-view images.
The development of this database was partially supported by the Ministry of Education, Science and Culture in Japan. The project was directed by Prof. Yuichi Ohta under the title "Development of Digital Image Database for Computer Vision Research". Several researchers at the University of Tsukuba, including Dr. Yuichi Nakamura, joined the project.
The image acquisition and data organization were carried out by graduate students of the Computer Vision and Image Media Laboratory at the University of Tsukuba. Their names are as follows:
Volume 1: Kiyohide Satoh, Yasuhiro Mukaigawa, Itaru Kitahara
Volume 2: Takeshi Kurata, Kwang-ho Yang, Masashi Nishitani, Yukiyo Uehori
The CD-ROM package was designed by Toshikazu Miki, a graduate student of art, and the diorama used as an object was created by Tamako Takahashi, an undergraduate student of architecture.
2 Copyright
The copyright of this database is reserved by Yuichi Ohta and Yuichi Nakamura. The use of this database is restricted to those who agree with the following three usage restrictions.
1. The use of this database is limited to non-profit purposes.
2. Whenever results of research conducted using this database are published, the author should credit "University of Tsukuba, Multiview Image Database" in the article.
3. The user should accept full responsibility for the use of this database and the copyright holders assume no responsibility.
Copying this database is allowed only when this copyright note is included in the copy and the user of the copy agrees with this copyright note. Without written permission from the copyright holders mentioned above, any use of this database that does not follow the above restrictions is prohibited.
Queries on the copyright should be forwarded to:
Professor Yuichi Ohta and Itaru Kitahara
Center for Computational Sciences, University of Tsukuba, Tsukuba, Ibaraki, 305-8577, JAPAN
3 Contents
This database is organized into two CD-ROM volumes. The contents of each volume are roughly as follows.
When accessing for the first time, an access request is required.
Volume 1:
A still object observed from 81 (= 9x9 lattice) camera positions (four scenes).
A moving object observed from 9 (= 3x3 lattice) camera positions (two scenes).
Volume 2:
Moving images of hand gestures observed from four camera positions: front, top, right, and left. Moving images of the gesture performer's face are also included.
----------------------------------------------------------------------------------------------------------------
4 Vol#1
4.1 Overview
The contents of the first CD-ROM volume are as follows.
1. STILL IMAGE
* 81 cameras (9x9 lattice)
* 4 scenes (1:stuffed toy, 2:pot plant, 3:diorama, 4:mannequin)
* 1 frame/scene (81 images/scene)
* 640x480 pixel, RGB 24bit
* PPM format (portable pixmap file format)
* COMPRESSED by GZIP (gzip -9)
2. IMAGE SEQUENCE
* 9 cameras (3x3 lattice)
* 2 scenes (1:stuffed toy, 2:pot plant)
* 28 frames/scene (252 images/scene)
* 640x480 pixel, RGB 24bit
* PPM format (portable pixmap file format)
* COMPRESSED by GZIP (gzip -9)
3. APPENDIX
* Index of each scene (6 images)
* Overview of the studio (3 images)
* 640x480 pixel, RGB 24bit
* JPEG format (quality 75)
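The image data above is stored as gzip-compressed PPM. A minimal Python sketch for decoding such a file follows. It assumes the files are binary PPM (magic number P6) with a maximum sample value of 255, which this document does not state explicitly; the function names and any file path passed in are hypothetical.

```python
import gzip


def parse_ppm(data):
    """Parse binary (P6) PPM bytes into (width, height, pixels).

    Assumption: 8 bits per channel, binary encoding. The source only
    says "PPM format" and "RGB 24bit".
    """
    tokens = []
    pos = 0
    while len(tokens) < 4:
        # Skip whitespace between header tokens.
        while data[pos:pos + 1].isspace():
            pos += 1
        # Skip a comment line, which starts with '#'.
        if data[pos:pos + 1] == b"#":
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    magic, width, height, maxval = tokens
    if magic != b"P6" or int(maxval) != 255:
        raise ValueError("expected an 8-bit binary PPM (P6)")
    width, height = int(width), int(height)
    pos += 1  # exactly one whitespace byte separates header and pixels
    pixels = data[pos:pos + width * height * 3]
    if len(pixels) != width * height * 3:
        raise ValueError("truncated pixel data")
    return width, height, pixels


def read_ppm_gz(path):
    """Read a gzip-compressed PPM file (the path is hypothetical)."""
    with gzip.open(path, "rb") as f:
        return parse_ppm(f.read())
```

For a 640x480 image, the returned pixel buffer holds 640 * 480 * 3 = 921600 bytes in row-major R, G, B order.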
4.2 Setting cameras
4.2.1 STILL IMAGE (9x9 lattice)
The images of a still object are captured from multiple viewpoints located at equally spaced lattice positions. As shown in "ARM.EPS", the images are acquired with a camera mounted on an X-Y linear robot that is controlled by a workstation.
4.2.2 IMAGE SEQUENCE (3x3 lattice)
A still object is fixed on a workstation-controlled turntable, which is rotated step by step. At every step, images from 9 viewpoints are captured using the X-Y linear robot.
4.3 Devices
4.3.1 Camera
SONY 3CCD Color Video Camera (XC-003).
Serial number: 100240
Image device: Interline-transfer, 1/3-inch CCD
Effective picture elements: 768(H) x 494(V)
Sensing area: 6.00(H) x 4.96(V) mm
Effective sensing area: 4.876(H) x 3.655(V) mm
Signal system: NTSC color system
Scanning lines: 525 TV lines
Scanning mode: 2:1 interlace
Scanning frequency: Horizontal 15.734 kHz, Vertical 59.94 Hz
Horizontal resolution: 570 TV lines
Vertical effective lines: 485 lines
Sensitivity: 2000 lx (F5.6)
Video S/N: 59 dB
Video output: R/G/B: 0.7Vp-p (75ohm)
Gain: 0 dB
Color temperature: 3200 K
Electronic shutter: OFF
Charge accumulation: frame
Gamma offset: ON (gamma = 0.45)
Contour enhancement: OFF
Lens mount: C mount
4.3.2 Lens
Nikon Cine-Nikkor 10mm F1.8
Serial number: L746065
Focal length: 10mm
Iris: 1.8-
Length between two principal points: 35.45mm
Distance between mount and CCD: 17.526mm
Distance between mount and 2nd principal point (focus=infinity): backward 7.526mm
Distance between mount and 1st principal point (focus=infinity): forward 27.924mm
Field angle (using the above camera, focus=infinity): horizontal angle 27.40 degree, vertical angle 20.71 degree
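The field angles listed for this lens are consistent with a simple pinhole relation between the effective sensing area of the camera (4.876 x 3.655 mm) and the focal length (10mm). The Python sketch below reproduces them under that assumption; the function name is hypothetical, and the formula is the standard pinhole field-of-view relation rather than something stated in this document.

```python
import math


def field_angle_deg(sensor_extent_mm, focal_length_mm):
    """Full field angle in degrees for a pinhole camera focused at infinity."""
    return math.degrees(2.0 * math.atan(sensor_extent_mm / (2.0 * focal_length_mm)))


# Effective sensing area of the XC-003 and focal length of the Cine-Nikkor lens.
horizontal = field_angle_deg(4.876, 10.0)
vertical = field_angle_deg(3.655, 10.0)
print(f"horizontal {horizontal:.2f} degree, vertical {vertical:.2f} degree")
```

The printed values agree with the documented 27.40 and 20.71 degrees to two decimal places.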
4.3.3 Arm Robot (for controlling cameras)
Nihon-Seiko Robot Module XY-HD10087-A01-001
Two combined linear slide stages: X (horizontal) and Y (vertical)
(X-Y plane is parallel to the CCD plane of the camera)
Positional precision: +- 0.01mm
4.3.4 Turntable (for turning objects)
Nihon-Seiko Mega Torque Motor M-YS3040FN501
Rotational precision: +- 2.1sec.
Diameter of table: 60cm
4.3.5 Video Capture (Digitizing)
Parallax Graphics X-Video.
Model number: XV-24SVC-RGB
Serial number: XV-24SV-15965, VIO/RGB-15741
Video input: RGB
Digitizer: RGB, 24 bits per pixel
Workstation: Sun SPARCstation IPC
4.3.6 Light
SONY HVL-150 (halogen lamp, 150W)
Number of lights: 5
Lighting condition: indirect lighting reflected by white sheets, no ambient light
4.3.7 Background
Cloth with texture (distance from the camera : 184cm)
4.4 Still object from 81 view positions
4.4.1 File Names
A still object is observed from 9 by 9 (81) view positions located on the X-Y plane. The vertical and the horizontal intervals between the view positions are set to a constant for each scene. The constant is called baseline length b. The principal ray of the camera at every position is perpendicular to the X-Y plane, i.e., no convergence. Each file is named according to the following rule.
<NAME (3 or 4 characters)>_<X (1 digit)>_<Y (1 digit)>
where NAME is one of {SANT(stuffed toy),PLNT(pot plant),CITY(diorama), KID(mannequin)}, and X and Y indicate the coordinates of the camera.
0 1 2 3 4 5 6 7 8
----------------------------------------> X
| (0,0) (8,0)
0 | x x x x x x x x x
|
1 | x x x x x x x x x
|
2 | x x x x x x x x x
|
3 | x x x<->x x x x x x
| b |
4 | x x x x x x x x x
|
5 | x x x x x x x x x <-- e.g. _X_Y=_8_5
|
6 | x x x x x x x x x
|
7 | x x x x x x x x x
|
8 | x x x x x x x x x
v (0,8) (8,8)
Y
(view from camera to object)
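The naming rule and lattice above can be sketched in Python as follows. The sketch lists the 81 file name stems of one scene together with the camera offset of each view relative to position (0,0), with X to the right and Y downward as in the diagram. The function name is hypothetical, no file extension is appended because this document does not give one, and the 20mm baseline in the example is the value this document lists for the stuffed toy scene.

```python
def still_views(name, baseline_mm, size=9):
    """Return (file_stem, x_offset_mm, y_offset_mm) for every lattice position.

    Offsets are measured from camera position (0,0); X grows to the right
    and Y grows downward, seen from the camera toward the object.
    """
    views = []
    for y in range(size):
        for x in range(size):
            views.append((f"{name}_{x}_{y}", x * baseline_mm, y * baseline_mm))
    return views


# Example: the stuffed toy scene, baseline 20mm on both axes.
sant = still_views("SANT", 20.0)
print(len(sant), sant[0][0], sant[-1][0])
```

Any pair of views in the same row (same Y) forms a horizontal stereo pair whose baseline is the difference of their X offsets.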
4.4.2 Parameters for each scene
A. Stuffed toy
File name: SANT_0_0 - SANT_8_8
Baseline length (b): X-axis 20mm / Y-axis 20mm
Convergence: none
Distance from the CCD: Right hand 59cm
Nose 72cm
Left hand 80cm
Focus mark of lens: 60cm
Iris mark of lens: 4
B. Pot plant
File name: PLNT_0_0 - PLNT_8_8
Baseline length (b): X-axis 20mm / Y-axis 20mm
Convergence: none
Distance from the CCD: Front red flower 59cm
Front blue flower 68cm
Central high plant 77cm
Back blue flower 92cm
Focus mark of lens: 50cm
Iris mark of lens: 4
C. Diorama
File name: CITY_0_0 - CITY_8_8
Baseline length (b): X-axis 8mm / Y-axis 8mm
Convergence: none
Distance from the CCD: Front building 33cm
Right corner of high building 66cm
Focus mark of lens: slightly less than 40cm
Iris mark of lens: 4
D. Mannequin
File name: KID_0_0 - KID_8_8
Baseline length (b): X-axis 8mm / Y-axis 8mm
Convergence: none
Distance from the CCD: Nose 50cm
Throat 57cm
Ear 63cm
Focus mark of lens: 50cm
Iris mark of lens: 4
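As a rough illustration of how these parameters relate to image measurements, the Python sketch below estimates the disparity between two adjacent views for a point at a given distance. It rests on several assumptions that this document does not confirm: an ideal pinhole model with parallel optical axes, the 4.876mm effective sensing width mapped onto the full 640-pixel image width, the 10mm focal length used unchanged although the lens is focused at a finite distance, and the listed "distance from the CCD" treated as the depth from the optical center. The function name is hypothetical.

```python
def disparity_px(baseline_mm, depth_mm, focal_mm=10.0,
                 sensor_width_mm=4.876, image_width_px=640):
    """Approximate horizontal disparity in pixels for a parallel stereo pair.

    Uses d = f * b / Z on the sensor, then converts mm to pixels with an
    assumed pixel pitch of sensor_width_mm / image_width_px.
    """
    pixel_pitch_mm = sensor_width_mm / image_width_px
    return focal_mm * baseline_mm / depth_mm / pixel_pitch_mm


# Stuffed toy scene: nose at 72cm, adjacent views 20mm apart.
nose = disparity_px(baseline_mm=20.0, depth_mm=720.0)
print(f"{nose:.1f} pixels")
```

Under these assumptions the nose of the stuffed toy shifts by roughly 36 pixels between adjacent views; actual values should be obtained from a proper calibration.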
4.5 Moving object from 9 view positions
4.5.1 File Names
An object is fixed on the turntable, which is rotated step by step. At each step, images are captured from 3 by 3 (9) view positions located on the X-Y plane. The step is repeated 28 times for each object, producing an image sequence of 28 frames from each of the 9 view positions. The vertical and the horizontal intervals between the view positions are set to a constant for each scene. The constant is called baseline length b. The principal ray of the camera at every position is perpendicular to the X-Y plane, i.e., no convergence.
Each file is named according to the following rule.
<NAME (2 characters)><FRAME (2 digits)>_<X (1 digit)>_<Y (1 digit)>
where NAME is one of {SA(stuffed toy),PL(pot plant)}, FRAME is the frame number, and X and Y indicate the coordinates of the camera.
FRAME = 00 FRAME = 01 .... FRAME = 27
0 1 2 0 1 2
-----------> X -----------> X
|(0,0) (2,0) |(0,0) (2,0)
0 | x x x 0 | x x x <-- e.g. 01_2_0
| |
1 | x x x 1 | x x x .....
| |
2 | x x x 2 | x x x
v(0,2) (2,2) v(0,2) (2,2)
Y Y
(view from camera to object)
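The sequence naming rule can be sketched in Python as follows. The sketch enumerates every file name stem of one sequence, frame by frame; the function name is hypothetical and no file extension is appended because this document does not give one.

```python
def sequence_file_stems(name, frames=28, size=3):
    """List the stems <NAME><FRAME>_<X>_<Y> for all frames and camera positions."""
    return [
        f"{name}{frame:02d}_{x}_{y}"
        for frame in range(frames)
        for y in range(size)
        for x in range(size)
    ]


stems = sequence_file_stems("SA")
print(len(stems), stems[0], stems[-1])
```

The count of 28 frames x 9 positions matches the 252 images per scene given in the overview.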
4.5.2 Parameters for each scene
A. Stuffed toy
File name: SA00_0_0 - SA27_2_2
Baseline length (b): X-axis 40mm / Y-axis 40mm
Convergence: none
Rotation step: 1.0 deg/frame
Rotation axis: center of the head
Distance from the CCD (first frame): Right hand 59cm
Nose 72cm
Left hand (Maximum) 80cm
Focus mark of lens: 60cm
Iris mark of lens: 4
Note: The relative location of the object and the camera at the first frame is identical to that of the STILL IMAGE data from 81 view positions.
B. Pot plant
File name: PL00_0_0 - PL27_2_2
Baseline length (b): X-axis 20mm / Y-axis 20mm
Convergence: none
Rotation step: -1.0 deg/frame
Rotation axis: near the central high plant
Distance from the CCD (first frame): Front red flower 83cm
Front blue flower 92cm
Central high plant 101cm
Back blue flower 116cm
Focus mark of lens: 60cm
Iris mark of lens: 4
Note: The relative location of the object and the camera at the first frame is almost identical to that of the STILL IMAGE data from 81 view positions, except for the distance between them.
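Given the rotation steps listed above, the turntable angle of any frame follows directly from its frame number. The Python sketch below assumes that frame 00 corresponds to an angle of zero and that the angle accumulates linearly; the direction that a positive or negative sign denotes is not specified in this document, and the names used are hypothetical.

```python
# Rotation step in degrees per frame, keyed by the sequence NAME prefix.
ROTATION_STEP_DEG = {"SA": 1.0, "PL": -1.0}


def turntable_angle_deg(name, frame):
    """Accumulated turntable rotation at a frame, relative to frame 00."""
    return ROTATION_STEP_DEG[name] * frame


last_sa = turntable_angle_deg("SA", 27)
last_pl = turntable_angle_deg("PL", 27)
print(last_sa, last_pl)
```

With 28 frames, each sequence therefore spans 27 degrees of rotation in total, in opposite directions for the two scenes.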
4.6 Directories
The following is the tree structure of directories in Vol.1
Vol.1--+--- README: brief explanation
|
+--- DOC: documents
|
+--- IMAGE: data
| |
| +---- STILL: still images
| | |
| | +---- SANT: stuffed toy
| | +---- PLNT: pot plant
| | +---- CITY: diorama
| | +---- KID: mannequin
| |
| +---- SEQUENCE: image sequences
| |
| +---- SANT: stuffed toy
| +---- PLNT: pot plant
|
+---- APPENDIX: appendix
|
+---- SEQUENCE.JPG: an index image of image sequences
+---- STILL.JPG: an index image of still images
+---- SANT.JPG: an index image of stuffed toy
+---- PLNT.JPG: an index image of pot plant
+---- CITY.JPG: an index image of diorama
+---- KID.JPG: an index image of mannequin
+---- STUDIO1.JPG: an overview of studio settings
+---- STUDIO2.JPG: another view of our studio
+---- STUDIO3.JPG: an overview of stuff and staff
----------------------------------------------------------------------------------------------------------------
5 Vol#2
5.1 Overview
The contents of the second CD-ROM volume are as follows.
1. IMAGE SEQUENCE (HUMAN MOTION)
* 5 CAMERAS (5 images/frame)
* 4 scenes of human gestures (1:expanding, 2:pointing, 3:rotating, 4:removing a cap)
* 30-60 frames/scene (150-300 images/scene)
* with electronic shutter (1/250 sec)
* 640 x 480 pixel, RGB 24bit
* PPM format, COMPRESSED by GZIP (gzip -9)
2. STILL IMAGE (HUMAN GESTURE)
* 4 CAMERAS (4 images/scene)
* 30 scenes of human gestures
* 1 frame/scene (120 images)
* without electronic shutter
* 640 x 480 pixel, RGB 24bit
* PPM format, COMPRESSED by GZIP (gzip -9)
3. APPENDIX
* Overview of each scene (indices of all scenes)
* Overview of the studio
* 640 x 480 pixel, RGB 24bit
* JPEG format (quality 75)
5.2 Studio Settings (Geometry)
Five cameras are located as shown in "CAMERA.EPS". The TOP, FRONT, and RIGHT cameras are oriented perpendicular to each other. The LEFT camera is located opposite the RIGHT camera. The HEAD camera captures images of the face in order to record lip motion and so on.
Note that this configuration is not accurate; further calibration of parameters is needed for geometrical computations. Calibration data (a reference cube of 30x30x30cm, on which a lattice with 6cm intervals is drawn) is provided separately for the "image sequences" and the "still images", because they were taken on different dates under different conditions.
5.3 Devices
5.3.1 Cameras
SONY 3CCD Color Video Camera (XC-003).
1. specifications: already shown above
2. settings in taking images:
color temperature: 3200K
gamma offset: ON (gamma = 0.45)
electronic shutter: ON (1/250sec) for image sequence, OFF for still image
5.3.2 Lens
1. Canon JF7.5 1.4: (FRONT, LEFT, RIGHT, and TOP camera)
focal length: 7.5mm
field angle: horizontal angle 36.02 degree, vertical angle 27.39 degree
iris: 1.4-
settings in taking images:
iris = 1.4
focused_point_distance = 0.3m(LEFT,RIGHT), 0.4m(FRONT,TOP)
2. Nikon Cine-Nikkor 25mm F1.4 (HEAD camera)
focal length: 25mm
field angle: horizontal angle 11.14 degree, vertical angle 8.36 degree
iris: 1.4-
settings in taking images:
iris = 1.4
focused_point_distance = 1.5m
5.3.3 Light
Light: SONY HVL-150 (halogen lamp, 150W) x 5
Lighting Condition:
image sequence: direct lighting (by 3 lights) and indirect lighting (by 2 lights) reflected by white sheets.
still image: indirect lighting reflected by white sheets
5.3.4 Video Capture (Digitizing)
Parallax Graphics X-Video. The specifications are shown above.
5.3.5 Recording Devices
The recording devices are listed below (all of them are for NTSC format).
Front Image: Write-Once Video Disk Recorder (SONY LVR-3000N),input:RGB (NTSC)
Left(Right) Image: Beta-CAM Video Recorder (SONY UVW-400), input:RGB (NTSC)
Top Image: MII Video Recorder (Victor KR-M800), input:RGB (NTSC)
Head Image: Hi-8 Video Recorder (SONY EVO-9800A), input:S(Y/C) (NTSC)
5.4 Interlace
Images are captured in NTSC (interlaced) format: a field is captured every 1/60 sec, and two fields are combined into a frame every 1/30 sec. When digitizing, one image is taken per frame and the two fields of the frame are separated: the first field is placed in the upper half of each image, and the second field in the lower half. Each field image therefore has half the frame height. Its width is not reduced, since our policy is to keep the resolution as fine as possible.
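The field layout described above can be handled with a few lines of Python. The sketch below splits a stored image, represented as a list of rows, into its two fields and optionally weaves them back into an interlaced frame. The function names are hypothetical, and the weave assumes that the first field supplies the even-numbered scan lines (0, 2, 4, ...), which this document does not specify. Because the two fields are captured 1/60 sec apart, a woven frame of a moving subject will show comb artifacts.

```python
def split_fields(rows):
    """Split a stored image into (first_field, second_field).

    The first field is the upper half of the image, the second the lower half.
    """
    half = len(rows) // 2
    return rows[:half], rows[half:]


def weave_fields(first_field, second_field):
    """Interleave two fields into one frame.

    Assumption: the first field provides scan lines 0, 2, 4, ...
    """
    frame = []
    for first_row, second_row in zip(first_field, second_field):
        frame.append(first_row)
        frame.append(second_row)
    return frame


# Example with row labels standing in for the 480 pixel rows of an image.
stored = [f"row{i}" for i in range(480)]
first, second = split_fields(stored)
woven = weave_fields(first, second)
```

For motion analysis it may be preferable to treat each 640x240 field as a separate image with a 1/60 sec time step instead of weaving.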
5.5 Files and Directories
5.5.1 File Names
Each file is named according to the following rule.
<NAME (4 characters)>_<POS (1 character)><FRAME (2 digits)>
where NAME is the name of a still image or image sequence, POS is the position of the camera, one of {F(ront), H(ead), L(eft), R(ight), T(op)}, and FRAME is the frame number. For example, ROTZ_H02 means "Head Camera Frame No.2" of the "Rotation" image sequence.
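A file name that follows this rule can be decoded with a short Python sketch such as the one below. The function and constant names are hypothetical; the sketch assumes that NAME consists of four letters or digits and that any file extension has already been removed, neither of which is stated in this document.

```python
import re

# Camera position codes as listed in the naming rule.
POSITIONS = {"F": "Front", "H": "Head", "L": "Left", "R": "Right", "T": "Top"}

# <NAME (4 characters)>_<POS (1 character)><FRAME (2 digits)>
VOL2_STEM = re.compile(r"^([A-Za-z0-9]{4})_([FHLRT])(\d{2})$")


def parse_vol2_stem(stem):
    """Return (name, camera_position, frame_number) for a Vol.2 file stem."""
    match = VOL2_STEM.match(stem)
    if match is None:
        raise ValueError(f"not a Vol.2 file name: {stem!r}")
    name, pos, frame = match.groups()
    return name, POSITIONS[pos], int(frame)


parsed = parse_vol2_stem("ROTZ_H02")
print(parsed)
```

Applied to the example from the text, the stem ROTZ_H02 yields the name ROTZ, the Head camera, and frame number 2.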
The following is the tree structure of directories in Vol.2
Vol.2--+--- README: brief explanation
|
+--- DOC: documents
|
+--- IMAGE: data
| |
| +---- STILL: still images
| | |
| | +---- CUBE: data for calibration
| | +---- SIGN: 30 kinds of human gestures
| | |
| | +---- FRONT: views from FRONT camera
| | +---- LEFT: views from LEFT camera
| | +---- RIGHT: views from RIGHT camera
| | +---- TOP: views from TOP camera
| |
| +---- SEQUENCE: image sequences
| |
| +---- CUBE: data for calibration
| +---- EXPAND: gesture of expansion by right hand (1.3sec)
| | |
| | +---- FRONT: views from FRONT camera
| | +---- HEAD: face views from HEAD camera
| | +---- LEFT: views from LEFT camera
| | +---- RIGHT: views from RIGHT camera
| | +---- TOP: views from TOP camera
| |
| +---- POINT: gesture of pointing by right hand (1sec)
| | |
| | +---- FRONT
| | +---- HEAD
| | +---- LEFT
| | +---- RIGHT
| | +---- TOP
| |
| +---- ROTATION: rotating right hand around z-axis (1sec)
| | |
| | +---- FRONT
| | +---- HEAD
| | +---- LEFT
| | +---- RIGHT
| | +---- TOP
| |
| +---- UNCAP: gesture of removing a cap from something (2sec)
| |
| +---- FRONT
| +---- HEAD
| +---- LEFT
| +---- RIGHT
| +---- TOP
|
|
+---- APPENDIX: appendix
|
+---- SEQUENCE.JPG: an index image for image sequences
+---- STILL.JPG: an index image for still images
+---- STUDIO1.JPG: an overview of studio settings
+---- STUDIO2.JPG: another view of our studio
+---- STAFF.JPG: an image of staff