Tsukuba Multiview Image Database

- TSUKUBA STEREO -

February 17, 1995

Multiview Image Database for Computer Vision Research

Yuichi Ohta and Yuichi Nakamura

Computer Vision and Image Media Laboratory

University of Tsukuba, Tsukuba, 305-8577, Ibaraki, JAPAN


1 Purpose

Computer vision technologies will be essential for realizing high-performance 3D image media. For this purpose, however, we need faster and more reliable algorithms that produce more precise 3D information. Using multiview images is likely to be the most practical way to achieve such algorithms. However, acquiring images from multiple viewpoints in space and time requires special and expensive devices. This difficulty in data acquisition may hinder active research in this direction.

We have developed this "Multiview Image Database" as a common, standard database that many researchers can use, in order to promote computer vision research using multiple-view images.

The development of this database was partially supported by the Ministry of Education, Science and Culture in Japan. The project was directed by Prof. Yuichi Ohta under the title "Development of Digital Image Database for Computer Vision Research". Several researchers at the University of Tsukuba, including Dr. Yuichi Nakamura, joined the project.

The image acquisition and data organization were carried out by graduate students of the Computer Vision and Image Media Laboratory at the University of Tsukuba. Their names are as follows:

Volume 1: Kiyohide Satoh, Yasuhiro Mukaigawa, Itaru Kitahara

Volume 2: Takeshi Kurata, Kwang-ho Yang, Masashi Nishitani, Yukiyo Uehori

The CD-ROM package was designed by Toshikazu Miki, a graduate student of art, and the diorama used as an object was created by Tamako Takahashi, an undergraduate student of architecture.


2 Copyright

The copyright of this database is held by Yuichi Ohta and Yuichi Nakamura. The use of this database is restricted to those who agree to the following three usage restrictions.

1. The use of this database is limited to non-profit purposes.

2. Whenever results of research conducted using this database are published, the author should credit "University of Tsukuba, Multiview Image Database" in the article.

3. The user should accept full responsibility for the use of this database and the copyright holders assume no responsibility.

Copying this database is allowed only when this copyright notice is included in the copy and the user of the copy agrees to this copyright notice. Any use of this database that does not follow the above restrictions is prohibited without written permission from the copyright holders named above.


Queries on the copyright should be forwarded to:

Professor Yuichi Ohta and Itaru Kitahara

Center for Computational Sciences, University of Tsukuba, Tsukuba, Ibaraki, 305-8577, JAPAN


3 Contents

This database is organized into two CD-ROM volumes. The contents of each volume are roughly as follows.

An access request is required when accessing the database for the first time.


Vol#1

A still object observed from 81 (= 9x9 lattice) camera positions (four scenes).

A moving object observed from 9 (= 3x3 lattice) camera positions (two scenes).

Vol#2

Image sequences of hand gestures observed from four camera positions: front, top, right, and left. Image sequences of the face of the person performing the gestures are also included.


----------------------------------------------------------------------------------------------------------------

4 Vol#1

4.1 Overview

The contents of the first CD-ROM volume are as follows.

1. STILL IMAGE

* 81 cameras (9x9 lattice)

* 4 scenes (1:stuffed toy, 2:pot plant, 3:diorama, 4:mannequin)

* 1 frame/scene (81 images/scene)

* 640x480 pixel, RGB 24bit

* PPM format (portable pixmap file format)

* COMPRESSED by GZIP (gzip -9)


2. IMAGE SEQUENCE

* 9 cameras (3x3 lattice)

* 2 scenes (1:stuffed toy, 2:pot plant)

* 28 frames/scene (252 images/scene)

* 640x480 pixel, RGB 24bit

* PPM format (portable pixmap file format)

* COMPRESSED by GZIP (gzip -9)


3. APPENDIX

* Index of each scene (6 images)

* Overview of the studio (3 images)

* 640x480 pixel, RGB 24bit

* JPEG format (quality 75)
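
Because the image files are gzip-compressed PPM, they can be read with standard library tools alone. The following is a minimal sketch in Python, assuming the files are binary PPM ("P6") with a maximum value of 255, which is consistent with "RGB 24bit" but is not stated explicitly here. The function name read_ppm_gz is hypothetical, and the exact file extension used on the CD-ROM is not given in this document, so the path is left to the caller.

```python
import gzip


def read_ppm_gz(path):
    """Read a gzip-compressed binary PPM (P6); return (width, height, pixels).

    Assumption: maxval is 255, so each pixel is 3 bytes in R, G, B order.
    """
    with gzip.open(path, "rb") as f:
        data = f.read()

    # The header holds four whitespace-separated tokens:
    # magic number, width, height, maxval. Lines starting with '#' are comments.
    tokens = []
    pos = 0
    while len(tokens) < 4:
        while data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    pos += 1  # exactly one whitespace byte separates header and raster

    if tokens[0] != b"P6":
        raise ValueError("not a binary PPM (P6) file")
    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    if maxval != 255:
        raise ValueError("this sketch only handles maxval 255")

    pixels = data[pos:pos + width * height * 3]
    if len(pixels) != width * height * 3:
        raise ValueError("raster data is truncated")
    return width, height, pixels
```

For a 640x480 image, the returned pixel buffer should hold 640 * 480 * 3 = 921600 bytes.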


4.2 Setting cameras

4.2.1 STILL IMAGE (9x9 lattice)

The images of a still object are captured from multiple viewpoints located at equally spaced lattice positions. As shown in "ARM.EPS", a camera mounted on an X-Y linear robot, which is controlled by a workstation, is used for the image acquisition.


4.2.2 IMAGE SEQUENCE (3x3 lattice)

A still object is fixed on a turntable which is controlled by a workstation and the turntable is rotated step-by-step. At every step, images from 9 viewpoints are captured by using the X-Y linear robot.


4.3 Devices

4.3.1 Camera

SONY 3CCD Color Video Camera (XC-003).

Serial number: 100240

Image device: Interline-transfer, 1/3-inch CCD

Effective picture elements: 768(H) x 494(V)

Sensing area: 6.00(H) x 4.96(V) mm

Effective sensing area: 4.876(H) x 3.655(V) mm

Signal system: NTSC color system

Scanning lines: 525 TV lines

Scanning mode: 2:1 interlace

Scanning frequency: Horiz. 15.734 kHz, Vertical 59.94 Hz

Horizontal resolution: 570 TV lines

Vertical effective lines: 485 lines

Sensitivity: 2000 lx (F5.6)

Video S/N: 59 dB

Video output: R/G/B: 0.7Vp-p (75ohm)

Gain: 0 dB

Color temperature: 3200 K

Electronic shutter: OFF

Charge accumulation: frame

Gamma offset: ON (gamma = 0.45)

Contour enhancement: OFF

Lens mount: C mount


4.3.2 Lens

Nikon Cine-Nikkor 10mm F1.8

Serial number: L746065

Focal length: 10mm

Iris: 1.8-

Length between two principal points: 35.45mm

Distance between mount and CCD: 17.526mm

Distance between mount and 2nd principal point (focus=infinity): backward 7.526mm

Distance between mount and 1st principal point (focus=infinity): forward 27.924mm

Field angle (using the above camera, focus=infinity): horizontal angle 27.40 degree, vertical angle 20.71 degree
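
The listed field angles can be related to the effective sensing area in 4.3.1 and the focal length. The following is a small sketch, assuming an ideal pinhole model focused at infinity; the function name field_angle_deg is hypothetical.

```python
import math


def field_angle_deg(sensor_extent_mm, focal_length_mm):
    """Full field angle of an ideal pinhole camera focused at infinity."""
    return math.degrees(2.0 * math.atan(sensor_extent_mm / (2.0 * focal_length_mm)))


# Values taken from 4.3.1 (effective sensing area) and 4.3.2 (focal length).
EFFECTIVE_SENSING_AREA_MM = (4.876, 3.655)  # (horizontal, vertical)
FOCAL_LENGTH_MM = 10.0

h = field_angle_deg(EFFECTIVE_SENSING_AREA_MM[0], FOCAL_LENGTH_MM)
v = field_angle_deg(EFFECTIVE_SENSING_AREA_MM[1], FOCAL_LENGTH_MM)
print(f"horizontal {h:.2f} degree, vertical {v:.2f} degree")
```

Under this model the result should agree with the 27.40 and 20.71 degrees listed above. The same relation with the focal lengths in 5.3.2 gives values close to the field angles listed there.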


4.3.3 Arm Robot (for controlling cameras)

Nihon-Seiko Robot Module XY-HD10087-A01-001

Combination of 2 linear slide stages, X-horizontal and Y-vertical

(X-Y plane is parallel to the CCD plane of the camera)

Positional precision: +- 0.01mm


4.3.4 Turntable (for turning objects)

Nihon-Seiko Mega Torque Motor M-YS3040FN501

Rotational precision: +- 2.1sec.

Diameter of table: 60cm


4.3.5 Video Capture (Digitizing)

Parallax Graphics X-Video.

Model number: XV-24SVC-RGB

Serial number: XV-24SV-15965, VIO/RGB-15741

Video input: RGB

Digitizer: RGB, 24 bits per pixel

Workstation: Sun SPARCstation IPC


4.3.6 Light

SONY HVL-150 (halogen lamp, 150W)

Number of lights: 5

Lighting condition: indirect lighting reflected with white sheets, no ambient light


4.3.7 Background

Cloth with texture (distance from the camera: 184cm)


4.4 Still object from 81 view positions

4.4.1 File Names

A still object is observed from 9 by 9 (81) view positions located on the X-Y plane. The vertical and the horizontal intervals between the view positions are set to a constant for each scene. The constant is called baseline length b. The principal ray of the camera at every position is perpendicular to the X-Y plane, i.e., no convergence. Each file is named according to the following rule.

<NAME (3 or 4 characters)>_<X (1 digit)>_<Y (1 digit)>

where NAME is one of {SANT(stuffed toy),PLNT(pot plant),CITY(diorama), KID(mannequin)}, and X and Y indicate the coordinates of the camera.

      0   1   2   3   4   5   6   7   8
    ----------------------------------------> X
    | (0,0)                         (8,0)
  0 | x   x   x   x   x   x   x   x   x
    |
  1 | x   x   x   x   x   x   x   x   x
    |
  2 | x   x   x   x   x   x   x   x   x
    |
  3 | x   x   x<->x   x   x   x   x   x
    |           b
  4 | x   x   x   x   x   x   x   x   x
    |
  5 | x   x   x   x   x   x   x   x   x  <-- e.g. _X_Y=_8_5
    |
  6 | x   x   x   x   x   x   x   x   x
    |
  7 | x   x   x   x   x   x   x   x   x
    |
  8 | x   x   x   x   x   x   x   x   x
    v (0,8)                         (8,8)
    Y

(view from camera to object)
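
The naming rule and the lattice can be expressed in a few lines. The following sketch builds the 81 base names of a still scene and the camera offset from position (0,0), using the baseline lengths listed in 4.4.2. The helper names still_name and camera_offset_mm are hypothetical, and no file extension is appended because this document does not state one.

```python
# Baseline length b in mm for each still scene, as listed in 4.4.2.
STILL_BASELINE_MM = {"SANT": 20.0, "PLNT": 20.0, "CITY": 8.0, "KID": 8.0}


def still_name(scene, x, y):
    """Base file name <NAME>_<X>_<Y> for camera position (x, y) on the 9x9 lattice."""
    if scene not in STILL_BASELINE_MM:
        raise ValueError(f"unknown scene: {scene}")
    if not (0 <= x <= 8 and 0 <= y <= 8):
        raise ValueError("x and y must be in 0..8")
    return f"{scene}_{x}_{y}"


def camera_offset_mm(scene, x, y):
    """Camera displacement from position (0,0) along the X and Y axes, in mm."""
    b = STILL_BASELINE_MM[scene]
    return (x * b, y * b)


# All 81 names of one scene, row by row (Y outer, X inner).
names = [still_name("KID", x, y) for y in range(9) for x in range(9)]
```

For example, still_name("SANT", 8, 5) gives "SANT_8_5", and camera_offset_mm("CITY", 8, 8) gives (64.0, 64.0).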


4.4.2 Parameters for each scene

A. Stuffed toy

File name: SANT_0_0 - SANT_8_8

Baseline length (b): X-axis 20mm / Y-axis 20mm

Convergence: none

Distance from the CCD: Right hand 59cm, Nose 72cm, Left hand 80cm

Focus mark of lens: 60cm

Iris mark of lens: 4


B. Pot plant

File name: PLNT_0_0 - PLNT_8_8

Baseline length (b): X-axis 20mm / Y-axis 20mm

Convergence: none

Distance from the CCD: Front red flower 59cm, Front blue flower 68cm, Central high plant 77cm, Back blue flower 92cm

Focus mark of lens: 50cm

Iris mark of lens: 4


C. Diorama

File name: CITY_0_0 - CITY_8_8

Baseline length (b): X-axis 8mm / Y-axis 8mm

Convergence: none

Distance from the CCD: Front building 33cm, Right corner of high building 66cm

Focus mark of lens: slightly less than 40cm

Iris mark of lens: 4


D. Mannequin

File name: KID_0_0 - KID_8_8

Baseline length (b): X-axis 8mm / Y-axis 8mm

Convergence: none

Distance from the CCD: Nose 50cm, Throat 57cm, Ear 63cm

Focus mark of lens: 50cm

Iris mark of lens: 4
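
Because the views have no convergence, the shift of a scene point between two adjacent views can be estimated from the baseline length and the distance. The following is a first-order sketch only. It assumes an ideal pinhole camera with the 10mm focal length, assumes that the 640 image columns span the 4.876mm effective sensing width, and treats the listed distance from the CCD as the depth from the projection center. The last assumption ignores the position of the lens principal points and the focus setting, so the result is approximate. The function name disparity_px is hypothetical.

```python
FOCAL_LENGTH_MM = 10.0   # from 4.3.2
SENSOR_WIDTH_MM = 4.876  # effective sensing area (H), from 4.3.1
IMAGE_WIDTH_PX = 640     # assumption: the image width covers the effective area


def disparity_px(baseline_mm, depth_mm):
    """Approximate disparity in pixels between two parallel views (pinhole model)."""
    pixel_pitch_mm = SENSOR_WIDTH_MM / IMAGE_WIDTH_PX
    disparity_mm = FOCAL_LENGTH_MM * baseline_mm / depth_mm
    return disparity_mm / pixel_pitch_mm


# Stuffed toy scene: b = 20mm, nose at about 72cm from the CCD.
nose_disparity = disparity_px(20.0, 720.0)
print(f"{nose_disparity:.1f} pixels between adjacent views")
```

For views that are k lattice steps apart, the baseline is k times b, so the estimate scales linearly with k.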


4.5 Moving object from 9 view positions


4.5.1 File Names

An object is fixed on the turntable, which is rotated step by step. At each step, images are captured from 3 by 3 (nine) view positions located on the X-Y plane. This is repeated for 28 steps per object, producing a 28-frame image sequence from each of the 9 view positions. The vertical and the horizontal intervals between the view positions are set to a constant for each scene. The constant is called baseline length b. The principal ray of the camera at every position is perpendicular to the X-Y plane, i.e., no convergence.

Each file is named according to the following rule.

<NAME (2 characters)><FRAME (2 digits)>_<X (1 digit)>_<Y (1 digit)>

where NAME is one of {SA(stuffed toy),PL(pot plant)}, FRAME is the frame number, and X and Y indicate the coordinates of the camera.

    FRAME = 00              FRAME = 01             ....  FRAME = 27

      0   1   2               0   1   2
    -----------> X          -----------> X
    |(0,0) (2,0)            |(0,0) (2,0)
  0 | x   x   x           0 | x   x   x  <-- e.g. 01_2_0
    |                       |
  1 | x   x   x           1 | x   x   x     .....
    |                       |
  2 | x   x   x           2 | x   x   x
    v(0,2) (2,2)            v(0,2) (2,2)
    Y                       Y

(view from camera to object)
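
As a sketch of the naming rule for image sequences, the following code lists all base names of one sequence scene. The helper names sequence_name and all_sequence_names are hypothetical, and no file extension is appended because this document does not state one.

```python
SEQUENCE_SCENES = ("SA", "PL")  # stuffed toy, pot plant
FRAMES_PER_SCENE = 28
LATTICE_SIZE = 3


def sequence_name(scene, frame, x, y):
    """Base file name <NAME><FRAME>_<X>_<Y> for one image of a sequence."""
    if scene not in SEQUENCE_SCENES:
        raise ValueError(f"unknown scene: {scene}")
    if not 0 <= frame < FRAMES_PER_SCENE:
        raise ValueError("frame must be in 0..27")
    if not (0 <= x < LATTICE_SIZE and 0 <= y < LATTICE_SIZE):
        raise ValueError("x and y must be in 0..2")
    return f"{scene}{frame:02d}_{x}_{y}"


def all_sequence_names(scene):
    """All names of a scene, ordered by frame, then Y, then X."""
    return [
        sequence_name(scene, frame, x, y)
        for frame in range(FRAMES_PER_SCENE)
        for y in range(LATTICE_SIZE)
        for x in range(LATTICE_SIZE)
    ]
```

The list for one scene has 28 * 9 = 252 entries, which matches the image count given in the overview.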


4.5.2 Parameters for each scene

A. Stuffed toy

File name: SA00_0_0 - SA27_2_2

Baseline length (b): X-axis 40mm / Y-axis 40mm

Convergence: none

Rotation step: 1.0 deg/frame

Rotation axis: center of the head

Distance from the CCD (first frame): Right hand 59cm, Nose 72cm, Left hand (Maximum) 80cm

Focus mark of lens: 60cm

Iris mark of lens: 4

Note: The relative location of the object and the camera at the first frame is identical to that of the STILL IMAGE data from 81 view positions.


B. Pot plant

File name: PL00_0_0 - PL27_2_2

Baseline length (b): X-axis 20mm / Y-axis 20mm

Convergence: none

Rotation step: -1.0 deg/frame

Rotation axis: near the central high plant

Distance from the CCD (first frame): Front red flower 83cm, Front blue flower 92cm, Central high plant 101cm, Back blue flower 116cm

Focus mark of lens: 60cm

Iris mark of lens: 4

Note: The relative location of the object and the camera at the first frame is almost identical to that of the STILL IMAGE data from 81 view positions, except for the distance between them (each listed distance is 24cm larger).
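
The turntable angle of each frame follows from the rotation step. The following sketch assumes that frame 00 is the zero-angle reference and that the step is constant; the physical direction that corresponds to a positive angle is not specified in this document. The function name turntable_angle_deg is hypothetical.

```python
# Rotation step in degrees per frame, as listed for each scene in 4.5.2.
ROTATION_STEP_DEG = {"SA": 1.0, "PL": -1.0}


def turntable_angle_deg(scene, frame):
    """Turntable angle of a frame relative to frame 00 (assumed reference)."""
    if not 0 <= frame <= 27:
        raise ValueError("frame must be in 0..27")
    return frame * ROTATION_STEP_DEG[scene]


# Total rotation covered by each 28-frame sequence (27 steps).
total_sweep = {scene: turntable_angle_deg(scene, 27) for scene in ROTATION_STEP_DEG}
```

Under these assumptions each sequence covers 27 degrees of rotation in total, in opposite directions for the two scenes.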


4.6 Directories

The following is the tree structure of directories in Vol.1

Vol.1--+--- README: brief explanation
       |
       +--- DOC: documents
       |
       +--- IMAGE: data
       |      |
       |      +---- STILL: still images
       |      |       |
       |      |       +---- SANT: stuffed toy
       |      |       +---- PLNT: pot plant
       |      |       +---- CITY: diorama
       |      |       +---- KID: mannequin
       |      |
       |      +---- SEQUENCE: image sequences
       |              |
       |              +---- SANT: stuffed toy
       |              +---- PLNT: pot plant
       |
       +---- APPENDIX: appendix
               |
               +---- SEQUENCE.JPG: an index image of image sequences
               +---- STILL.JPG: an index image of still images
               +---- SANT.JPG: an index image of stuffed toy
               +---- PLNT.JPG: an index image of pot plant
               +---- CITY.JPG: an index image of diorama
               +---- KID.JPG: an index image of mannequin
               +---- STUDIO1.JPG: an overview of studio settings
               +---- STUDIO2.JPG: another view of our studio
               +---- STUDIO3.JPG: an overview of stuff and staff


----------------------------------------------------------------------------------------------------------------

5 Vol#2

5.1 Overview

The contents of the second CD-ROM volume are as follows.

1. IMAGE SEQUENCE (HUMAN MOTION)

* 5 CAMERAS (5 images/frame)

* 4 scenes of human gestures (1:expanding, 2:pointing, 3:rotating, 4:removing a cap)

* 30-60 frames/scene (150-300 images/scene)

* with electronic shutter (1/250 sec)

* 640 x 480 pixel, RGB 24bit

* PPM format, COMPRESSED by GZIP (gzip -9)


2. STILL IMAGE (HUMAN GESTURE)

* 4 CAMERAS (4 images/scene)

* 30 scenes of human gestures

* 1 frame/scene (120 images)

* without electronic shutter

* 640 x 480 pixel, RGB 24bit

* PPM format, COMPRESSED by GZIP (gzip -9)


3. APPENDIX

* Overview of each scene (indices of all scenes)

* Overview of the studio

* 640 x 480 pixel, RGB 24bit

* JPEG format (quality 75)


5.2 Studio Settings (Geometry)

Five cameras are placed as shown in "CAMERA.EPS". The TOP, FRONT, and RIGHT cameras are mutually perpendicular. The LEFT camera is placed opposite the RIGHT camera. The HEAD camera captures images of the face, for example to record lip motion.

Note that this configuration is only approximate; geometrical computations require further calibration of the camera parameters. Calibration data (a 30x30x30cm reference cube on which a lattice with a 6cm interval is drawn) is provided separately for the "image sequences" and the "still images", because they were captured on different dates under different conditions.
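
For calibration, the lattice on the reference cube provides 3D reference points. The following sketch lists candidate lattice intersections on the cube surface. It assumes that the lattice lines run at 0, 6, ..., 30cm along every edge of every face and that the origin is at one cube corner; neither the lattice layout nor the coordinate system is specified in this document, so both should be checked against the CUBE images. The function name cube_surface_lattice_points is hypothetical.

```python
from itertools import product

CUBE_EDGE_CM = 30
LATTICE_INTERVAL_CM = 6


def cube_surface_lattice_points():
    """Lattice intersections (x, y, z) in cm that lie on the cube surface."""
    ticks = range(0, CUBE_EDGE_CM + 1, LATTICE_INTERVAL_CM)
    return [
        p
        for p in product(ticks, repeat=3)
        # A point is on the surface if at least one coordinate is on a face.
        if any(c in (0, CUBE_EDGE_CM) for c in p)
    ]


points = cube_surface_lattice_points()
```

Only the points visible from a given camera would be usable for that camera, and matching them to image positions is left to the user.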


5.3 Devices

5.3.1 Cameras

SONY 3CCD Color Video Camera (XC-003).

1. specifications: see 4.3.1

2. settings in taking images:

color temperature: 3200K

gamma offset: ON (gamma = 0.45)

electronic shutter: ON (1/250sec) for image sequence, OFF for still image


5.3.2 Lens

1. Canon JF7.5 1.4: (FRONT, LEFT, RIGHT, and TOP camera)

focal length: 7.5mm

field angle: horizontal angle 36.02 degree, vertical angle 27.39 degree

iris: 1.4-

settings in taking images:

iris = 1.4

focused_point_distance = 0.3m(LEFT,RIGHT), 0.4m(FRONT,TOP)


2. Nikon Cine-Nikkor 25mm F1.4 (HEAD camera)

focal length: 25mm

field angle: horizontal angle 11.14 degree, vertical angle 8.36 degree

iris: 1.4-

settings in taking images:

iris = 1.4

focused_point_distance = 1.5m


5.3.3 Light

Light: SONY HVL-150 (halogen lamp, 150W) x 5

Lighting Condition:

image sequence: direct lighting (by 3 lights) and indirect lighting (by 2 lights) reflected by white sheets.

still image: indirect lighting reflected by white sheets


5.3.4 Video Capture (Digitizing)

Parallax Graphics X-Video. The specifications are given in 4.3.5.


5.3.5 Recording Devices

The recording devices are listed below (all of them are for the NTSC format).

Front Image: Write-Once Video Disk Recorder (SONY LVR-3000N),input:RGB (NTSC)

Left(Right) Image: Beta-CAM Video Recorder (SONY UVW-400), input:RGB (NTSC)

Top Image: MII Video Recorder (Victor KR-M800), input:RGB (NTSC)

Head Image: Hi-8 Video Recorder (SONY EVO-9800A), input:S(Y/C) (NTSC)


5.4 Interlace

Images are captured in the NTSC (interlaced) format: one field is captured every 1/60sec, and two fields are combined into one frame every 1/30sec. During digitizing, one image is stored per frame, and the two fields of the frame are separated: the first field is placed in the upper half of the image and the second field in the lower half. Each field image therefore has half the frame height. Its width is not reduced, because our policy is to keep the resolution as fine as possible.
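
To view a stored image as an ordinary interlaced frame, the two field images can be woven back together line by line. The following is a minimal sketch that operates on a raw pixel buffer (rows stored top to bottom, 3 bytes per pixel). Which field belongs on the even rows is not stated in this document, so it is exposed as a parameter; the default is an assumption. The function name weave_fields is hypothetical.

```python
def weave_fields(pixels, width, height, channels=3, first_field_on_even_rows=True):
    """Interleave the upper-half and lower-half field images into one frame.

    pixels: raw bytes of the stored image (first field in the upper half,
            second field in the lower half).
    Returns raw bytes of the same size with the field rows alternating.
    """
    row = width * channels
    half = height // 2
    out = bytearray(2 * half * row)
    for i in range(half):
        first = pixels[i * row:(i + 1) * row]
        second = pixels[(half + i) * row:(half + i + 1) * row]
        if first_field_on_even_rows:
            even_row, odd_row = first, second
        else:
            even_row, odd_row = second, first
        out[(2 * i) * row:(2 * i + 1) * row] = even_row
        out[(2 * i + 1) * row:(2 * i + 2) * row] = odd_row
    return bytes(out)
```

For moving subjects the two fields are 1/60sec apart, so it may be preferable to process each half of the stored image as a separate half-height image instead of weaving.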


5.5 Files and Directories

5.5.1 File Names

Each file is named according to the following rule.

<NAME (4 characters)>_<POS (1 character)><FRAME (2 digits)>

where NAME is the name of a still image or image sequence, POS is the camera position, one of {F(ront), H(ead), L(eft), R(ight), T(op)}, and FRAME is the frame number. For example, ROTZ_H02 means "Head Camera, Frame No.2" of the "Rotation" image sequence.
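
The naming rule can be checked mechanically. The following sketch parses a base name into its parts and maps the POS letter to the camera directory name used in the directory tree. The function name parse_vol2_name is hypothetical, and the input is assumed to be the base name without any file extension.

```python
import re

# POS letter -> camera directory name.
CAMERA_BY_POS = {"F": "FRONT", "H": "HEAD", "L": "LEFT", "R": "RIGHT", "T": "TOP"}

_NAME_RE = re.compile(r"^(.{4})_([FHLRT])(\d{2})$")


def parse_vol2_name(base_name):
    """Split <NAME>_<POS><FRAME> into (name, camera, frame number)."""
    match = _NAME_RE.match(base_name)
    if match is None:
        raise ValueError(f"not a valid Vol.2 file name: {base_name!r}")
    name, pos, frame = match.groups()
    return name, CAMERA_BY_POS[pos], int(frame)
```

For example, parse_vol2_name("ROTZ_H02") returns ("ROTZ", "HEAD", 2).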


The following is the tree structure of directories in Vol.2

Vol.2--+--- README: brief explanation
       |
       +--- DOC: documents
       |
       +--- IMAGE: data
       |      |
       |      +---- STILL: still images
       |      |       |
       |      |       +---- CUBE: data for calibration
       |      |       +---- SIGN: 30 kinds of human gestures
       |      |               |
       |      |               +---- FRONT: views from FRONT camera
       |      |               +---- LEFT: views from LEFT camera
       |      |               +---- RIGHT: views from RIGHT camera
       |      |               +---- TOP: views from TOP camera
       |      |
       |      +---- SEQUENCE: image sequences
       |              |
       |              +---- CUBE: data for calibration
       |              +---- EXPAND: gesture of expansion by right hand (1.3sec)
       |              |       |
       |              |       +---- FRONT: views from FRONT camera
       |              |       +---- HEAD: face views from HEAD camera
       |              |       +---- LEFT: views from LEFT camera
       |              |       +---- RIGHT: views from RIGHT camera
       |              |       +---- TOP: views from TOP camera
       |              |
       |              +---- POINT: gesture of pointing by right hand (1sec)
       |              |       |
       |              |       +---- FRONT
       |              |       +---- HEAD
       |              |       +---- LEFT
       |              |       +---- RIGHT
       |              |       +---- TOP
       |              |
       |              +---- ROTATION: rotating right hand around z-axis (1sec)
       |              |       |
       |              |       +---- FRONT
       |              |       +---- HEAD
       |              |       +---- LEFT
       |              |       +---- RIGHT
       |              |       +---- TOP
       |              |
       |              +---- UNCAP: gesture of removing a cap from something (2sec)
       |                      |
       |                      +---- FRONT
       |                      +---- HEAD
       |                      +---- LEFT
       |                      +---- RIGHT
       |                      +---- TOP
       |
       +---- APPENDIX: appendix
               |
               +---- SEQUENCE.JPG: an index image for image sequences
               +---- STILL.JPG: an index image for still images
               +---- STUDIO1.JPG: an overview of studio settings
               +---- STUDIO2.JPG: another view of our studio
               +---- STAFF.JPG: an image of staff