Human Activity:
i-LIDS datasets: UK Government benchmark datasets for automated surveillance. Surveillance Performance Evaluation Initiative (SPEVI) - http://www.homeoffice.gov.uk/science-research/hosdb/i-lids , http://www.eecs.qmul.ac.uk/~andrea/dwnld/avss2007/AVSS_2007_i-LIDS_challenge_briefing.pdf
PETS 2010 Benchmark Data: IEEE International Workshop on Performance Evaluation of Tracking and Surveillance. http://www.cvg.rdg.ac.uk/PETS2006/index.html, http://pets2007.net/
CANTATA project datasets: Datasets from past PETS workshops and many other sources. WallFlower dataset: For evaluating background modelling algorithms. Ground-truth foreground provided.
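Given WallFlower-style ground-truth foreground masks, background-modelling output is commonly scored pixel-wise against the ground truth. A minimal sketch (the toy masks below are made up for illustration; the scoring convention is an assumption, not WallFlower's official protocol):

```python
import numpy as np

def mask_scores(pred, gt):
    """Pixel-wise precision/recall/F1 between a predicted and a
    ground-truth binary foreground mask (True = foreground)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: a 4x4 frame with a slightly over-segmented prediction
gt = np.zeros((4, 4), dtype=bool); gt[1:3, 1:3] = True      # true foreground
pred = np.zeros((4, 4), dtype=bool); pred[1:3, 1:4] = True  # one extra column
p, r, f = mask_scores(pred, gt)
```

Averaging these scores over the labelled frames of each WallFlower sequence gives a single comparable number per background-modelling algorithm.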
VISOR: Video Surveillance Online Repository: a large collection of videos and ground truth.
Advanced Video and Signal based Surveillance: a variety of datasets for tracking and detection.
CAVIAR surveillance dataset
MuHAVi: Multicamera Human Action Video Data: a large body of human action video data captured using 8 cameras. Includes manually annotated silhouette data.
Colour video and Thermal infrared datasets: videos in colour and thermal infrared, aligned temporally and spatially. Ground truth for object tracking is provided.
INRIA Datasets: cars, people, horses, human actions, etc.
BEHAVE Interactions Test Case Scenarios: various scenarios of people acting out 10 types of group interactions: InGroup, Approach, WalkTogether, Split, Ignore, Following, Chase, Fight, RunTogether and Meet.
BEHAVE Crowd Sequence Data
Face Detection and Tracking
MIT Face Dataset.
Videos for Head Tracking.
Experiments on skin region detection and tracking: includes a ground-truthed dataset.
CMU Pose, Illumination and Expression (PIE) database: 41,368 images of 68 people. Each person is imaged under 13 different poses, 43 different illumination conditions and with 4 different expressions.
Video database of moving faces and people: Univ. of Texas (Dallas) database for testing algorithms for face and person recognition, head/eye tracking, and computer graphics modelling of natural human motions. For each person there are nine static facial mug shots and a series of video streams. Complete data sets are available for 284 subjects; duplicate data sets, taken subsequent to the original set, are available for 229 subjects.
Pedestrians Detection and Street Scenes
Pedestrian dataset from MIT.
MIT Street Scenes: the CBCL StreetScenes Challenge Framework is a collection of images, annotations, software and performance measures for object detection (cars, pedestrians, bicycles, buildings, trees, skies, roads, sidewalks and stores).
Daimler pedestrian benchmarks: large datasets (many thousands of pedestrian labels) for benchmarking pedestrian detection, classification and tracking algorithms.
Caltech pedestrian dataset: large dataset (many thousands of pedestrian labels) for benchmarking pedestrian detection, classification and tracking algorithms.
Person Re-identification
VIPeR dataset
images of people from two different camera views
only one image of each person per camera.
viewpoint angles in 45-degree increments
8 same-viewpoint angle pairs or 28 different-viewpoint angle pairs.
632 pedestrians image pairs taken by two different cameras.
one of the most challenging datasets for automated person Re-ID.
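Re-ID benchmarks such as these are typically scored with a Cumulative Match Characteristic (CMC) curve. A minimal sketch, assuming a probe-by-gallery distance matrix in which probe i's true match is gallery i (the single-shot setting of VIPeR); the toy distances below are illustrative only:

```python
import numpy as np

def cmc(dist, ranks=(1, 5, 10)):
    """Rank-r matching rates from a probe-by-gallery distance matrix.
    Assumes probe i's correct gallery identity is at index i."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # For each probe, sort gallery by distance and find the rank
    # (0 = best match) at which the true identity appears.
    order = np.argsort(dist, axis=1)
    match_rank = np.array([np.where(order[i] == i)[0][0] for i in range(n)])
    return {r: float(np.mean(match_rank < r)) for r in ranks}

# Toy example with 3 probes: true matches land at ranks 1, 2, 1
d = np.array([[0.1, 0.9, 0.8],
              [0.2, 0.5, 0.9],
              [0.7, 0.6, 0.3]])
scores = cmc(d, ranks=(1, 2))
```

Reported "rank-1" and "rank-5" accuracies in the Re-ID literature are exactly these values for r = 1 and r = 5.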
ETHZ dataset
images of people taken by a moving camera
range of variations in person appearances.
Sequence 1 with 83 pedestrians
Sequence 2 with 35 pedestrians
Sequence 3 with 28 pedestrians
i-LIDS multi-camera tracking scenario (i-LIDS MCTS)
i-LIDS MCTS dataset used for tracking
crowded public spaces.
476 images of 119 pedestrians taken from two non-overlapping cameras.
an average of 4 images of each pedestrian and a minimum of 2 images.
considerable illumination variations and occlusions across the two cameras.
CAVIAR4REID:
Extract from multi-camera tracking dataset
indoor shopping mall with two cameras with overlapping views.
multiple images of 72 pedestrians
50 appear in both cameras
22 appear in only one camera
i-LIDS MA and AA from the i-LIDS MCTS dataset.
iLIDS-MA:
images of 40 pedestrians taken from two different cameras.
46 manually annotated images of each pedestrian are extracted from each camera.
3680 images of slightly different sizes.
iLIDS-AA:
images of 100 pedestrians taken from two different cameras.
automatically detected and tracked images of each pedestrian from each camera.
10,754 images of slightly different sizes.
more challenging due to the possibility of errors from automated detection and tracking
V47 :
videos of 47 pedestrians captured using two cameras in an indoor setting.
two different views of each person (the person walking in two different directions)
foreground masks for every few frames of each video
QMUL underGround Re-IDentification (GRID) Dataset.
8 cameras with non-overlapping fields of view in an underground train station
low resolution, significant illumination variations.
250 pedestrian images that appear in two different camera views
775 images of people in a single view.
SARC3D
short video clips of 50 persons
4 predefined viewpoints captured with calibrated cameras
Useful to generate 3D body model
manually selected four frames for each clip corresponding to predefined positions and postures of the people.
200 snapshots in total
http://imagelab.ing.unimore.it/visor/sarc3d.asp
3DPeS: 3D People Dataset for Surveillance and Forensics - http://imagelab.ing.unimore.it/visor/3dpes.asp
8 cameras with non-overlapping fields of view
data captured over several days
strong variations in lighting conditions
uncompressed images with a resolution of 704x576 pixels
different camera positions, orientations and zoom levels
more than 100 persons detected more than once across different cameras
people re-identification, people detection, tracking, action analysis and trajectory analysis
RGB-D person Re-ID dataset - evaluation of depth-based features for Re-ID
depth information for each pedestrian using the Kinect
indoor scenario
"Collaborative"
79 people with a frontal view, walking slowly, no occlusions and stretched arms
"Walking1" - "Walking2"
same 79 people walking normally while entering the lab
"Backwards"
back view recording of the people walking away from the lab.
no guarantee that visual aspects like clothing or accessories will be kept constant.
for each person: 1) a set of 5 RGB images, 2) the foreground masks, 3) the skeletons, 4) the 3D mesh (PLY), 5) the estimated floor.
Action Recognition: Fight Recognition in prisons
Hollywood 2: Different fight actions taken from movies.
http://www.di.ens.fr/~laptev/actions/hollywood2/
UT-Interaction: Dataset of 20 videos where several persons kick, punch and push each other.
http://cvrc.ece.utexas.edu/SDHA2010/Human_Interaction.html
UCF50: There is a category “Punch” with 160 videos of boxing.
http://crcv.ucf.edu/data/UCF50.php
Hockey Fights: 500 fight and 500 non-fight video clips. Each sequence lasts 1 second. All footage comes from hockey games.
http://visilab.etsii.uclm.es/personas/oscar/FightDetection/index.html
Paper: E. Bermejo, O. Deniz, G. Bueno, R. Sukthankar, "Violence Detection in Video Using Computer Vision Techniques", Proceedings of Computer Analysis of Images and Patterns (CAIP), 2011.
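As a toy illustration of the kind of motion cue such violence-detection work builds on (not the paper's actual method), a clip's mean inter-frame difference can serve as a crude motion-energy feature; fight clips tend to score higher than calm ones. The clips below are synthetic stand-ins for real 1-second sequences:

```python
import numpy as np

def motion_energy(frames):
    """Mean absolute inter-frame difference over a clip (frames stacked
    along axis 0) -- a crude measure of overall motion."""
    frames = np.asarray(frames, dtype=float)
    return float(np.mean(np.abs(np.diff(frames, axis=0))))

# Synthetic stand-ins for two 1-second clips (5 frames of 8x8 grayscale):
# a random walk with small steps (calm) vs. large steps (fight-like).
rng = np.random.default_rng(0)
calm = np.cumsum(rng.normal(0, 0.5, (5, 8, 8)), axis=0)
fight = np.cumsum(rng.normal(0, 5.0, (5, 8, 8)), axis=0)
label = "fight" if motion_energy(fight) > motion_energy(calm) else "non-fight"
```

A real detector would of course use stronger spatio-temporal features and a trained classifier, with the 500/500 Hockey Fights split used for training and evaluation.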
INO: a single video in which several persons fight in a parking lot.