[Code and Data] Multi-Object Tracking for Nursing Home Surveillance Video

This is the code/data page for multiple papers:
1. arXiv:1604.07468: Long-Term Identity-Aware Multi-Person Tracking for Surveillance Video Summarization
2. CVPR16 paper: The Solution Path Algorithm for Identity-Aware Multi-Object Tracking

Download source code here (8MB, updated 2016/9/5).
Download data set here (874MB, updated 2016/6/22). Please untar it into the root directory of the source code, i.e. cvpr16_release_v1/datasets should exist.

Before running anything, please run "sh algorithms/utils/get_CLEAR-MOT.sh" to download the necessary evaluation code.
If you use this evaluation code, please cite [1].
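
The setup steps above can be sanity-checked with a short script. This is only a minimal sketch in Python (not part of the release): the function name missing_paths is made up for illustration, and it assumes that both paths are relative to the root directory of the source code (e.g. cvpr16_release_v1).

```python
from pathlib import Path

# Paths mentioned in this README, assumed to be relative to the source root.
REQUIRED_PATHS = (
    "datasets",                           # created by untarring the data set
    "algorithms/utils/get_CLEAR-MOT.sh",  # script that downloads the evaluation code
)


def missing_paths(root):
    """Return the required paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in REQUIRED_PATHS if not (root / rel).exists()]
```

For example, missing_paths("cvpr16_release_v1") returns an empty list when the data set has been untarred into the right place and the download script is present.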

demo_all.m will compile the necessary code, fetch additional code from external GitHub repositories, and run
tracking on the three data sets provided: terrace1, nursing home short and nursing home long.
Tracking for the nursing home long sequence takes around 1 day with 12 cores.
Results will be written to workingdir/*.res

Three methods are provided: the solution path algorithm; NND, a slightly modified version of our CVPR 2013 method [3]; and NMO, our method from [4].
See algorithms/ for more details.

Two features are provided: the color histograms used in the CVPR 2016 paper, and deep features trained after
the paper was submitted. For more details on the deep features, please see Chapter 4 of http://www.cs.cmu.edu/~iyu/docs/dissertation.pdf
Features are in datasets/.
The deep features are included for completeness, as they perform better than color features.
The deep features have the word "deep" in the file names, e.g. nursing_home_short_deep.mat.
For qualitative performance of deep features on terrace1, see: https://www.youtube.com/watch?v=YhW7YRJvQpg

Due to privacy issues, the raw videos of the two nursing home sequences cannot be released.
Snippets of video with faces blurred out are included in datasets/nursing_home_short_videos for qualitative analysis.
To generate visualizations, set visualize=1 in demo_all.m and update the OpenCV paths in
algorithms/utils/visualization/Makefile; otherwise the visualization code won't compile. I used OpenCV 2.3, but I believe newer
versions should work fine too.
To check that your OpenCV paths are set correctly, run system('algorithms/utils/visualization/drawpd') in MATLAB
after compilation and make sure MATLAB does not complain about missing libraries.
Videos are written to workingdir/nursing_home_short_sp_color.out.res_traj.mat_dir/nursing_home_short*.avi.
For the visualization of other configurations, change the directory name accordingly.

Results (and video) from my previous runs are here: datasets/previous_output
The results might be slightly different for each run because there is some randomness in the Solution Path Algorithm.
The runs reported in CVPR 2016 are:
Solution path + color histogram results: *_sp_color.out.res
Solution path + color histogram tracking results: nursing_home_short_sp_color.out.res_traj.mat_dir
NND + color histogram results: *_nnd_color.out.res
NND + color histogram tracking results: nursing_home_short_nnd_color.out.res_traj.mat_dir
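
The output names listed above appear to follow the pattern <dataset>_<method>_<feature>.out.res, with an optional _traj.mat_dir suffix for the tracking output directory. The Python sketch below splits a name into these parts. It is an illustration based only on the names listed in this README; the function name parse_result_name is hypothetical, and names for configurations not listed here may not follow the same pattern.

```python
RESULT_SUFFIX = ".out.res"
TRAJ_DIR_SUFFIX = "_traj.mat_dir"


def parse_result_name(name):
    """Split e.g. 'nursing_home_short_sp_color.out.res' into its parts.

    Returns (dataset, method, feature, is_traj_dir).
    Raises ValueError if the name does not match the assumed pattern.
    """
    idx = name.find(RESULT_SUFFIX)
    if idx < 0:
        raise ValueError("not a result name: %r" % name)
    stem = name[:idx]
    tail = name[idx + len(RESULT_SUFFIX):]
    if tail not in ("", TRAJ_DIR_SUFFIX):
        raise ValueError("unexpected suffix: %r" % tail)
    # The last two underscore-separated tokens are method and feature;
    # everything before them is the data set name (which may contain '_').
    dataset, method, feature = stem.rsplit("_", 2)
    return dataset, method, feature, tail == TRAJ_DIR_SUFFIX
```

For example, parse_result_name("nursing_home_short_nnd_color.out.res_traj.mat_dir") gives ("nursing_home_short", "nnd", "color", True).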

====Some more details====

0. Requirements
  a. matlab 2011 or newer
  b. g++ compiler
  c. OpenCV
  This package has only been tested in a linux environment.
1. Three data sets:
  a. terrace1: This data set is from EPFL (http://cvlab.epfl.ch/data/pom/).
               Please cite [2] if you use this data set.
               The person detection output is included in this package.
               The contents of 'datasets/terrace1.mat' are:
                 pd: person detection output, each row is a person detection.
                     The first 10 numbers are: 
                       camera id
                       bounding box top left x
                       bounding box top left y
                       bounding box bottom right x
                       bounding box bottom right y
                       3D location x
                       3D location y
                       3D location z
                     The remaining 252 numbers are the color histogram (or deep features)
                     describing the person detection. The color histogram
                     is computed in HSV space split into 6x6x1 bins.
                     A 2-layer spatial pyramid on the vertical axis was used,
                     so there are 1 + 2 + 4 = 7 partitions. 7 * 6 * 6 = 252.
                 Y:  face recognition information. Has the same number of rows
                     as pd. If Y(i, j) = 1, that means pd(i, :) belongs to 
                     person j. For terrace1, start and end points are manually
                     added to make sure each person has at least 1 label.
                 para_sp: parameters for the solution path algorithm.
                        k: number of nearest neighbors
                        lookaround: how far in time to look for nearest 
                                    neighbors. Denoted as "T" in the paper.
                        max_velociy: max velocity possible for a person.
                                    Denoted as "V" in the paper.
                        too_far_not_same_person: modeling error in person
                                    localization. Denoted as "\delta" in the paper.
                        exclusion_range: How far in time to look for person 
                                    detections which cannot possibly be from
                                    the same individual.
                        pd_per_frame: models the density of person detections.
                                    For terrace1, the detector was run on every
                                    frame, so pd_per_frame is set to 1. For
                                    nursing home sequences, person detector was
                                    run every 6 frames, so pd_per_frame = 1/6.
                        max_dist_same: if two person detections are closer
                                    than this value, and less than 6 frames 
                                    away, they are likely to be the same 
                                    person. Used for computing matrix K.
                        smooth_weight: adjusting the weight between Laplacian
                                    matrices L & K.
                 fps: frames per second for this sequence
                 gt: ground truth, the track of each individual takes up one cell.
                     Format of each cell is [time x y z; time x y z; ...]
                 times: which times were used for computing ground-truth
  b. Nursing home short: First used in [3]. "datasets/nursing_home_short.mat"
               has a format similar to terrace1.
  c. Nursing home long: 7-hour-45-minute, 15-camera sequence. Ground truth
               is annotated every minute. "datasets/nursing_home_long.mat" has
               a format similar to terrace1.
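
To make the data layout above concrete, here is a minimal Python sketch (not part of the release). It assumes that one row of pd and the matrix Y have already been loaded into plain Python lists. All function names are made up for illustration. Note that Python indices start at 0, while the MATLAB indices in the description above start at 1.

```python
# Color histogram size as described above: 6x6x1 HSV bins per partition,
# and 1 + 2 + 4 = 7 spatial pyramid partitions.
HSV_BINS = (6, 6, 1)
PYRAMID_PARTITIONS = 1 + 2 + 4
HIST_DIM = PYRAMID_PARTITIONS * HSV_BINS[0] * HSV_BINS[1] * HSV_BINS[2]  # 252


def split_detection(row):
    """Split one pd row into (leading fields, 252-dim feature vector).

    The feature vector is taken as the last HIST_DIM entries of the row,
    so no assumption is made here about the number of leading fields.
    """
    if len(row) <= HIST_DIM:
        raise ValueError("row is too short to contain the feature vector")
    return row[:-HIST_DIM], row[-HIST_DIM:]


def labels_for_detection(Y, i):
    """Return all person indices j with Y[i][j] == 1 (0-based)."""
    return [j for j, value in enumerate(Y[i]) if value == 1]


def pd_per_frame(frame_stride):
    """Detection density when the detector runs every `frame_stride` frames."""
    if frame_stride < 1:
        raise ValueError("frame_stride must be at least 1")
    return 1.0 / frame_stride
```

With these definitions, pd_per_frame(1) gives the terrace1 setting and pd_per_frame(6) gives the setting for the nursing home sequences.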

2. Solution path algorithm: Our proposed method. Code in algorithms/solution_path. 
   NND: Our method proposed in [3]. The optimization method has been slightly modified,
        but essentially they are the same algorithm.
   NMO: Method proposed in [4].

The provided package builds on much existing work. Data and code that are
based on or motivated by the following papers are included in this package
for convenience. Please cite the following papers accordingly.
1. If you use terrace1.mat, please cite [2].
2. If you use nursing_home_short.mat, please cite [3].
3. If you use nursing_home_long.mat, please cite [4].
4. If you use the evaluation code to compute MOTA, please cite [1].
   The actual code is from https://github.com/glisanti/CLEAR-MOT.
   Our script algorithms/utils/get_CLEAR-MOT.sh checks out the code and patches it.
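
For reference, MOTA in the standard CLEAR MOT definition is 1 - (misses + false positives + identity switches) / (number of ground-truth objects), with all counts summed over all frames. The Python sketch below only illustrates this formula; it is not the CLEAR-MOT code that the script downloads, and the function and argument names are made up for illustration.

```python
def mota(misses, false_positives, id_switches, num_ground_truth):
    """Multiple Object Tracking Accuracy (standard CLEAR MOT definition).

    All arguments are counts summed over all evaluated frames.
    The result is at most 1.0 and can be negative when the number of
    errors exceeds the number of ground-truth objects.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = misses + false_positives + id_switches
    return 1.0 - errors / float(num_ground_truth)
```

For example, 10 misses, 5 false positives and 5 identity switches over 100 ground-truth objects give a MOTA of 1 - 20/100.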

If you have any questions please send mail to iyu@cs.cmu.edu or endoplasmic1357@gmail.com.
Shoou-I Yu, 2016/9/5


[1] A. D. Bagdanov, A. Del Bimbo, F. Dini, G. Lisanti, and
I. Masi. Posterity logging of imagery for video surveillance.
In IEEE Multimedia, 2012.

[2] F. Fleuret, J. Berclaz, R. Lengagne, and P. Fua. Multicamera
People Tracking with a Probabilistic Occupancy Map. In IEEE TPAMI, 2008.

[3] S.-I. Yu, Y. Yang, and A. Hauptmann. Harry Potter's Marauder's
Map: Localizing and tracking multiple persons-of-interest
by nonnegative discretization. In CVPR, 2013.

[4] S.-I. Yu, Y. Yang, X. Li, and A. Hauptmann. Long-Term Identity-Aware
Multi-Person Tracking for Surveillance Video Summarization.
arXiv:1604.07468, 2016.