The NTU Video-Object-Instance (NTU-VOI) dataset is provided for evaluating object instance search and localization in large-scale videos.
It consists of 146 ground-truth video clips, each annotated with bounding boxes for the object instances in every frame. The total download size of the videos is approximately 270 MB.
