Visualising Visual Words In Videos

Global bag of visual words models (a.k.a. bag of features, BoF) are used to achieve state-of-the-art classification performance on challenging video data.

Each element in a BoF histogram represents the frequency of occurrence of a particular visual word; a visual word represents a group of features that are similar under some distance measure in the feature space (usually Euclidean).

BoF is therefore an unstructured representation, i.e. any permutation of the positions of the visual words generates the same histogram.
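
The permutation property can be checked on a toy example. The snippet below is only an illustrative sketch and is not part of the toolkit: the word assignments and the vocabulary size K = 4 are made up, and the variable names are hypothetical.

```matlab
% Toy example: 10 features, each already assigned to one of K = 4 visual words.
K = 4;
words = [1 3 3 2 4 1 3 2 2 3];            % visual-word index of each feature

% BoF histogram: count how often each visual word occurs.
bof = accumarray(words(:), 1, [K 1])';    % -> [2 3 4 1]

% Shuffle the order of the assignments, i.e. permute the word positions.
shuffled = words(randperm(numel(words)));
bof_shuffled = accumarray(shuffled(:), 1, [K 1])';

% The two histograms are identical: the arrangement of the words is lost.
disp(isequal(bof, bof_shuffled));         % displays 1
```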

The pattern of visual words in a video is not arbitrary; however, finding a suitable representation to capture the structure of visual words is hard.

This collection of MATLAB tools is intended to help visualise the 'visual words' generated for video data in the popular bag of visual words pipeline.

Usage:

1) Copy the toolkit folder into your current MATLAB directory and add it to the MATLAB path, e.g. /home/your_folders/visualising_video_visual_words

2) Run "demo.m"
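
From the MATLAB prompt, the two steps might look like the sketch below. The folder is the placeholder path from step 1 and should be replaced with the actual location; the existence check is an added precaution, not something the toolkit requires.

```matlab
% Placeholder path from step 1; replace with wherever the toolkit was unpacked.
toolkit_dir = '/home/your_folders/visualising_video_visual_words';

if exist(toolkit_dir, 'dir')
    % addpath puts the folder at the top of the search path,
    % so the toolkit's demo.m is found first.
    addpath(toolkit_dir);
    demo
else
    warning('Toolkit folder not found: %s', toolkit_dir);
end
```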

(Tested on MATLAB R2010a on Ubuntu 12.04 64-bit)

Download

In the following videos from the KTH dataset, the trajectory feature components from Heng Wang et al. have been clustered into a vocabulary of 16 visual words.
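
How a 16-word vocabulary can be obtained is sketched below with a plain k-means loop in base MATLAB. This is an assumption for illustration only: the clustering method actually used for these videos is not stated here, the descriptors are random stand-ins, and N, D and the iteration count are arbitrary.

```matlab
N = 500;  D = 30;  K = 16;     % K = 16 words; N and D are arbitrary
X = randn(N, D);               % hypothetical descriptors, one per row

% Initialise the K word centres with K distinct random descriptors.
idx = randperm(N);
C = X(idx(1:K), :);

for iter = 1:20
    % Squared Euclidean distance from every descriptor to every centre (N x K).
    d2 = bsxfun(@plus, sum(X.^2, 2), sum(C.^2, 2)') - 2 * (X * C');

    % The nearest centre of each descriptor is its visual word (N x 1).
    [~, words] = min(d2, [], 2);

    % Move each centre to the mean of the descriptors assigned to it.
    for k = 1:K
        members = (words == k);
        if any(members)
            C(k, :) = mean(X(members, :), 1);
        end
    end
end
```

After the loop, `C` holds the 16 word centres and `words` the visual-word index of each descriptor.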

Example 3D plots:

Each circle corresponds to the mean position of a space-time feature.

Each colour corresponds to a different 'visual word'.

The size of each circle corresponds to the scale at which the feature was extracted.

The histogram in the top-left corner shows the frequency of each word in that particular frame (un-normalised). (video)

The black number (top left corner) is the frame number. (video)
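
A plot following the conventions above could be produced roughly as follows. This is a minimal sketch on synthetic data, not the toolkit's own plotting code: all variable names, the value ranges and the size factor of 20 are assumptions.

```matlab
% Synthetic stand-in data: M features with position, frame, scale and word.
M = 200;  K = 16;  nFrames = 50;
x = rand(M, 1) * 160;          % mean horizontal position
y = rand(M, 1) * 120;          % mean vertical position
t = randi(nFrames, M, 1);      % frame index of each feature
s = randi(4, M, 1);            % hypothetical scale level (1..4)
words = randi(K, M, 1);        % visual-word index (1..K)

% 3D plot: one circle per feature, coloured by word, sized by scale.
cmap = hsv(K);                 % K distinct hues, one per visual word
figure;
scatter3(x, y, t, 20 * s, cmap(words, :));
xlabel('x'); ylabel('y'); zlabel('frame');

% Un-normalised word histogram for a single frame f.
f = 10;
frameHist = sum(bsxfun(@eq, words(t == f), 1:K), 1);   % 1 x K counts
figure;
bar(frameHist);
xlabel('visual word'); ylabel('count');
title(sprintf('frame %d', f));
```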