Visualising Visual Words In Videos
Global bag of visual words models (a.k.a. bag of features, BoF) are used to achieve state-of-the-art classification performance on challenging video data.
Each element in a BoF histogram represents the frequency of occurrence of a particular visual word;
a visual word represents a group of features that are similar under some distance measure in the feature space (usually Euclidean).
BoF is therefore an unstructured representation, i.e. randomly permuting the positions of the visual words generates the same histogram.
The pattern of visual words in a video is not arbitrary;
however, finding a suitable representation to capture the structure of visual words is hard.
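The permutation invariance described above can be checked with a small sketch. The feature values and the 3-word vocabulary below are made up for illustration and are not part of the toolkit; the point is only that shuffling the order of the word assignments leaves the BoF histogram unchanged.

```matlab
% Toy data (made up for illustration): 6 features in a 2-D feature space
% and a hypothetical vocabulary of 3 visual words (cluster centres).
features = [0.1 0.2; 0.9 1.1; 5.0 5.1; 4.8 5.3; 0.2 0.0; 9.9 0.1];
vocab    = [0 0; 5 5; 10 0];

num_features = size(features, 1);
num_words    = size(vocab, 1);

% Assign each feature to its nearest visual word (Euclidean distance).
word_ids = zeros(num_features, 1);
for i = 1:num_features
    diffs = vocab - repmat(features(i, :), num_words, 1);
    [~, word_ids(i)] = min(sum(diffs .^ 2, 2));
end

% BoF histogram: number of occurrences of each visual word.
bof = accumarray(word_ids, 1, [num_words 1])';

% Shuffle the order (the "positions") of the visual words and recount.
shuffled_ids = word_ids(randperm(num_features));
bof_shuffled = accumarray(shuffled_ids, 1, [num_words 1])';

disp(bof);                          % word counts: 3 2 1
disp(isequal(bof, bof_shuffled));   % 1: the histogram is unchanged
```

Because the histogram only counts occurrences, any information about where or when each word appeared in the video is discarded.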
This collection of MATLAB tools is intended to help visualise the `visual words' generated for video data during the popular bag of visual words pipeline.
Usage:
1) Add the toolkit folder to the current MATLAB directory and set the MATLAB path to the toolkit directory: /home/your_folders/visualising_video_visual_words
2) Run "demo.m"
(Tested on MATLAB 2010a + Ubuntu 12.04 64bit)
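The two steps above could be run from the MATLAB prompt roughly as follows. This is a sketch that assumes demo.m sits directly in the toolkit directory; the path is the placeholder from step 1 and should be replaced with the real location, and the variable names are only illustrative.

```matlab
% Placeholder path from step 1; replace with the actual toolkit location.
toolkit_dir = '/home/your_folders/visualising_video_visual_words';
demo_script = fullfile(toolkit_dir, 'demo.m');

% Only proceed if demo.m is really there (assumed to be in the toolkit root).
demo_available = (exist(demo_script, 'file') == 2);

if demo_available
    addpath(toolkit_dir);   % step 1: put the toolkit on the MATLAB path
    run(demo_script);       % step 2: run the demo
else
    fprintf('demo.m not found in %s\n', toolkit_dir);
end
```

Running the script by its full path avoids accidentally calling a different function that happens to be named demo elsewhere on the path.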
In the following videos from the KTH dataset,
the trajectory feature components from Heng Wang et al. have been clustered into a vocabulary of 16 visual words.
Example 3D plots:
Each circle corresponds to the mean position of a space-time feature.
Each colour corresponds to a different `visual word'.
The size of each circle corresponds to the scale at which the feature was extracted.
The histogram in the top left corner shows the frequency of each word on that particular frame (un-normalised). (video)
The black number (top left corner) is the frame number. (video)
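The plot conventions above can be imitated with standard MATLAB plotting calls. The following is a minimal sketch on random stand-in data, not the toolkit's own code: all variable names, the frame size and the scale range are assumptions chosen for illustration.

```matlab
% Synthetic stand-in data (all values are made up for illustration).
num_words  = 16;    % vocabulary size used in the examples above
num_feats  = 200;   % number of space-time features
num_frames = 50;    % number of frames in the hypothetical video

x     = 160 * rand(num_feats, 1);       % mean x position of each feature
y     = 120 * rand(num_feats, 1);       % mean y position of each feature
frame = randi(num_frames, num_feats, 1); % frame index of each feature
word  = randi(num_words, num_feats, 1);  % visual word assigned to each feature
scale = randi(4, num_feats, 1);          % scale at which each feature was extracted

% Un-normalised per-frame histogram: row f holds the word counts of frame f.
frame_hist = accumarray([frame word], 1, [num_frames num_words]);

% One RGB colour per visual word.
colours = hsv(num_words);

% 3D plot: one circle per feature, coloured by word, sized by scale.
figure;
scatter3(x, y, frame, 20 * scale, colours(word, :));
xlabel('x'); ylabel('y'); zlabel('frame');

% Word frequencies for a single frame (here frame 10).
figure;
bar(frame_hist(10, :));
xlabel('visual word'); ylabel('count');
```

Using `accumarray` over (frame, word) pairs gives every per-frame histogram in one call, so a single row can be drawn for whichever frame is currently displayed.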