Projects

Test Video from RAIDataset

We pass the image frames from the video through a ResNet-34 trained on the kinetics dataset and pick the logits (512 dims) for every 16 frames and reduce it to 2 dimensions.

We can also compute the l2 norm distance between the consequtuive digits to generate a distance matrix. the notches in the diagonal are the shot boundaries.

TODO -

add PWCNet (optical Flow) and image pyramid.