Projects
Test Video from RAIDataset
We pass the image frames from the video through a ResNet-34 trained on the kinetics dataset and pick the logits (512 dims) for every 16 frames and reduce it to 2 dimensions.
We can also compute the l2 norm distance between the consequtuive digits to generate a distance matrix. the notches in the diagonal are the shot boundaries.
TODO -
add PWCNet (optical Flow) and image pyramid.