SIMONe
View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition
Rishabh Kabra, Daniel Zoran, Goker Erdogan, Loic Matthey, Antonia Creswell
Matthew Botvinick, Alexander Lerchner, Christopher P. Burgess
Figure panels: CATER (moving camera); Objects Room 9; Playroom
CATER (moving camera). Note that the distant yellow sphere in example 2 is occluded for some frames, yet SIMONe tracks it stably. Moreover, SIMONe assigns each object's shadows (up to three, due to multiple lights) to the same segment as the object.
Playroom. Number of unique foreground objects across the sequence in each example: 28, 15, and 29.
Object latent attributes
Frame latent attributes