Vision in Words

Automatic linguistic description of objects, people and their interactions in indoor videos