This lab focuses on problems in visual understanding. These include recognition (detection, categorisation and retrieval), biometric and behavioural analysis (face, gesture and body pose), low-level vision, image and video synthesis, vision+language tasks (image captioning, visual question answering and cross-modal retrieval), segmentation, shape analysis, and 3D from multiple views and sensors. These problems are addressed in a data-driven manner using machine learning techniques, both by adapting existing methods and by proposing new ones, and are studied across different domains, such as scanned documents, architectural layout plans, natural scenes and activity videos.