Literate models for computer vision:

Combining vision, language and reading

Content

Written information in the world around us is a fundamental cue for a multitude of everyday tasks. From shopping at the supermarket to finding our destination in an unknown urban space, written text helps us perform many tasks that would otherwise be much more complex.

Computer vision systems, on the other hand, have been practically illiterate for the first half century of their existence. Specific research on reading systems has been going on for decades, but the semantic information that image text conveys was not incorporated into higher-level computer vision tasks until very recently. This is gradually changing, enabled by the great success achieved in the field of scene text recognition in recent years.

Through this short interactive course, doctoral students will have a chance to get up to date with the state of the art in reading systems, especially scene text recognition, and explore how image text enables us to tackle new and exciting computer vision tasks such as fine-grained image classification, cross-modal retrieval, captioning and visual question answering.

Dates

Monday, April 4, 2022, 17:00 - 19:00

Wednesday, April 6, 2022, 17:00 - 19:00

Attendance

The course will be held in hybrid mode. On-site attendance is possible for the Monday, April 4 session, which will take place at the "Sala de Graus", Engineering School, Autonomous University of Barcelona. The Wednesday lectures will be given online only. On-site capacity is limited, and prior confirmation is required to attend in person.

The online link will be provided by the organisers after enrollment.

Please indicate your preferred attendance mode during registration. If you have registered but have not yet received instructions, please double-check the email address you provided and contact us at press@cvc.uab.cat.

Organisers

Dimosthenis Karatzas

Lluis Gomez

Ernest Valveny

Ali Biten

Andres Mafla

Ruben Perez

Sergi Garcia