TiE: Text in Everything

at ECCV 2022

Important Notes

Workshop Date - October 24th, 2022, 09:00 - 17:30 ISR time.
Workshop Location - David Intercontinental, Gallery Hall. The Gallery Hall is located in the lobby floor of the hotel (two floors up from where the registration is) to the right of the hotel reception (there will be signs).
The poster session will be held in the main floor of the conference (where the registration took place). Please hang your poster where “Gallery” is written in red.
Best Paper - Task Grouping for Multilingual Text Recognition. Jing Huang (Facebook); Kevin J Liang (Facebook); Rama Kovvuri (Facebook); Tal Hassner (Facebook AI). [Paper]

Overview

Understanding written communication through vision is a key aspect of human civilization and should also be an important capacity of intelligent agents aspiring to function in man-made environments. For example, interpreting written information in natural environments is essential in order to perform most everyday tasks like making a purchase, using public transportation, finding a place in the city, getting an appointment, or checking whether a store is open or not, to mention just a few. As such, the analysis of written communication in images and videos has recently gained an increased interest, as well as significant progress in a variety of text based vision tasks. While in earlier years the main focus of this discipline was on OCR and the ability to read business documents, today this field contains various applications that require going beyond just text recognition, onto additionally reasoning over multiple modalities such as the structure and layout of documents.

Recent advances in this field have been a result of a multi-disciplinary perspective spanning not only computer vision, but also natural language processing, document and layout understanding, knowledge representation and reasoning, data mining, information retrieval, and more. The goal of this workshop is to raise awareness about the aforementioned topics in the broader computer vision community, and gather vision, NLP and other researchers together to drive a new wave of progress by cross pollinating more ideas between text/documents and non-vision related fields.

This workshop aims to bring together relevant speakers and researchers working on various aspects of text understanding in vision applications. This meeting between different (NLP and vision), but still close, research communities will push the frontiers of the field. The workshop will be a full-day event comprising invited talks, oral and poster presentations of submitted papers and a special session for the challenge.

Invited Speakers

Huazhong University

Meta AI

AWS AI Labs

Weizmann Institute

Aishwarya Agrawal

University of Montreal, DeepMind

Sharon Fogel

AWS AI Labs

Submission

Our call for papers include any text dependent vision application, such as text detection, scene text VQA, layout prediction, text in video, tables detection, etc. All papers will be reviewed by at least two reviewers with a double-blind policy. The workshop considers two types of submissions: (i) Long papers, which are limited to 14 pages excluding references and will be included in the official ECCV workshop proceedings; and (ii) Short papers (extended abstracts), which are limited to 4 pages excluding references and will NOT be included in the official ECCV proceedings and therefore does not count as double submission for most vision conferences.

Important dates:

Paper Submission Deadline: ~~August 1, 2022~~ July 22, 2022
Notification to Authors: ~~August~~ 15~~, 2022~~ August 8, 2022
Workshop Camera Ready Due: ~~August~~ 22~~, 2022~~ August 15, 2022
Workshop Date: October 2022

CMT submission website: https://cmt3.research.microsoft.com/TiE2022

More details may be found in our Call for Papers.

Challenge

Recent work revealed that state-of-the-art text recognition methods perform well on images with words within vocabulary, but generalize poorly to outside vocabulary text images. In real-world scenarios out-of-lexicon (OOL) words are common and of great importance, for example, emails, dates, random strings. Unfortunately, existing benchmarks do not contain many of these words, and therefore, current methods are not evaluated on OOL and overweighting their own (implicit and explicit) language model. This workshop proposes The OOL Challenge, presenting an evaluation set of OOL text-in-the-wild images. This challenge attracts vision-language combined methods that are robust to OOL words. We expect this challenge to increase interest in techniques that balance the trade-off between vision and language.

More details may be found in our Challenge page.

To download the dataset and join the challenge, please visit the RRC portal.