Invited Talks

Our schedule includes five confirmed invited speakers and an additional talk by one of the workshop organizers.

Xiang Bai

Talk Title: Searching text in the wild

Bio: Xiang Bai received his B.S., M.S., and Ph.D. degrees from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2003, 2005, and 2009, respectively, all in electronics and information engineering. He is currently a Professor with the School of Artificial Intelligence and Automation, HUST. His research interests include object recognition, shape analysis, scene text recognition and intelligent systems.

Tal Hassner

Talk Title: On the robustness of OCR systems and why it matters

Talk Abstract: As AI systems mature, our expectations of them grow. The OCR systems released years (indeed, decades) ago were generally expected to handle a small number of clean, printed text documents, and were typically run on the user's own compute resources. Modern OCR systems, especially commercial ones, face far more challenging expectations: they can be used to process millions, if not billions, of images a day on the cloud; the text they process can be, and often is, multilingual; and the images themselves often represent in-the-wild settings, with few of the guarantees that made OCR possible so many years ago. In this talk, I will consider these new expectations from the perspective of AI Robustness: in general terms, AI Robustness means ensuring that an AI system performs consistently, even under changing, suboptimal, or even adversarial conditions. I will explore the frontiers of OCR research aimed at making OCR systems robust to some of these changing conditions. Finally, I will share some of our work over the past few years towards robust OCR.

Bio: Tal Hassner received his M.Sc. and Ph.D. degrees in applied mathematics and computer science from the Weizmann Institute of Science in 2002 and 2006, respectively. His past academic roles include Associate Professor at The Open University of Israel and Visiting Research Associate Professor at the Institute for Robotics and Intelligent Systems, Viterbi School of Engineering, University of Southern California. His industry experience includes leading the design of the AWS Rekognition face recognition services released in 2018 and, later, the development of the face recognition and OCR services at Meta (formerly Facebook). He now focuses on research and engineering for Responsible AI. Tal organized two international meetings at Schloss Dagstuhl, both on the analysis of ancient texts and manuscripts. He was also a Program Chair (PC) for WACV (2018) and ICCV (2021), and is now a PC for ECCV (2022). Finally, Tal is an associate editor for IEEE TPAMI and TBIOM.

Zhuowen Tu

Talk Title: Text Spotting Transformers

Bio: Zhuowen Tu is a full professor of Cognitive Science, also affiliated with the Department of Computer Science and Engineering, at the University of California San Diego. Before joining UCSD in 2013 as an assistant professor, he was a faculty member at UCLA. Between 2011 and 2013, he took a leave to work at Microsoft Research Asia. He received his Ph.D. from the Ohio State University and his M.E. from Tsinghua University. He received the David Marr Prize in 2003 and a David Marr Prize Honorable Mention in 2015. He is a Fellow of the IEEE.

Tali Dekel

Talk Title: Text2LIVE: Text-Guided Layered Image and Video Editing

Bio: Tali recently joined the Faculty of Mathematics and Computer Science at the Weizmann Institute of Science as an Assistant Professor. She is also a Research Scientist at Google. Before that, she was a Postdoctoral Associate at MIT CSAIL, working with Prof. Bill Freeman. She completed her Ph.D. in the School of Electrical Engineering at Tel Aviv University, where she was supervised by Prof. Shai Avidan (TAU) and Prof. Yael Moses (IDC). Her main research interests include image and video analysis, multi-view systems, 3D structure and motion estimation, and image synthesis and rendering.

Aishwarya Agrawal

Talk Title: Advancing multimodal vision-language learning

Talk Abstract: Over the last decade, multimodal vision-language (VL) research has seen impressive progress. We can now automatically caption images in natural language, answer natural language questions about images, retrieve images using complex natural language queries, and even generate images given natural language descriptions. However, current VL systems still lack several skills they need to be practically usable: out-of-distribution generalization, compositional reasoning, common sense and factual knowledge reasoning, data-efficient adaptation to new tasks, interpretability and explainability, overcoming spurious correlations and biases in data, etc. In this talk, I will present our work studying two of these challenges in VL research: out-of-distribution generalization in visual question answering, and data-efficient adaptation of VL models to new VL tasks.

Bio: Aishwarya Agrawal is an Assistant Professor in the Department of Computer Science and Operations Research at the University of Montreal. She is also a Canada CIFAR AI Chair and a core academic member of Mila -- Quebec AI Institute. She also spends one day a week at DeepMind as a Research Scientist. Aishwarya completed her Ph.D. in August 2019 at Georgia Tech, working with Dhruv Batra and Devi Parikh. She used to co-organize the annual VQA challenge and workshop. Her research interests lie at the intersection of computer vision, deep learning, and natural language processing. In particular, she is keen on building vision-language models that generalize to out-of-distribution datasets.

Sharon Fogel

Talk Title: Overcoming the Data Challenge in Text Spotting

Bio: Sharon is a research scientist at Amazon Web Services, working on text recognition and understanding. She received her M.Sc. degree in Electrical Engineering from Tel Aviv University, where she worked with Daniel Cohen-Or. Her recent work on reducing the need for supervision in end-to-end text recognition using a transformer-based model is tightly connected to the workshop theme.