VINALDO: Machine vision and NLP for Document Analysis  


                

2nd International Workshop, held in conjunction with ICDAR 2024, August 30 to September 4, 2024, Athens, Greece


Context

Document understanding is an essential task in application areas such as invoice data extraction, subject review, and medical prescription analysis, and it holds significant commercial potential. Several approaches have been proposed in the literature, but progress is hindered by limited dataset availability and data privacy constraints. Information extraction from documents involves several distinct aspects: (1) document classification, (2) text localization, (3) OCR (Optical Character Recognition), (4) table extraction, and (5) key information detection.

In this context, machine vision and, more precisely, deep learning models for image processing are attractive methods. Several models for document analysis have been developed for text box detection, text extraction, table extraction, and related tasks, using different kinds of deep learning approaches such as graph neural networks (GNNs). In addition, the text extracted from documents can be represented with embeddings based on recent NLP approaches such as Transformers. Understanding spatial relationships is also critical to the quality of text extraction in applications such as invoice analysis, where the aim is to capture the structural connections between keywords (invoice number, date, amounts) and the main value (the desired information). An effective approach therefore requires a combination of visual (spatial) and textual information.
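As a rough illustration of why both spatial and textual cues matter for linking keywords to values, the sketch below pairs each keyword with the nearest token that lies to its right or below it and that also matches an expected text pattern. It is a rule-based toy, not one of the deep learning approaches mentioned above, and everything in it is an assumption made for illustration: the `Token` class, the `VALUE_PATTERNS` table, the `link_values` function, and the premise that an OCR step has already produced tokens with top-left coordinates.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical OCR output: a piece of text and its top-left position."""
    text: str
    x: float  # left edge of the bounding box
    y: float  # top edge of the bounding box


# Hypothetical textual cues: the pattern a value should match for each keyword.
VALUE_PATTERNS = {
    "invoice number": re.compile(r"^[A-Z]{0,4}-?\d{3,}$"),
    "date": re.compile(r"^\d{2}/\d{2}/\d{4}$"),
    "total": re.compile(r"^\d+\.\d{2}$"),
}


def link_values(tokens):
    """Pair each known keyword with its closest plausible value token."""
    links = {}
    for kw in tokens:
        label = kw.text.rstrip(":")
        pattern = VALUE_PATTERNS.get(label.lower())
        if pattern is None:
            continue  # not a keyword we know about
        # Textual cue: the candidate must match the expected pattern.
        # Spatial cue: the candidate must lie to the right of or below the keyword.
        candidates = [
            t for t in tokens
            if t is not kw
            and pattern.match(t.text)
            and t.x >= kw.x
            and t.y >= kw.y
        ]
        if candidates:
            # Choose the candidate with the smallest Manhattan distance.
            best = min(candidates, key=lambda t: (t.x - kw.x) + (t.y - kw.y))
            links[label] = best.text
    return links
```

Calling `link_values` on tokens such as `Token("Total:", 10, 200)` and `Token("482.50", 120, 200)` links the keyword to the amount on the same row. Neither cue is sufficient alone: the pattern cannot tell a line-item price from the total, and position cannot tell a date from an amount.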

Objective 

Following the success of VINALDO 2023, the second edition of the VINALDO workshop encourages the description of novel problems or applications for document analysis in the area of information retrieval that have emerged in recent years. We also want to highlight a particular topic, namely "Multi-view and Multimodal approaches". The VINALDO workshop aims to combine visual and textual information for document analysis, and in this context multi-view and multimodal methods have an important advantage in dealing with different types of data. We therefore encourage works that combine machine vision and NLP through multi-view and/or multimodal approaches. Finally, we also encourage works that combine NLP and computer vision methods and introduce new document datasets for novel applications.

The VINALDO workshop aims to provide a forum where experts from industry, science, and academia can exchange ideas and discuss ongoing research in Computer Vision and NLP for scanned document analysis.