Find more details about this and other tutorials at the ICDAR tutorials page.
The issues of privacy and restricted access to documents have always been core problems of document analysis and recognition (DAR) with important repercussions in the way the community does research. One effect is that many models are still trained on private, undisclosed datasets, while public document analysis datasets tend to be small or focus on very specific narrow domains.
There are important regulatory issues when using contemporary documents, especially in the administrative/fin-tech/insurance-tech domains, that need to be taken into account every time a document analysis application is introduced. Existing privacy protection regulations, like the GDPR in the European Union, impose specific restrictions on the processing of documents with AI models, for example limits on how long a document can be kept in storage, or restrictions on how client data can be used for training new models. It is to be expected that as new regulation on Artificial Intelligence is introduced (e.g. the AI Act in the European Union, or the recent AI executive order in the USA), privacy guarantees will become a requirement for many sensitive applications.
The tutorial motivates and explains an important, emerging DAR topic: privacy-preserving and collaborative learning methods for document analysis. This line of research is already taking shape and advancing fast outside the DAR community, and it presents new opportunities for DAR research.
Our example use case is of direct interest to the DAR community: a Visual Question-Answering (VQA) model is required to answer questions on invoice documents. Reflecting a real-life scenario, the training data is distributed over a number of different clients. Models are required to use a collaborative learning setup and to protect the identity of the invoice providers whose documents are used for training.
Learn more about the privacy-preserving document intelligence use case at the ELSA Benchmarks platform.
Overview of the invoicing scenario use case. Suppliers generate invoices for their customers. To process the invoice, the customer applies Document Intelligence technologies. Privacy issues arise in all communications of documents between document users (customers) and AI service providers.
We will begin with a comprehensive explanation of the federated learning (FL) approach to training on distributed data, detailing its specific features and functions, and an analysis of its drawbacks.
To address these issues, we will present a range of solutions, each tailored to overcome specific challenges inherent in the federated approach.
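As a rough illustration of the federated approach described above, the following minimal sketch implements federated averaging (FedAvg) in plain Python. It is not the tutorial's own code: the linear model, the synthetic data drawn from y = 3x + 1, and all names (`local_update`, `federated_averaging`, `make_client`) are assumptions made for this example. Each client trains locally on its own data, and the server only ever sees model weights, which it averages in proportion to each client's dataset size.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent for y = w*x + b on one client's data."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_averaging(clients, rounds=50):
    """Server loop: broadcast the model, collect local updates, average by data size."""
    global_weights = (0.0, 0.0)
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        # Raw data never leaves a client; only updated weights are returned.
        updates = [(local_update(global_weights, d), len(d)) for d in clients]
        global_weights = tuple(
            sum(u[i] * n for u, n in updates) / total for i in range(2)
        )
    return global_weights

def make_client(n, rng):
    """Synthetic client data from y = 3x + 1 (hypothetical, for the demo only)."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 1) for x in xs]

rng = random.Random(0)
clients = [make_client(n, rng) for n in (20, 50, 30)]
w, b = federated_averaging(clients)
print(round(w, 2), round(b, 2))
```

In this toy setting the averaged model should recover weights close to the ones used to generate the data. Note that sharing weights instead of data does not by itself guarantee privacy, which is the kind of drawback the tutorial goes on to discuss.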
We will introduce attendees to the most important concepts in differentially private machine learning. The starring role goes to the differentially private stochastic gradient descent algorithm (DP-SGD).
Participants will be acquainted with the required background on privacy accounting and formal DP mechanisms.
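To make the DP-SGD idea concrete, here is a minimal sketch of a single update step in plain Python. It is an illustration under stated assumptions, not the tutorial's implementation: the linear model, the parameter names (`max_norm`, `noise_multiplier`) and the helper functions are hypothetical. The two ingredients shown are per-example gradient clipping and Gaussian noise added to the summed gradient. Privacy accounting (computing the resulting privacy budget) and random batch subsampling are deliberately left out.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(weights, batch, rng, lr=0.1, max_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD step for the linear model y = w*x + b with squared loss."""
    w, b = weights
    summed = [0.0, 0.0]
    for x, y in batch:
        err = w * x + b - y
        # Clip each example's gradient before it is added to the sum.
        per_example = clip([2 * err * x, 2 * err], max_norm)
        summed = [s + g for s, g in zip(summed, per_example)]
    # Add Gaussian noise with std = noise_multiplier * max_norm to the sum.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    grad = [g / len(batch) for g in noisy]
    return (w - lr * grad[0], b - lr * grad[1])

rng = random.Random(0)
batch = [(x / 10, 3 * (x / 10) + 1) for x in range(-10, 11)]  # toy data
weights = (0.0, 0.0)
for _ in range(100):
    weights = dp_sgd_step(weights, batch, rng)
print(weights)
```

Clipping bounds how much any single example can change the update, and the noise scale is tied to that bound. Turning the noise level and number of steps into a formal privacy guarantee is the job of the privacy accounting mentioned above.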
We will bring everything together in a hands-on session, applying the lessons so far to the use case of document analysis. Participants will be guided through the implementation of a basic federated and private learning setup.
By the end of the session, participants will have trained a private model over a distributed dataset using DP and FL for a practical application.
9.00 | Introduction and Motivation
Dimosthenis Karatzas & Vincent Poulain d'Andecy
9.20 | Federated Learning
Raouf Kerkouche
10.00 | Vulnerabilities of Collaborative Learning
Raouf Kerkouche
11.15 | Privacy in Machine Learning
Joonas Jälkö
11.45 | Differential Privacy Algorithms
Joonas Jälkö
12.15 | Use Case on Private, Federated DocVQA
Mohamed Ali Souibgui, Marlon Tobaben & Khanh Nguyen
2.15 | Hands-on: Federated Learning
Andrey Barsky & Kangsoo Jung
3.00 | Hands-on: Differential Privacy
Andrey Barsky & Kangsoo Jung
The tutorial should interest researchers at the PhD or post-doc level, working in areas where sensitive information within documents is a relevant consideration.
We expect participants to have good working knowledge of machine learning and deep learning techniques, but do not expect any prior experience in federated learning or differential privacy.
Participants who want to join the hands-on exercise are encouraged to bring their laptops to the session.
Dimosthenis Karatzas is an Associate Professor at the Universitat Autònoma de Barcelona and Associate Director of the Computer Vision Centre (CVC) in Barcelona, Spain, where he leads the Vision, Language and Reading research group. He has produced more than 140 publications on computer vision, reading systems and multimodal learning. He received the 2013 IAPR/ICDAR Young Investigator Award, a Google Research Award (2016) and two Amazon Machine Learning Research Awards (2019, 2022). He has set up two spin-off companies to date, TruColour Ltd, UK, in 2007 and AllRead, Spain, in 2019. In 2018-19 he advised the Catalan government on the Catalan strategy for AI. He is a senior member of IEEE, a fellow of ELLIS and co-director of the ELLIS Unit Barcelona, past chair of IAPR TC11 (Reading Systems), and a member of the Artificial Intelligence Doctoral Academy (AIDA) Research and Industry Board. He created the Robust Reading Competition portal, established as the de-facto international benchmark in document analysis and used by more than 45,000 registered researchers.
Josep Lladós is an Associate Professor at the Computer Sciences Department of the Universitat Autònoma de Barcelona and a staff researcher of the Computer Vision Center, where he has also been the director since January 2009. He is chair holder of Knowledge Transfer of the UAB Research Park and Santander Bank. He is the head of the Pattern Recognition and Document Analysis Group (2009SGR-00418). His current research fields are document analysis, structural and syntactic pattern recognition and computer vision. He has been the head of a number of Computer Vision R+D projects and has published more than 200 papers in national and international conferences and journals.
Ernest Valveny is an Associate Professor at the Universitat Autònoma de Barcelona and also a researcher at the Computer Vision Center. He was the director of the Computer Science Department at UAB from 2013 to 2019. He is a member of the Vision, Language and Reading research unit at CVC. His main research interests are computer vision, in particular text recognition and retrieval, document understanding and multimodal (vision and language) models. He has published more than 20 papers in international indexed journals and more than 100 papers in peer-reviewed international conferences. He has led a number of national and international research projects, as well as technology transfer contracts with companies, mainly related to document analysis and robust reading. He has served as a reviewer and program committee member for many of the most relevant international journals and conferences within the area of computer vision and pattern recognition.
Mohamed Ali Souibgui is a postdoctoral researcher at the Computer Vision Center, Barcelona, Spain. He received his Ph.D. in 2022 from the Universitat Autònoma de Barcelona (UAB), Spain. His research focuses on document image analysis using computer vision and machine learning tools.
Andrey Barsky is a postdoctoral researcher at the Computer Vision Center, Barcelona, Spain. He received his Ph.D. in 2015 from the University of Nottingham in the UK. His research focuses on computer vision and multimodal learning, as well as robustness and explainability in AI models.
Khanh Nguyen is currently a PhD student at the Computer Vision Center, Barcelona, Spain. His research focuses on machine learning methods for Vision-and-Language tasks, particularly exploring the role of context and incorporating it into the image interpretation pipeline.
Antti Honkela is a Professor of Data Science (Machine Learning and AI) at the Department of Computer Science, University of Helsinki. He is the coordinating professor of the Research Programme in Privacy-Preserving and Secure AI at the Finnish Center for Artificial Intelligence (FCAI), a flagship of research excellence appointed by the Research Council of Finland, and leader of the Privacy and infrastructures WP in the European Lighthouse in Secure and Safe AI (ELSA), a European network of excellence in secure and safe AI. He serves in multiple advisory positions for the Finnish government on the privacy of health data. His research focuses on differentially private machine learning and statistical inference. He is an Action Editor of Transactions on Machine Learning Research and regularly serves as an area chair at leading machine learning conferences (NeurIPS, ICML, ICLR, AISTATS). He has taught the course Trustworthy Machine Learning, including topics on privacy-preserving machine learning, at the University of Helsinki since 2019.
Joonas Jälkö is a postdoctoral researcher in Professor Antti Honkela's group at the Department of Computer Science at the University of Helsinki. His research focuses mainly on differential privacy applied to statistical inference and on differentially private synthetic data.
Marlon Tobaben is a PhD student at the Department of Computer Science, University of Helsinki, supervised by Prof Antti Honkela and affiliated with the Finnish Center for Artificial Intelligence (FCAI), a flagship of research excellence appointed by the Research Council of Finland. Marlon's research focuses on differentially private deep and federated learning.
Mario Fritz is a faculty member at the CISPA Helmholtz Center for Information Security, an honorary professor at Saarland University, and a fellow of the European Laboratory for Learning and Intelligent Systems (ELLIS). Until 2018, he led a research group at the Max Planck Institute for Computer Science. Previously, he was a PostDoc at the International Computer Science Institute (ICSI) and UC Berkeley after receiving his PhD from TU Darmstadt and studying computer science at FAU Erlangen-Nuremberg. His research focuses on trustworthy artificial intelligence, especially at the intersection of information security and machine learning. He is Associate Editor of the journal "IEEE Transactions on Pattern Analysis and Machine Intelligence" (TPAMI) and has published over 100 articles in top conferences and journals. Currently, he is coordinating the Network of Excellence in AI "ELSA -- European Lighthouse on Secure and Safe AI" which is an ELLIS initiative that is funded by the EU and connects universities, research institutes, and industry partners across Europe (https://elsa-ai.eu).
Raouf Kerkouche is a Postdoctoral Fellow at the CISPA Helmholtz Center for Information Security advised by Prof. Mario Fritz. His current research centers around trustworthy machine learning with a focus on privacy and security. Raouf obtained his Ph.D. at INRIA, supervised by Prof. Claude Castelluccia and Prof. Pierre Genevès, where he worked on Differentially Private Federated Learning for Bandwidth and Energy Constrained Environments, with an interest in medical applications. One of his differentially private compression approaches published at UAI’21 has been included in a federated learning platform developed for drug discovery (https://www.melloddy.eu). He obtained his Master's degrees from Paris-Sud University and Pierre and Marie Curie University in France.
Catuscia Palamidessi is Director of Research at INRIA Saclay (since 2002), where she leads the team COMETE. She has been a Full Professor at the University of Genova, Italy (1994-1997) and Penn State University, USA (1998-2002). Palamidessi's research interests include Privacy, Machine Learning, Fairness, Secure Information Flow, Formal Methods, and Concurrency. In 2019 she obtained an ERC advanced grant to conduct research on Privacy and Machine Learning. In 2022, she received the Grand Prix of the French Academy of Science. She has been PC chair of various conferences including LICS and ICALP, and PC member of more than 120 international conferences. She is on the Editorial board of several journals, including the IEEE Transactions on Dependable and Secure Computing, the ACM Transactions on Privacy and Security, Mathematical Structures in Computer Science, Theoretics, the Journal of Logical and Algebraic Methods in Programming, and Acta Informatica. She is serving on the Executive Committee of ACM SIGLOG, CONCUR, and CSL.
Kangsoo Jung is a postdoctoral researcher in the COMETE team hosted at Inria, working under the supervision of Catuscia Palamidessi. He received his Ph.D. in 2017 from Sogang University in South Korea. His research focuses on differential privacy, machine learning and game theory to address the privacy-utility tradeoff.
Vincent Poulain d’Andecy has been the head of the Yooz Research and Technologies Department since 2015. He is a graduate engineer of INSA Rennes and holds a PhD from La Rochelle University. He started his career at ITESOFT in 1994 and has more than 25 years of experience in the development of Automatic Document Processing Systems. At Yooz, he is in charge of AI developments with a team of 9 people. He supervises PhD and collaborative research projects in partnership with academic institutions such as La Rochelle University and the CVC-CERCA.
Aurélie Joseph received her Ph.D. in Linguistics in 2013 with the support of the ITESOFT company and the LDI Lab (Paris Sorbonne Cité). She has been working as Innovation Lab Manager at Yooz (France) in the broad field of document analysis (document classification, information extraction, flow structuration mobility, security). Leading a team of 4 engineers, she specifies the needs of the company and manages projects, and also develops, integrates and tests technologies in partnership with different labs.
18th International Conference on Document Analysis and Recognition
30 August - 4 September 2024
This work is supported by ELSA - European Lighthouse on Secure and Safe AI funded by the European Union under grant agreement No. 101070617. Views and opinions expressed are those of the authors only and do not necessarily reflect those of the European Union or European Commission. Neither the European Union nor the European Commission can be held responsible for them.
For any questions about this tutorial, please contact: organizers_pfl@cvc.uab.cat