Associate Professor @ DISI, University of Trento, Trento, Italy
Head of the Deep Visual Learning group @ Fondazione Bruno Kessler (FBK), Trento, Italy
Talk title: “The Unreasonable Effectiveness of Large Language-Vision Models for Video Domain Adaptation”
Abstract
Video analysis tasks, such as action recognition, have long been investigated in computer vision. Major progress has been made in the last decade with the development of specialized deep architectures, such as 3D CNNs and Video Transformers, trained on large-scale annotated datasets. However, obtaining sufficient labelled training videos for real-world scenarios can be very costly and time-consuming. To alleviate the burden of annotating large-scale datasets, Video-based Unsupervised Domain Adaptation (VUDA) methods have been introduced. VUDA methods share the common idea of transferring knowledge from a labelled source domain to an unlabelled target domain. In the last few years, the field of computer vision has also witnessed the emergence of a new generation of powerful deep architectures, trained on mammoth internet-scale image-text datasets. These models, commonly known as foundation models or Large Language-Vision Models (LLVMs), have achieved outstanding performance and have become a cornerstone of modern computer vision research. In this talk I will introduce recent works from my research group which leverage LLVMs for addressing the main challenges of VUDA.
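One common way to leverage an LLVM's joint image-text embedding space in unsupervised domain adaptation is to pseudo-label the unlabelled target data zero-shot, by matching each sample's embedding against text embeddings of class-name prompts and keeping only confident matches. The sketch below illustrates this general idea with toy pre-computed embeddings; the function names, the cosine-similarity matching, and the confidence threshold are illustrative assumptions, not the specific methods presented in the talk.

```python
import numpy as np

def cosine(a, b):
    # Row-wise cosine similarity between two embedding matrices.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def pseudo_label(video_embs, class_text_embs, threshold=0.8):
    # Zero-shot pseudo-labels for unlabelled target videos:
    # assign each video to the nearest class-prompt embedding,
    # and keep only matches above a confidence threshold.
    sims = cosine(video_embs, class_text_embs)
    labels = sims.argmax(axis=1)
    keep = sims.max(axis=1) >= threshold
    return labels, keep

# Toy stand-ins for embeddings produced by an LLVM's encoders
# (a real pipeline would use, e.g., CLIP's video-frame and text encoders).
class_text_embs = np.eye(3)                      # 3 class prompts
video_embs = np.array([
    [0.9, 0.1, 0.0],   # clearly class 0
    [0.1, 0.1, 0.9],   # clearly class 2
    [0.5, 0.5, 0.5],   # ambiguous -> filtered out
])
labels, keep = pseudo_label(video_embs, class_text_embs)
```

The retained pseudo-labelled target videos can then serve as supervision for adapting the downstream model, which is the general recipe this family of methods builds on.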
Research Scientist and Project Lead @ NAVER Labs Europe
Grenoble, France
https://ricvolpi.github.io/
Talk title: “Incremental Learning for Semantic Image Segmentation”
References
On the Road to Online Adaptation for Semantic Image Segmentation, CVPR 2022
RaSP: Relation-aware Semantic Prior for Weakly Supervised Incremental Segmentation, CoLLAs 2023
Reliability in Semantic Segmentation: Are We on the Right Track?, CVPR 2023
Talk title: "Efficient construction of training datasets for 2D and 3D data"
Abstract
The construction of training datasets is an onerous yet essential activity faced in most industrial settings when tackling a new perception task with unfamiliar data. Different learning scenarios raise different questions: Which data should be annotated to achieve the greatest performance boost? When provided with data pre-annotated for classification, how can one adapt it to the task of object detection? In this presentation, we will discuss active learning methods that help select the most informative data for annotation in different perception scenarios. We will also examine the importance of the initial batch of selected data when performing active learning.
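A standard baseline for the selection step in active learning is uncertainty sampling: score each unlabelled sample by the model's predictive entropy and send the highest-scoring ones to annotators. The minimal sketch below illustrates this baseline on toy class probabilities; it is one common criterion among many, not necessarily the methods discussed in the talk.

```python
import numpy as np

def entropy(probs):
    # Predictive entropy per sample; higher means more uncertain.
    eps = 1e-12  # avoid log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_for_annotation(probs, budget):
    # Indices of the `budget` most uncertain unlabelled samples.
    scores = entropy(probs)
    return np.argsort(scores)[::-1][:budget]

# Toy pool of 4 unlabelled samples with model class probabilities.
pool = np.array([
    [0.98, 0.01, 0.01],   # confident -> low priority
    [0.34, 0.33, 0.33],   # near-uniform -> highest priority
    [0.70, 0.20, 0.10],
    [0.50, 0.45, 0.05],
])
picked = select_for_annotation(pool, budget=2)
```

Here the two near-uniform predictions are selected first. In practice the criterion is applied iteratively: annotate the selected batch, retrain, re-score the pool, and repeat until the labelling budget is exhausted.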
Associate Professor @ University of Amsterdam, Amsterdam, Netherlands
Co-founder @ Ellogon.AI
Talk title: “Causal Computer Vision Towards Interactive Embodied AI”
Abstract
While in the past 30 years AI has mostly focused on static data and tasks, like classifying images of cats and dogs or reconstructing 3D point clouds, in recent years we have witnessed a grand shift towards learning of dynamics and embodiment, where both algorithms and environments change through space and time, and, importantly, algorithms can change the environment itself. Central to this shift have been works on Embodied AI, Neural Simulations, and Causal Representation Learning. With the seismic shift from monolithic AI models to foundational AI systems comprising multiple modalities, in this talk I will discuss the next generation of computer vision algorithms, which do not simply observe but also intervene in and influence their environment.