Universitat Oberta de Catalunya

I am a Professor at Universitat Oberta de Catalunya (UOC) and a member of the Artificial Intelligence for Human Well-being (AIWELL) group at the UOC eHealth Research Center.

My research interests are related to Computer Vision, Natural Language Processing, and Affective Computing. I am especially interested in computer vision tasks related to segmentation, such as semantic segmentation, instance segmentation, and video object segmentation. I am also interested in applying these research fields to health.

I have been collaborating with the UPC Image Processing Group since 2010, when I received my BS degree in Telecommunication Engineering from the Universitat Politècnica de Catalunya. From 2012 to 2016, I did my PhD in Computer Science in the same research group. Later, in 2017, I was a visiting professor at the ETH Zurich Computer Vision Lab, where I worked on image segmentation with Dr. Jordi Pont-Tuset.

For a full list of my publications, please visit my Google Scholar profile.


Carles Ventura

Universitat Oberta de Catalunya

Estudis d'Informàtica, Multimèdia i Telecomunicació

Rambla del Poblenou, 156, 08018 Barcelona (Spain)


Predicting the Subjective Responses’ Emotion in Dialogues with Multi-Task Learning (IbPRIA 2023)

In this work, given a dialogue excerpt, we address the problem of predicting the subjective emotional response in the upcoming utterance (i.e., the emotion that the next speaker will express when they talk). To do so, we also take the next speaker's personality information as input. (PDF)

RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation (MTAP 2023)

The task of video object segmentation with referring expressions (language-guided VOS) is, given a linguistic phrase and a video, to generate binary masks for the object to which the phrase refers. We propose a new architecture, RefVOS, to address this task, and we show that the main challenges are related to understanding motion and static actions. (PDF)

Recognizing Emotions evoked by Movies using Multitask Learning (ACII 2021, Sensors 2022)

In this work, we address the problem of subjectivity in recognizing the emotions evoked by videos. Instead of modeling only the aggregated value, we use a multi-task learning approach to jointly model the emotions experienced by each viewer and the aggregated value (PDF). A journal extension can also be found here.
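As a rough illustration of the multi-task idea, the sketch below combines a per-viewer term and an aggregated-value term into one training objective. It is not the code from the paper: the use of mean squared error, the function names `mse` and `multitask_loss`, and the weighting parameter `alpha` are all hypothetical choices made for this example.

```python
from statistics import mean


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return mean((p - t) ** 2 for p, t in zip(pred, target))


def multitask_loss(viewer_preds, viewer_targets, agg_pred, agg_target, alpha=0.5):
    """Hypothetical joint objective: one task per viewer plus the aggregate.

    viewer_preds / viewer_targets: one emotion score per viewer.
    agg_pred / agg_target: the aggregated (e.g. averaged) emotion score.
    alpha: assumed trade-off weight between the two kinds of task.
    """
    viewer_loss = mse(viewer_preds, viewer_targets)  # subjective, per-viewer tasks
    agg_loss = (agg_pred - agg_target) ** 2          # aggregated-value task
    return alpha * viewer_loss + (1 - alpha) * agg_loss


# Example: three viewers rated the same clip differently.
loss = multitask_loss([0.3, 0.6, 0.5], [0.2, 0.8, 0.5], agg_pred=0.45, agg_target=0.5)
print(loss)
```

Training on both terms lets the model see how individual viewers disagree instead of only the averaged label; how the tasks are actually weighted in the published work may differ.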

RVOS: End-To-End Recurrent Network for Video Object Segmentation (CVPR 2019)

In this work, we propose RVOS, a recurrent network for multi-object video object segmentation that is fully end-to-end trainable. Our model incorporates recurrence in two domains: (i) the spatial domain, which allows it to discover the different object instances within a frame, and (ii) the temporal domain, which allows it to keep the segmented objects coherent over time. (PDF)

Iterative Deep Retinal Topology Extraction (MICCAI Workshops 2018)

This paper tackles the task of estimating the topology of filamentary networks such as retinal vessels. Using a previously proposed CNN model that predicts the local connectivity between the central pixel of an input patch and its border points, we perform a qualitative and quantitative evaluation of retinal vein and artery topology extraction on the DRIVE dataset. (PDF)

Iterative Deep Learning for Road Topology Extraction (BMVC 2018)

This paper tackles the task of estimating the topology of road networks from aerial images. We design a CNN that predicts the local connectivity between the central pixel of an input patch and its border points. By iterating this local connectivity prediction, we sweep the whole image and infer the global topology of the road network. (PDF)
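The iterative step can be pictured as a graph-growing loop: start from a seed point, ask the local model which border points of the patch around it are connected, and keep expanding from newly found points. The sketch below is a generic breadth-first version of that idea, not the paper's implementation. `predict_connected_border_points` stands in for the CNN, and `make_toy_predictor`, the patch `radius`, and the toy road are hypothetical helpers that make the example runnable without any learned model.

```python
from collections import deque


def extract_topology(seed, predict_connected_border_points):
    """Iteratively grow a graph from a seed by repeated local predictions.

    predict_connected_border_points(p) is a stand-in for the CNN: it returns
    the border points of the patch centred at p that are connected to p.
    """
    vertices = {seed}
    edges = set()
    queue = deque([seed])
    while queue:
        center = queue.popleft()
        for q in predict_connected_border_points(center):
            edges.add(frozenset((center, q)))  # undirected edge center <-> q
            if q not in vertices:
                vertices.add(q)
                queue.append(q)                # explore from the new point later
    return vertices, edges


def make_toy_predictor(road_pixels, radius=2):
    """Hypothetical stand-in for the CNN (no learning involved): a border
    point counts as connected if it lies on a known road pixel."""
    def predict(center):
        cx, cy = center
        border = [
            (cx + dx, cy + dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)
            if max(abs(dx), abs(dy)) == radius  # points on the patch border only
        ]
        return [p for p in border if p in road_pixels]
    return predict


# A straight horizontal "road" from x = 0 to x = 8.
road = {(x, 0) for x in range(9)}
vertices, edges = extract_topology((0, 0), make_toy_predictor(road))
print(sorted(vertices))
```

Because every new point is queued once and expanded once, the loop terminates when no unexplored connected points remain, which is the sense in which the local predictions sweep the image and yield a global graph.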

Interpreting CNN Models for Apparent Personality Trait Regression (CVPR Workshops 2017)

This paper addresses the problem of automatically inferring the personality traits of people talking to a camera. It presents an in-depth study of why CNN models perform surprisingly well on this task, using CNN interpretability techniques combined with face detection and Action Unit (AU) recognition systems. (PDF)