Additional short research projects on computer vision and machine learning for robotics, mobile phones and other embedded systems can be seen in Supervised Student Thesis Projects.
We have developed efficient transformer-based architectures to process event camera data for activity recognition (EventTransformer and EventTransformer+)
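For illustration only, the sketch below shows one common way to feed event camera data to a transformer: events are accumulated into a two-channel (polarity) histogram, split into patch tokens, and classified by a small encoder. This is not the actual EventTransformer design; the TinyEventTransformer name, the binning scheme, and all layer sizes are illustrative assumptions.

```python
# Minimal sketch (NOT the actual EventTransformer architecture): events are
# binned into a 2-channel histogram, split into patch tokens, and classified
# by a small transformer encoder. All names and sizes are illustrative.
import torch
import torch.nn as nn

class TinyEventTransformer(nn.Module):
    def __init__(self, img_size=64, patch=8, dim=128, heads=4, layers=2, classes=10):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.patch = patch
        self.proj = nn.Linear(patch * patch * 2, dim)        # 2 polarity channels
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        enc = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)
        self.head = nn.Linear(dim, classes)

    def forward(self, frames):                               # (B, 2, H, W) histograms
        B, C, H, W = frames.shape
        p = self.patch
        tokens = frames.unfold(2, p, p).unfold(3, p, p)       # (B, 2, H/p, W/p, p, p)
        tokens = tokens.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)
        x = self.encoder(self.proj(tokens) + self.pos)
        return self.head(x.mean(dim=1))                      # mean-pool, then classify

def events_to_histogram(events, img_size=64):
    """Accumulate (x, y, polarity) events into a 2-channel count image."""
    hist = torch.zeros(2, img_size, img_size)
    for x, y, pol in events:
        hist[int(pol), int(y), int(x)] += 1.0
    return hist

# Example: 500 random events -> one activity logit vector
xy = torch.randint(0, 64, (500, 2)).float()
pols = torch.randint(0, 2, (500, 1)).float()
frame = events_to_histogram(torch.cat([xy, pols], dim=1)).unsqueeze(0)
print(TinyEventTransformer()(frame).shape)                   # torch.Size([1, 10])
```

In practice the published models exploit the sparsity and timing of the event stream rather than dense histograms; this toy version only conveys the tokenise-then-encode structure.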
In addition, as part of a broader study centered on sleep, the EVENTSLEEP project investigates the suitability of event cameras for analyzing, in a non-invasive manner, specific behaviors that occur during sleep and lead to sleep disorders.
We are interested in increasing the autonomy of drones and swarms so they can perform more complex tasks, in particular for cinematography (around the work in CineMPC and CineTransfer, we have proposed different approaches to make drones more autonomous when filming) and for drone show generation (Gen-Swarms, Adapting Deep Generative Models to Swarms of Drones).
Deep Learning models to learn relevant concepts or features from endoscopic images that can facilitate 3D mapping. A large part of this work has been developed within the ENDOMAPPER project, focused on colonoscopy data. We are also developing new methods to improve automated bronchoscopy assistance tools.
We have developed novel deep learning models for different recognition tasks to analyze people's behaviours and activities in the former FILOVI project. Currently, we are exploring novel efficient VLM-based strategies to improve long video understanding (FALCONeye).
Semantic segmentation on a new hyperspectral dataset captured in a realistic waste-sorting facility scenario.
Deep Learning models for recognition tasks in different domains using multi-camera systems and heterogeneous sensors, with a particular focus on efficiency in data and computational requirements.
Deep Learning models for semantic segmentation in different domains, focusing on the lack of dense training data and on the use of multi-modal data.
Learning visual models from audio-visual human-robot interaction in assistive settings. Part of the CHIST-ERA project "Interactive Grounded Language Understanding" (IGLU).
A novel efficient interaction paradigm that approximates any per-pixel magnitude from a few user strokes by propagating the sparse user input to each pixel of the image. This can be used in many image filters.
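To convey the propagation idea (a rough illustration, not the exact published formulation), the sketch below spreads sparse stroke values to every pixel by solving an edge-aware graph Laplacian system, so values diffuse within homogeneous regions but stop at strong image edges. The propagate_strokes helper, the affinity weighting, and the parameters beta and lam are hypothetical choices.

```python
# Hedged sketch of sparse-to-dense stroke propagation: unknown per-pixel values
# are solved from a weighted graph Laplacian whose edge weights decay across
# strong image gradients, so user strokes spread inside homogeneous regions.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def propagate_strokes(image, stroke_mask, stroke_values, beta=50.0, lam=100.0):
    """image: (H, W) grayscale in [0, 1]; stroke_mask: bool (H, W);
    stroke_values: (H, W) target magnitude where stroke_mask is True."""
    H, W = image.shape
    n = H * W
    idx = np.arange(n).reshape(H, W)
    rows, cols, vals = [], [], []
    # 4-neighbour affinities: w = exp(-beta * |I_p - I_q|^2)
    for di, dj in [(0, 1), (1, 0)]:
        a = idx[: H - di, : W - dj].ravel()
        b = idx[di:, dj:].ravel()
        d = (image[: H - di, : W - dj] - image[di:, dj:]).ravel()
        w = np.exp(-beta * d ** 2)
        rows += [a, b]; cols += [b, a]; vals += [w, w]
    A_w = sp.csr_matrix((np.concatenate(vals),
                         (np.concatenate(rows), np.concatenate(cols))), shape=(n, n))
    L = sp.diags(np.asarray(A_w.sum(axis=1)).ravel()) - A_w   # graph Laplacian
    c = stroke_mask.ravel().astype(float)                     # stroke indicator
    A = L + lam * sp.diags(c)                                 # soft data term
    b_rhs = lam * c * stroke_values.ravel()
    return spla.spsolve(A.tocsc(), b_rhs).reshape(H, W)

# Toy example: strokes with values 0 and 1 on a two-region image
img = np.zeros((32, 32)); img[:, 16:] = 1.0
mask = np.zeros((32, 32), bool); mask[16, 4] = mask[16, 28] = True
vals = np.zeros((32, 32)); vals[16, 28] = 1.0
dense = propagate_strokes(img, mask, vals)
print(dense[16, 2], dense[16, 30])   # ~0 on the left region, ~1 on the right
```

The same solve works for any per-pixel magnitude (exposure, depth, filter strength), which is what makes a single interaction paradigm reusable across many image filters.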
Social media provides large amounts of visual data, generating new challenges for computer vision, as well as new opportunities and applications.
We have built a wearable catadioptric vision system and designed algorithms for its use in semantic mapping and navigation assistance systems.
We have built a dataset with several wearable cameras simultaneously recording daily office activities, and evaluated several alternatives for activity recognition on it.
Efficient Place Recognition. Gist-based description for panoramas. Hierarchical Visual Localization.
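As a simplified illustration of gist-style place recognition (not the descriptors from our papers), the snippet below pools gradient-orientation energy over a coarse spatial grid and ranks database panoramas by cosine similarity; the grid size, bin count, and function names are assumptions. In a hierarchical pipeline, a finer feature-based step would then verify only the top-ranked candidates.

```python
# Illustrative gist-style global descriptor: gradient-orientation energy pooled
# over a coarse grid, compared by cosine similarity for place retrieval.
import numpy as np

def gist_descriptor(image, grid=4, bins=8):
    """image: (H, W) grayscale array -> L2-normalised (grid*grid*bins,) vector."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # orientation in [0, pi)
    H, W = image.shape
    desc = np.zeros((grid, grid, bins))
    for i in range(grid):
        for j in range(grid):
            ys = slice(i * H // grid, (i + 1) * H // grid)
            xs = slice(j * W // grid, (j + 1) * W // grid)
            hist, _ = np.histogram(ang[ys, xs], bins=bins, range=(0, np.pi),
                                   weights=mag[ys, xs])
            desc[i, j] = hist
    v = desc.ravel()
    return v / (np.linalg.norm(v) + 1e-8)

# Coarse retrieval: rank database panoramas by descriptor similarity.
db = [np.random.rand(128, 512) for _ in range(5)]    # fake panorama database
query = db[3] + 0.05 * np.random.rand(128, 512)      # noisy revisit of place 3
sims = [gist_descriptor(query) @ gist_descriptor(p) for p in db]
print(int(np.argmax(sims)))                          # expected: 3
```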
Object Recognition and semantic analysis for robotics and semantic mapping tasks. Scene Understanding.
Structure from Motion and Localization, Robust Matching, Dominant Plane Detection and Segmentation.