Pablo Gimeno

Speech scientist at ViVoLab

University of Zaragoza - Aragón Institute for Engineering Research

About me

I was born in Valencia, Spain, in 1994. I received my Bachelor's degree in Telecommunication Engineering in 2016 and my MSc in Telecommunication Engineering in 2018, both from the University of Zaragoza. During the last year of my master's I focused my studies on speech and signal processing and on machine learning applied to signal processing.

I started collaborating with the ViVoLab research group in 2016 through a government grant for initiation in research activities. I am currently pursuing my PhD thesis in the same group under the supervision of Dr. Alfonso Ortega, supported by a grant from my regional government (DGA). My research interests span speech processing, audio and speech segmentation, speech activity detection and automatic speech recognition.

I also collaborate actively in teaching signal processing courses for the Bachelor's degree in Telecommunication Engineering (Audio & Image Processing, Signal Processing laboratory) and the Master's degree in Telecommunication Engineering (Speech Technologies).

Research

My latest work on Speech Activity Detection was featured on the ViVoLab Medium page: check it here. A demo is also available on the ViVoLab webpage, where you can see and hear some examples.

Check my Google Scholar profile for an updated list of publications.

This is a list of my latest publications:

2023

"Improved Cross-Lingual Transfer Learning For Automatic Speech Translation", Sameer Khurana, Nauman Dawalatabad, Antoine Laurent, Luis Vicente, Pablo Gimeno, Victoria Mingote and James Glass [Paper]

2022

"A Study on the Use of wav2vec Representations for Multiclass Audio Segmentation", Pablo Gimeno, Alfonso Ortega, Antonio Miguel and Eduardo Lleida, at IberSPEECH 2022 [Paper] [Slides]
"Unsupervised Adaptation of Deep Speech Activity Detection Models to Unseen Domains", Pablo Gimeno, Dayana Ribas, Alfonso Ortega, Antonio Miguel and Eduardo Lleida, in Applied Sciences [Paper] [Slides]
"Multimodal Diarization Systems by Training Enrollment Models as Identity Representations", Victoria Mingote, Ignacio Viñals, Pablo Gimeno, Antonio Miguel, Alfonso Ortega and Eduardo Lleida, in Applied Sciences [Paper]

2021

"Unsupervised Representation Learning for Speech Activity Detection in the Fearless Steps Challenge 2021", Pablo Gimeno, Alfonso Ortega, Antonio Miguel and Eduardo Lleida, at Interspeech 2021 [Paper] [Slides] [Presentation]
"Generalising AUC Optimisation to Multiclass Classification for Audio Segmentation with Limited Training Data", Pablo Gimeno, Victoria Mingote, Alfonso Ortega, Antonio Miguel and Eduardo Lleida, in IEEE Signal Processing Letters [Paper] [Poster]
"Automatic Speech Recognition and Language for Specific Purposes: Research Application and Pedagogical Implications", Miguel A. Vela Tafalla and Pablo Gimeno, at the 38th AESLA International Conference [Slides]
"Convolutional Recurrent Neural Networks for Speech Activity Detection in Naturalistic Audio from Apollo Missions", Pablo Gimeno, Dayana Ribas, Alfonso Ortega, Antonio Miguel and Eduardo Lleida, at IberSPEECH 2021 [Paper] [Slides] [Presentation]
"Diarization and Identity Attribution Compatibility in the Albayzin 2020 Challenge", Ignacio Viñals, Pablo Gimeno, Alfonso Ortega, Antonio Miguel and Eduardo Lleida, at IberSPEECH 2021 [Paper]
"ViVoLAB Multimodal Diarization System for RTVE 2020 Challenge", Victoria Mingote, Ignacio Viñals, Pablo Gimeno, Alfonso Ortega, Antonio Miguel and Eduardo Lleida, at IberSPEECH 2021 [Paper]

2020

"Partial AUC Optimisation using Recurrent Neural Networks for Music Detection with Limited Training Data", Pablo Gimeno, Victoria Mingote, Alfonso Ortega, Antonio Miguel and Eduardo Lleida, at Interspeech 2020 [Paper] [Slides] [Presentation]
"Multiclass audio segmentation based on recurrent neural networks for broadcast domain data", Pablo Gimeno, Ignacio Viñals, Alfonso Ortega, Antonio Miguel and Eduardo Lleida, in EURASIP Journal on Audio, Speech, and Music Processing [Paper]

2019

"ViVoLAB Speaker Diarization System for the DIHARD 2019 Challenge", Ignacio Viñals, Pablo Gimeno, Alfonso Ortega, Antonio Miguel and Eduardo Lleida, at Interspeech 2019 [Paper]
"Phonetically-aware embeddings, Wide Residual Networks with Time-Delay Neural Networks and Self Attention Models for the 2018 NIST Speaker Recognition Evaluation", Ignacio Viñals, Dayana Ribas, Victoria Mingote, Jorge Llombart, Pablo Gimeno, Antonio Miguel, Alfonso Ortega and Eduardo Lleida, at Interspeech 2019 [Paper]

2018

"A Recurrent Neural Network Approach to Audio Segmentation for Broadcast Domain Data", Pablo Gimeno, Ignacio Viñals, Alfonso Ortega, Antonio Miguel and Eduardo Lleida, at IberSPEECH 2018 [Paper] [Slides]
"In-domain Adaptation Solutions for the RTVE 2018 Diarization Challenge", Ignacio Viñals, Pablo Gimeno, Alfonso Ortega, Antonio Miguel and Eduardo Lleida, at IberSPEECH 2018 [Paper]
"Estimation of the Number of Speakers with Variational Bayesian PLDA in the DIHARD Diarization Challenge", Ignacio Viñals, Pablo Gimeno, Alfonso Ortega, Antonio Miguel and Eduardo Lleida, at Interspeech 2018 [Paper]

You can also check my Bachelor's thesis and my Master's thesis (only available in Spanish):

MSc Thesis: Automatic audio segmentation based on neural network models in broadcast environments - under the supervision of Dr. Alfonso Ortega and Ignacio Viñals [Read in Spanish]
Bachelor Thesis: Development & evaluation of speech to text alignment tools using automatic speech recognition techniques - under the supervision of Dr. Alfonso Ortega and Julia Olcóz [Read in Spanish]