STOMA: Towards real-time, enhanced text-to-speech synthesis on the device (PI, 2022-2024)
ELIDEK

Text-to-Speech (TTS) synthesis eases human-machine communication by converting arbitrary text to informative and natural speech signals. Recent TTS systems based on neural network models constitute a major breakthrough in terms of quality, naturalness and expressivity of synthetic speech. Unfortunately, neural TTS systems have millions of trainable parameters, demanding not only expensive computational resources but also large collections of high-quality recordings. Consequently, the development and deployment of product-quality speech synthesis systems is laborious and costly and, at the same time, the generation of synthetic speech (i.e., the inference process) is slow.

STOMA aims to devise and implement novel, lighter TTS systems whose parameter space will be significantly smaller than that of existing systems, without sacrificing speech quality and naturalness. Lighter TTS systems enable real-time speech production on the device because they require fewer computational operations inside the neural network. Additionally, STOMA takes into consideration the vibrating and quasi-periodic nature of speech as well as the spectro-temporal analysis of the human auditory system, and it will be robustly adapted to intelligibility-enhanced synthesis systems through transfer learning.

Characterising population dynamics with applications in biological data (PI, 2020-2021)
ESPA - Department of Development

All scientific fields, and especially biology, face the principal challenge of understanding how the observed complex dynamical systems are organized. Modern technology provides an ever-growing amount of data that captures most of the heterogeneity of the systems we observe. Concurrently, developments in high-throughput computing provide the means for efficient processing of massive amounts of data. Therefore, a major paradigm shift would be the development of algorithms able to reconstruct a complex system’s structure de novo from observational data and describe its dynamics on a population level.

On such grounds, this proposal aspires to reach a deeper understanding of complex systems via modelling and simulation of their dynamics. The grand challenge that it sets is to develop a new computational methodology for the efficient analysis of data from populations of measurements and to make it readily available to the research community as open software. The devised methodologies will be tested and validated on well-defined biological problems such as the understanding of the molecular mechanisms underlying immune cell differentiation.

ENRICH: Enriched communication across the lifespan (Partner, 2017-2020)
Horizon 2020, MSCA-ETN-2020

Speech is a hugely efficient means of communication: a reduced capacity in listening or speaking creates a significant barrier to social inclusion at all points through the lifespan, in education, work and at home. Hearing aids and speech synthesis can help address this reduced capacity but their use imposes greater listener effort. The fundamental objective of the ENRICH network is to modify or augment speech with additional information to make it easier to process. ENRICH will address the unmet need for multi-skilled practitioners and engineers in this rapidly growing sector currently facing a serious workforce shortage.

Enrichment aims to reduce the listening burden by minimising cognitive load, while maintaining or improving intelligibility. ENRICH will investigate the relationship between cognitive effort and different forms of natural and synthetic speech. Non-intrusive metrics for listening effort will be developed and used to design modification techniques which result in low-burden speech. The value of various enrichment approaches will be evaluated with individuals and cohorts with typically sub-optimal communication ability, e.g., children, hearing-impaired adults, non-native listeners and individuals engaged in simultaneous tasks.