Speaker recognition (voice biometrics) has attracted considerable interest in a large number of e-commerce related applications, such as speaker forensics, banking transactions and, of late, smartphones. The major goal of S4P 2019 is to discuss the development of speaker recognition systems that are robust to spoofing, noise, channel variability, intrinsic variability, etc. In addition, recent research on speaker diarization and on countermeasures against spoofing and tampering attacks will also be covered in this summer school. The summer school will benefit greatly from invited talks by eminent, world-class researchers from academia, industry and research laboratories in India and abroad. S4P 2019 also benefits from its world-renowned program committee (PC) members. Furthermore, to encourage young talent, the school presents the fourth edition of the 5 Minute Ph.D. Thesis Contest (5MPT) with four cash prizes (ISCA endorsed). Finally, for the first time, S4P 2019 introduces industry perspective talks from major speech technology industries.
Understanding and modeling the phenomenon of speech production is an ongoing endeavor. In particular, compared with what it takes to land tons of mass on the moon, modeling the true speech production mechanism is much more sophisticated and challenging. Hence, there is no formal theory that explains all the configurations involved in producing intelligible speech. However, we can still gain much insight into the speech mechanism with simplified analog and discrete models.
The speech chain involves various levels for the speaker and the listener, such as linguistic, physiological and acoustic. The sophistication of speech production involves mysterious coarticulation at both local and global levels, nonlinear source-filter interaction, variability in the glottal source spectrum w.r.t. gender, articulatory motion, the tongue model, the large speech bandwidth (20 Hz - 7 kHz), the high degree of redundancy in speech, the significance of cross-modes in spectral tilt in sound wave propagation around 4-5 kHz, changes in spectral tilt due to the dynamic interaction of pitch (F0) source harmonics with the vocal tract spectrum, the many-to-one mapping between articulatory position and acoustic waveform, the dynamic nature of turbulence created at the excitation source and in the vocal tract system, highly complex vocal fold vibration, sound diffraction around the head and lip radiation, etc. Our incomplete understanding of these complexities poses a significant scientific challenge to developing high-performance speech technologies, such as speech and speaker recognition, voice conversion, speech synthesis, emotion recognition, voice pathology classification, etc.
The aim of this summer school is to further advance our understanding of these challenging (and open) research issues. In this context, the event will benefit greatly from invited talks by eminent speech processing researchers from academia, industry and R&D labs across the world. In addition, S4P 2018 benefits from a world-renowned scientific committee. Furthermore, to encourage young talent, the school presents the next edition of the 5 Minute Ph.D. Thesis (5MPT) contest with three cash prizes (ISCA supported) and a special session of Doctoral Consortium Talks by four ISCA student members (in addition to a few alumni) of the Speech Research Lab at DA-IICT.
Speaker recognition (voice biometrics) has attracted considerable interest in a large number of e-commerce and forensics related applications, such as automatic access through a voice profile and filtering of telephone calls. The major goal of S4P 2017 is to discuss the development of speaker recognition systems that are robust to noise, channel variability and intrinsic variability (due to speakers, health issues, stress, etc.). In addition, recent research on countermeasures against spoofing and tampering attacks will also be covered in this summer school. Language Identification (LID), with a major focus on dialect and accent identification, will also be explored. The summer school will benefit greatly from invited talks by eminent speech processing researchers from academia, industry and research laboratories in India and abroad. S4P 2017 also benefits from its world-renowned program committee members. Furthermore, to encourage young talent, the school presents a special session of doctoral consortium talks by four ISCA student members of DA-IICT.
Speech is the most powerful and natural form of communication between humans. Human speech is produced by the nonlinear coupling of the excitation source with the vocal tract system. Speech is made up of phonemes, syllables, words, phrases and, finally, sentences. It carries various levels of information, such as speaker identity, linguistic message, gender, the health condition of the speaker, personality and the acoustic environment of the recording. The major focus of this summer workshop is to increase awareness of present and future studies on the representation and modeling of the speech signal and, finally, on using this knowledge in speech technology applications such as speech and speaker recognition, speech synthesis, audio search, voice conversion, etc. Another salient feature of this summer school is the doctoral consortium talks, in which four doctoral students will deliver invited talks.
The speech signal is the result of filtering the excitation source signal with the vocal tract system. The primary excitation signal is produced by the vibration of the vocal folds at the glottis. Many speech processing techniques are based on this source-filter model. The speech excitation source is known to be the origin of several essential acoustic cues used in speech processing, such as fundamental frequency (pitch), glottal closure instant (epoch), prosodic features, turbulent flow, voice quality, speaking style and speaker identity, all of which contribute to the naturalness and expressiveness of speech. Pitch tracking, epoch detection, and the estimation and modeling of glottal flow from a speech signal are very intricate because of nonlinear interactions between the source and the system. In addition, estimating source-based features is very challenging in noisy environments and in emotional and singing voices. Most of the time, knowledge of source information is not effectively incorporated, which results in poor performance in applications such as speech synthesis, expressive speech processing, speaker recognition, voice conversion and voice-based biomedical engineering. Hence, research on effectively incorporating source information will continue to grow, which may lead to better performance in various speech processing applications. This summer school is a step toward filling this gap (the first of its kind in India), and it will benefit greatly from talks by eminent speech processing researchers from academia, industry and research laboratories in India and abroad.
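As a rough illustration of the source-filter idea described above, the following minimal Python sketch passes an idealized excitation (an impulse train at the pitch period) through a single two-pole resonator standing in for one vocal tract resonance, and then recovers the pitch by autocorrelation. It is a sketch under stated assumptions, not a method presented at the school: the function names, the 8 kHz sampling rate, the 100 Hz pitch, and the 500 Hz formant with 80 Hz bandwidth are all illustrative choices, and the model is purely linear, so it ignores the nonlinear source-system interactions mentioned above.

```python
import math

def impulse_train(f0_hz, fs_hz, n_samples):
    """Idealized voiced excitation: one unit impulse per pitch period (assumption)."""
    period = round(fs_hz / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(x, formant_hz, bandwidth_hz, fs_hz):
    """Two-pole all-pole filter modelling a single vocal tract resonance."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)           # pole radius
    a1 = -2.0 * r * math.cos(2.0 * math.pi * formant_hz / fs_hz)
    a2 = r * r
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = xn - a1 * y1 - a2 * y2                         # y[n] = x[n] - a1*y[n-1] - a2*y[n-2]
        y.append(yn)
        y2, y1 = y1, yn
    return y

def estimate_f0(signal, fs_hz, f0_min=60.0, f0_max=400.0):
    """Autocorrelation pitch estimate: choose the lag with the largest autocorrelation."""
    lag_min = int(fs_hz / f0_max)
    lag_max = int(fs_hz / f0_min)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs_hz / best_lag

# Illustrative values only: 0.2 s of a 100 Hz "voice" with one 500 Hz formant.
fs = 8000
source = impulse_train(f0_hz=100.0, fs_hz=fs, n_samples=1600)
speech = resonator(source, formant_hz=500.0, bandwidth_hz=80.0, fs_hz=fs)
f0 = estimate_f0(speech, fs)
print(f"estimated F0: {f0:.1f} Hz")
```

On this clean synthetic signal the estimate should land close to the 100 Hz used for the excitation. Real speech is considerably harder, for the reasons given above: noise, emotional or singing voice, and source-system coupling all distort the simple periodic structure that the autocorrelation relies on.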
Human speech production and perception systems are two key components of the speech communication chain. Production-based speech models aim to describe how different linguistic and paralinguistic information is encoded in the speech signal. Perception-based speech models, on the other hand, target decoding the different streams of information from the speech signal. Several lines of evidence suggest that speech perception is closely linked with speech production, both being part of an integrated communication system. With the advent of new modalities for acquiring speech production and perception data, it is now possible to gain, in a principled way, insights into these links and to develop new engineering models for speech analysis and processing. WiSSAP 2015 highlighted the latest data acquisition, multi-modal analysis and representation methods, and the potential of these new approaches to speech modeling.
Speech is a fundamental human activity. It is central to the organization and development of human health, personality and intelligence. Speech signal processing is a scientific discipline as well as a technology frontier with immense applications, such as speech, speaker and language recognition, speech synthesis, speech enhancement and speech coding. On one side are the speech and language sciences (such as linguistics, phonetics and psycho-acoustics), whereas on the other side are signal processing theory, linear algebra, pattern recognition, artificial intelligence, etc., leading to enhanced human-human and human-machine communication systems. Such wide-ranging research and development demands a broad base of fundamental knowledge in this area. With the aim of gaining deep insight into a few of the applications of speech processing, this workshop will focus on Text-to-Speech (TTS) synthesis. The workshop will cover various approaches to speech synthesis, viz., unit-selection synthesis (USS), statistical parametric synthesis (viz., HMM-based speech synthesis) and articulatory speech synthesis. The workshop will benefit greatly from invited talks by eminent speech processing researchers from academia, industry and research laboratories.
Speech signal processing refers to the acquisition, transformation, manipulation, storage, transfer and output of vocal utterances by machines. Speech is a very common modality of communication in various technologies and applications, such as speech and speaker recognition, language technologies, speech synthesis, speech enhancement, speech coding, etc. On one side of the spectrum are the speech and language sciences, such as linguistics, phonetics and psycho-acoustics, and on the other side are signal processing theory, linear algebra, pattern recognition, artificial intelligence, etc., leading to enhanced human-human and human-machine communication systems. With the aim of gaining deep insight into a few of the applications of speech processing, this CEP (Continuing Education Program) workshop focuses on general speech signal processing using source and spectral features, and on applications such as speech analysis, speaker and speech recognition, text-to-speech synthesis systems, etc. Various aspects of speech prosody will be discussed. Further talks on compressive sensing and nonlinear speech processing will add to the knowledge of various speech-related applications.