Events

Recent Events

The Summer School on Speech Signal Processing (S4P) 2019

"Speaker Recognition and Diarization"

6th - 10th July, 2019, DA-IICT, Gandhinagar

Speaker recognition (voice biometrics) has gained considerable interest in a large number of e-commerce-related applications, such as speaker forensics, banking transactions and, of late, smartphones. The major goal of S4P 2019 is to discuss the development of speaker recognition systems that are robust to spoofing, noise, channel variability, intrinsic variability, etc. In addition, recent research on speaker diarization and on countermeasures against spoofing and tampering attacks will also be covered in this summer school. The summer school will benefit greatly from invited talks by eminent, world-class researchers from academia, industry and research laboratories in India and abroad. S4P 2019 also benefits from a world-renowned program committee (PC). Furthermore, to encourage young talent, the school presents the fourth edition of the 5 Minute Ph.D. Thesis Contest (5MPT) with four cash prizes (ISCA endorsed). Finally, for the first time, S4P 2019 introduces industry perspective talks from major speech technology companies.

Poster

Closure Report

Past Events

The Summer School on Speech Signal Processing (S4P) 2018

"Speech Production"

9th - 11th September, 2018, DA-IICT, Gandhinagar

Understanding and modeling the phenomenon of speech production is an ongoing endeavor. In particular, compared with what it takes to launch tons of mass to the moon, modeling the true speech production mechanism is far more sophisticated and challenging. Hence, there is no formal theory that explains all the configurations involved in producing intelligible speech. However, we can still gain much insight into the speech mechanism with simplified analog and discrete models.

The speech chain involves various levels for the speaker and the listener, such as linguistic, physiological and acoustic. The sophistication in the production of speech involves mysterious coarticulation at both the local and global level, nonlinear source-filter interaction, variability in the glottal source spectrum with respect to gender, articulatory motion, tongue models, a large speech bandwidth (20 Hz - 7 kHz), a high degree of redundancy in speech, the significance of cross-modes in spectral tilt in sound wave propagation (around 4-5 kHz), changes in spectral tilt due to the dynamic interaction of pitch (F₀) source harmonics with the vocal tract spectrum, a many-to-one mapping between articulatory position and acoustic waveform, the dynamic nature of turbulence created at the excitation source and vocal tract system, highly complex vocal fold vibration, sound diffraction around the head, lip radiation, etc. The inability to understand these sophistications poses a significant scientific challenge to developing high-performance speech technologies, such as speech and speaker recognition, voice conversion, speech synthesis, emotion recognition, voice pathology classification, etc.

The aim of this summer school is to further advance our understanding of these challenging (and open) research issues. In this context, the event will benefit greatly from invited talks by eminent speech processing researchers from academia, industry and R&D labs across the world. S4P 2018 also benefits from a world-renowned scientific committee. Furthermore, to encourage young talent, the school presents the next edition of the 5 Minute Ph.D. Thesis (5MPT) contest with three cash prizes (ISCA supported) and a special session of Doctoral Consortium Talks by four ISCA student members (in addition to a few alumni) of the Speech Research Lab at DA-IICT.

Poster

Closure Report

The Summer School on Speech Signal Processing (S4P) 2017

"Speaker and Language Recognition"

8th - 12th July, 2017, DA-IICT, Gandhinagar

Speaker recognition (voice biometrics) has gained considerable interest in a large number of e-commerce and forensics-related applications, such as automatic access through a voice profile and filtering of telephone calls. The major goal of S4P 2017 is to discuss the development of speaker recognition systems that are robust to noise, channel variability and intrinsic variability (due to speakers, health issues, stress, etc.). In addition, recent research on countermeasures against spoofing and tampering attacks will also be covered in this summer school. Language Identification (LID), with a major focus on dialect and accent identification, will also be explored. The summer school will benefit greatly from invited talks by eminent speech processing researchers from academia, industry and research laboratories in India and abroad. S4P 2017 also benefits from a world-renowned program committee. Furthermore, to encourage young talent, the school presents a special session on doctoral consortium talks by four ISCA student members of DA-IICT.

Poster

Closure Report

The Summer School on Advances in Speech and Audio Processing (ASAP) 2016

"From Representations and Models to Applications"

19th - 20th July, 2016, DA-IICT, Gandhinagar

Speech is the most powerful and natural form of communication between humans. Human speech is produced by the nonlinear coupling of the excitation source with the vocal tract system. Speech is made up of phonemes, syllables, words, phrases and, finally, sentences. Speech carries various levels of information, such as speaker identity, linguistic message, gender, the speaker's health condition, personality and the acoustic environment of the recording. The major focus of this summer school is to increase awareness of present and future studies related to the representation and modeling of the speech signal and, finally, to use this knowledge in several speech technology applications, such as speech and speaker recognition, speech synthesis, audio search and voice conversion. Another salient feature of this summer school is the doctoral consortium talks, in which four doctoral students will deliver invited talks.

Poster

Closure Report

The Summer School on Speech Signal Processing (S4P) 2016

"Speech Source Modeling and its Applications"

4th - 8th July, 2016, DA-IICT, Gandhinagar

The speech signal is the result of filtering the excitation source signal with the vocal tract system. The primary excitation signal is produced by the vibration of the vocal folds at the glottis. Many speech processing techniques are based on this source-filter model. The speech excitation source is known to be the origin of several essential acoustic cues used in speech processing, such as fundamental frequency (pitch), glottal closure instants (epochs), prosodic features, turbulent flow, voice quality, speaking style and speaker identity, all of which contribute to the naturalness and expressiveness of speech. Pitch tracking, epoch detection, and the estimation and modeling of glottal flow from a speech signal are very intricate due to nonlinear interactions between the source and the system. In addition, estimating source-based features is very challenging in noisy environments and for emotional and singing voices. Most of the time, source information is not effectively incorporated, which results in poor performance in applications such as speech synthesis, expressive speech processing, speaker recognition, voice conversion and voice-based biomedical engineering. Hence, research on effectively incorporating source information will continue to grow and may lead to better performance in various speech processing applications. This summer school (the first of its kind in India) is a step toward filling this gap, and it will benefit greatly from talks by eminent speech processing researchers from academia, industry and research laboratories in India and abroad.

Poster

Closure Report

Winter School on Speech and Audio Processing (WiSSAP) 2015

"Production-Perception based New Models of Speech Analysis"

4th - 7th January, 2015, DA-IICT, Gandhinagar

Human speech production and perception systems are two key components of the speech communication chain. Production-based speech models aim to describe the encoding of different linguistic and paralinguistic information in the speech signal. Perception-based speech models, on the other hand, target the decoding of different streams of information from the speech signal. Several lines of evidence suggest that speech perception is closely linked with speech production, both being parts of an integrated communication system. With the advent of new modalities for acquiring speech production/perception data, it is now possible to investigate these links in a principled way and to develop new engineering models for speech analysis and processing. WiSSAP 2015 highlighted the latest data acquisition, multi-modal analysis, representation and potential of these new approaches to speech modeling.

Poster

Closure Report

Continuing Education Program (CEP) Workshop

"Text-To-Speech Synthesis : Production-Perception based New Models of Speech Analysis"

16th - 18th June, 2014, DA-IICT, Gandhinagar

Speech is a fundamental human activity. It is central to the organization and development of human health, personality and intelligence. Speech signal processing is a scientific discipline as well as a technology frontier with immense applications, such as speech, speaker and language recognition, speech synthesis, speech enhancement and speech coding. On one side are the speech and language sciences (such as linguistics, phonetics and psychoacoustics), whereas on the other side are signal processing theory, linear algebra, pattern recognition, artificial intelligence, etc., leading to enhanced human-human and human-machine communication systems. Such wide-ranging research and development demands a broad base of fundamental knowledge in this area. With the aim of gaining deep insight into a few applications of speech processing, this workshop will focus on Text-to-Speech (TTS) synthesis. The workshop will cover various approaches to speech synthesis, viz., unit-selection synthesis (USS), statistical parametric synthesis (e.g., HMM-based speech synthesis) and articulatory speech synthesis. The workshop will benefit greatly from invited talks by eminent speech processing researchers from academia, industry and research laboratories.

Poster

DeitY Project Review Meeting

"Development of Text-to-Speech Synthesis Systems for Indian Languages (Phase - II)"

14th - 15th June, 2014, DA-IICT, Gandhinagar

Poster

Continuing Education Program (CEP) Workshop

"Speech Signal Processing and its Applications"

14th October, 2013, DA-IICT, Gandhinagar

Speech signal processing refers to the acquisition, transformation, manipulation, storage, transfer and output of vocal utterances by machines. Speech is a very common modality of communication for various technologies and applications, such as speech and speaker recognition, language technologies, speech synthesis, speech enhancement and speech coding. On one side of the spectrum are the speech and language sciences, such as linguistics, phonetics and psychoacoustics, and on the other side are signal processing theory, linear algebra, pattern recognition, artificial intelligence, etc., leading to enhanced human-human and human-machine communication systems. With the aim of gaining deep insight into a few applications of speech processing, this CEP (Continuing Education Program) workshop focuses on general speech signal processing using source and spectral features, and on applications such as speech analysis, speaker and speech recognition, and text-to-speech synthesis systems. Various aspects of speech prosody will be discussed. Further talks on compressive sensing and nonlinear speech processing will add to the knowledge of various speech-related applications.

Poster

Workshop on Phonetic Engine Project

"Development of Prosodically Guided Phonetic Engine for Searching Speech Databases in Indian Languages"

12th - 13th October, 2013, DA-IICT, Gandhinagar

Poster

Invited Talks

Dr. Sunayana Sitaram

Ph.D., CMU, USA; presently a Post-Doc at Microsoft Research, Bangalore

Deep Learning for Speech Recognition

1st Aug, 2017

Articulatory Features for Synthesis of Low Resource Languages

4th Jan, 2016

Indic Front-end for the Festvox voice building tools

13th Jan, 2015

Prof. (Dr.) Vinay Kumar Mittal

IIIT Chittoor, Sri City, AP

Nonverbal Speech Sounds: Analysis and Applications

12th May, 2015

Prof. (Dr.) Hynek Hermansky

Department of Electrical and Computer Engineering, Johns Hopkins University

Tandem Features for Automatic Speech Recognition

8th Jan, 2015

Prof. (Dr.) Shrikanth Narayanan

University of Southern California, Los Angeles, USA

Behavioral Signal Processing

4th Jan, 2015

Prof. (Dr.) Shihab Shamma

Department of Electrical and Computer Engineering, University of Maryland

Signal Compression in the Nerve

8th Jan, 2015

Dr. Ananthakrishna Chintanpalli

BITS Pilani, Pilani Campus

Factors Affecting the Cues for Concurrent Vowel Identification: Vowel Level, Age, and Hearing Loss

8th Jan, 2015

Prof. (Dr.) Sanjeev Khudanpur

Johns Hopkins University, Baltimore, MD, USA

Automatic Speech Recognition and Keyword Spotting

19th Dec, 2014
