Speaker Recognition

Speaker recognition is a technology for determining the identity of the speaker in an input speech signal (i.e., who is speaking). When it is used to identify both the speaker and the time of speaking (i.e., who spoke when), it is referred to as speaker diarization. We are working to improve this technology using pattern recognition and machine learning techniques.
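
To make the distinction between the two tasks concrete, here is a minimal sketch in Python. It assumes that each speech segment has already been converted to a fixed-length speaker embedding and that reference embeddings for known speakers are available; the vectors, speaker names, and cosine-similarity scoring are illustrative assumptions, not the systems developed in this research. Diarization in practice often has to cluster speakers who are not known in advance, which this simplified version does not attempt.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(embedding, enrolled):
    """Speaker identification: return the enrolled speaker most similar to the embedding."""
    return max(enrolled, key=lambda name: cosine(embedding, enrolled[name]))

def diarize(segments, enrolled):
    """Simplified diarization: label each (start, end, embedding) segment,
    merging contiguous segments attributed to the same speaker."""
    timeline = []
    for start, end, embedding in segments:
        who = identify(embedding, enrolled)
        if timeline and timeline[-1][2] == who and timeline[-1][1] == start:
            timeline[-1] = (timeline[-1][0], end, who)  # extend the previous turn
        else:
            timeline.append((start, end, who))
    return timeline

# Hypothetical 2-dimensional embeddings, chosen only for illustration.
enrolled = {"alice": [1.0, 0.1], "bob": [0.1, 1.0]}
segments = [
    (0.0, 1.5, [0.9, 0.2]),
    (1.5, 3.0, [0.8, 0.1]),
    (3.0, 4.0, [0.2, 0.9]),
]
print(diarize(segments, enrolled))
```

Here identify answers "who is speaking" for a single segment, while diarize answers "who spoke when" by attaching speaker labels to time spans.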

Speaker Identification Using Crowdsourcing

Methods for improving the accuracy of speaker identification through crowdsourcing

Our current research explores crowdsourcing methods for annotating speech data efficiently, with the aim of simplifying the development of speaker recognition systems and improving the accuracy of speaker identification results. To this end, we use Tutti, a framework designed for running crowdsourcing tasks on Amazon Mechanical Turk.
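
As one illustration of how crowdsourced annotations might be combined, the sketch below aggregates speaker labels from several workers by majority vote and withholds a label when agreement is too low. This is a generic aggregation scheme assumed for illustration; it is not Tutti's API or the method used in this research, and the function name, data layout, and threshold are hypothetical.

```python
from collections import Counter

def aggregate_labels(annotations, min_agreement=0.5):
    """Majority-vote aggregation of crowd labels.

    annotations: dict mapping a segment ID to the list of speaker labels
                 given by different workers (hypothetical data layout).
    Returns a dict mapping each segment ID to (label, agreement), where
    label is None when the top label's share does not exceed min_agreement.
    """
    result = {}
    for segment_id, labels in annotations.items():
        if not labels:
            result[segment_id] = (None, 0.0)
            continue
        top_label, count = Counter(labels).most_common(1)[0]
        agreement = count / len(labels)
        accepted = top_label if agreement > min_agreement else None
        result[segment_id] = (accepted, agreement)
    return result

# Hypothetical worker responses for two speech segments.
annotations = {
    "seg1": ["speaker_A", "speaker_A", "speaker_B"],
    "seg2": ["speaker_A", "speaker_B"],
}
aggregated = aggregate_labels(annotations)
```

Segments returned with a label of None could be sent back for additional annotation rather than being used as training data.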

Relevant Publications:

Robust Feature Representation Learning for Speaker Identification

Techniques aimed at disentangling the complex mixture of information about speech content and speaker identity

A speech signal carries both linguistic content (i.e., what is being said) and information about the speaker (i.e., who is speaking). Typically, speech recognition technology struggles with variations in speaker identity, while speaker recognition technology struggles with variations in speech content. We are therefore developing a technique to disentangle these two kinds of information, so that speakers can be identified even from short utterances.

Relevant Publications:

Modeling for Speaker Clustering

Relevant Publications: