Research

Speech technology: speaker diarisation, speaker linking, speech emotion recognition, machine learning

Tools

The segment F-measure is a new evaluation technique for speaker diarisation that scores segment matches using the F-measure. It gives the user deeper insight into how well the hypothesised segments match the reference segments.

https://github.com/rosannamilner/segment-f-measure
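
As a rough illustration of the idea of scoring segment matches with an F-measure, here is a minimal Python sketch. It is not the code from the repository above, and its matching rule is an assumption made for the example: a hypothesised segment counts as a match when it carries the same speaker label as a reference segment and both of its boundaries fall within a tolerance of that segment's boundaries. The function name `segment_f_measure`, the `tolerance` parameter, and the assumption that hypothesis labels have already been mapped to reference labels are all hypothetical.

```python
from typing import List, Tuple

# A segment is (start_time, end_time, speaker_label), times in seconds.
Segment = Tuple[float, float, str]


def segment_f_measure(
    reference: List[Segment],
    hypothesis: List[Segment],
    tolerance: float = 0.1,  # hypothetical boundary tolerance in seconds
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) over matched segments.

    Assumed matching rule (illustrative only): same speaker label and
    both boundaries within `tolerance` of an unused reference segment.
    """
    used_refs = set()
    matches = 0
    for h_start, h_end, h_spk in hypothesis:
        for i, (r_start, r_end, r_spk) in enumerate(reference):
            if i in used_refs:
                continue  # each reference segment can be matched once
            if (
                h_spk == r_spk
                and abs(h_start - r_start) <= tolerance
                and abs(h_end - r_end) <= tolerance
            ):
                used_refs.add(i)
                matches += 1
                break

    precision = matches / len(hypothesis) if hypothesis else 0.0
    recall = matches / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    ref = [(0.0, 1.0, "A"), (1.0, 2.5, "B"), (2.5, 4.0, "A")]
    hyp = [(0.05, 1.0, "A"), (1.0, 2.0, "B")]
    print(segment_f_measure(ref, hyp))
```

In this toy example only the first hypothesised segment matches, so precision is 1/2 and recall is 1/3. A time-weighted error rate would not separate the two in this way.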

Data

The speaker diarisation reference for the NIST RT07 meeting data has been improved by manual re-segmentation. It is now accurate to within 0.1 seconds and contains speech segments with speaker labels for the complete audio files. The reference RTTM files can be downloaded below.

https://mini.dcs.shef.ac.uk/resources/dia-improvedrt07reference/
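
For readers unfamiliar with RTTM, the sketch below shows one way to read speaker segments from such a file in Python. It assumes the usual NIST layout for `SPEAKER` records (type, file, channel, start time, duration, two placeholder fields, speaker name, then further placeholders). The files from this resource were not inspected for this example, so check the layout against the downloaded data. The function name `parse_rttm_line` and the sample file name `meeting1` are hypothetical.

```python
from typing import Optional, Tuple


def parse_rttm_line(line: str) -> Optional[Tuple[str, float, float, str]]:
    """Parse one RTTM line into (file_id, start, end, speaker).

    Assumes the common NIST field order for SPEAKER records:
    SPEAKER <file> <chan> <tbeg> <tdur> <NA> <NA> <name> <NA> ...
    Returns None for other record types, comments, or short lines.
    """
    fields = line.split()
    if len(fields) < 8 or fields[0] != "SPEAKER":
        return None
    start = float(fields[3])
    duration = float(fields[4])
    return fields[1], start, start + duration, fields[7]


if __name__ == "__main__":
    # Hypothetical example line, not taken from the RT07 reference.
    example = "SPEAKER meeting1 1 12.50 3.25 <NA> <NA> spk01 <NA>"
    print(parse_rttm_line(example))
```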

Publications

[Conference] Jose Antonio Lopez Saenz, Md Asif Jalal, Rosanna Milner, Thomas Hain: "Attention Based Model for Segmental Pronunciation Error Detection." ASRU 2021
[Conference] Md Asif Jalal, Rosanna Milner, Thomas Hain, Roger K. Moore: "Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition." INTERSPEECH 2020
[Conference] Md Asif Jalal, Rosanna Milner, Thomas Hain: "Empirical Interpretation of Speech Emotion Perception with Attention Based Model for Speech Emotion Recognition." INTERSPEECH 2020
[Conference] Rosanna Milner, Md Asif Jalal, Raymond W. M. Ng, Thomas Hain: "A cross-corpus study on speech emotion recognition." ASRU 2019
[Journal] Oscar Saz, Salil Deena, Mortaza Doulaty, Madina Hasan, Bilal Khaliq, Rosanna Milner, Raymond W. M. Ng, Julia Olcoz and Thomas Hain, "Lightly supervised alignment of subtitles on multi-genre broadcasts", Multimedia Tools and Applications, 2018
[Conference] Rosanna Milner, Thomas Hain: "DNN approach to speaker diarisation using speaker channels." ICASSP 2017
[Thesis] Rosanna Milner, "Using deep neural networks for speaker diarisation." University of Sheffield, UK, PhD Thesis, 2016
[Conference] Rosanna Milner, Thomas Hain: "DNN-Based Speaker Clustering for Speaker Diarisation." INTERSPEECH 2016
[Conference] Thomas Hain, Jeremy Christian, Oscar Saz, Salil Deena, Madina Hasan, Raymond W. M. Ng, Rosanna Milner, Mortaza Doulaty, Yulan Liu: "webASR 2 - Improved Cloud Based Speech Technology." INTERSPEECH 2016
[Conference] Rosanna Milner, Thomas Hain: "Segment-oriented evaluation of speaker diarisation performance." ICASSP 2016
[Conference] Rosanna Milner, Oscar Saz, Salil Deena, Mortaza Doulaty, Raymond W. M. Ng, Thomas Hain: "The 2015 Sheffield system for longitudinal diarisation of broadcast media." ASRU 2015
[Conference] Oscar Saz, Mortaza Doulaty, Salil Deena, Rosanna Milner, Raymond W. M. Ng, Madina Hasan, Yulan Liu, Thomas Hain: "The 2015 Sheffield system for transcription of Multi-Genre Broadcast media." ASRU 2015
[Thesis] Rosanna Milner, "Multi-recording diarisation for BBC broadcasts." University of Sheffield, UK, MSc Thesis, 2012