5/1/2018 Tue lab meeting

Post date: May 2, 2018 10:34:46 AM

We discussed what to do for the CeSLea (digital companion) project.

Text-dependent speaker recognition
- needs keyword spotting using WordNet
- background model for rejection - research topic
Text-independent speaker recognition
- i-vector
- ergodic (fully connected) HMM
- RNN / LSTM
- needs variety of training sentences
Speaker segmentation
- finds when the enrolled speakers are talking
- finds when there is a speaker change (speakers not enrolled)
Speech separation
- using extra information, as Google did recently on YouTube videos
- general source separation
Rejection
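- likelihood ratio test (log-likelihood difference test): difference between the scores of the 1st- and 2nd-ranked speakers, or between the 1st score and the average score
- normalized Z-test: z = (score - mean) / std > threshold, with thresholds taken from two-sided confidence intervals (CI)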
  - 90% CI 1.645
  - 95% CI 1.960
  - 99% CI 2.576
  - 99.5% CI 2.807
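
The two rejection tests above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the function names are hypothetical, scores are assumed to be log-likelihoods (higher is better), the "average" in the difference test is assumed to be the mean over the remaining speakers, and the mean and std for the Z-test are assumed to come from a cohort of scores for other speakers.

```python
from statistics import NormalDist, mean, stdev


def two_sided_critical_value(confidence):
    """Return z such that P(|Z| <= z) = confidence for a standard normal Z."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def margin_test(log_scores, threshold, use_average=False):
    """Log-likelihood difference test (hypothetical helper).

    Compares the best score with the runner-up, or with the mean of the
    remaining scores when use_average is True (an assumption about what
    "average" means in the notes). Needs at least two scores.
    """
    ranked = sorted(log_scores, reverse=True)
    best, rest = ranked[0], ranked[1:]
    reference = mean(rest) if use_average else rest[0]
    return (best - reference) > threshold


def z_test(score, cohort_scores, threshold):
    """Normalized Z-test: z = (score - mean) / std > threshold.

    cohort_scores is an assumed set of reference scores (at least two)
    used to estimate the mean and the sample standard deviation.
    """
    z = (score - mean(cohort_scores)) / stdev(cohort_scores)
    return z > threshold


if __name__ == "__main__":
    # Reproduce the threshold list from the notes.
    for level in (0.90, 0.95, 0.99, 0.995):
        print(f"{level:.1%} CI -> {two_sided_critical_value(level):.3f}")

    # Made-up scores, purely for illustration.
    scores = [-10.0, -12.5, -15.0]
    print(margin_test(scores, threshold=2.0))
    print(z_test(5.0, [0.0, 1.0, 2.0, 3.0, 4.0], two_sided_critical_value(0.90)))
```

Note that the listed values are two-sided critical values, while z > threshold is a one-sided comparison, so 1.645 corresponds to a one-sided level of 95%.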