5/1/2018 Tue lab meeting

Post date: May 2, 2018 10:34:46 AM

We discussed what to do for the CeSLea (digital companion) project.

  • Text-dependent speaker recognition

    • needs keyword spotting using WordNet

    • background model for rejection - research topic

  • Text-independent speaker recognition

    • i-vector

    • ergodic (fully connected) HMM

    • RNN / LSTM

    • needs variety of training sentences

  • Speaker segmentation

    • finds when the enrolled speakers are talking

    • finds when there is a speaker change (speakers not enrolled)

  • Speech separation

    • using extra information, as Google did recently on YouTube videos

    • general source separation

  • Rejection

    • likelihood ratio test (log difference test) - difference between the scores of the 1st and 2nd speakers, or between the 1st score and the average score

    • normalized Z-test: z = (score-mean)/std > threshold

      • 90% CI 1.645

      • 95% CI 1.960

      • 99% CI 2.576

      • 99.5% CI 2.807
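
The two rejection tests above could be sketched roughly as follows. This is only an illustration, not the project's implementation: the function names, the speaker names, and all scores are made up, "average" is assumed to mean the average over the remaining speakers, and the mean and std in the z-test are assumed to come from scores against background (impostor) models. The listed thresholds match the two-sided critical values of the standard normal distribution, which the sketch recomputes; note that for a one-sided test such as z > threshold, 1.645 corresponds to a 95% one-sided level.

```python
from statistics import NormalDist, mean, stdev


def top_two_margin(log_scores):
    """Log-likelihood difference between the best and second-best speakers."""
    ranked = sorted(log_scores.values(), reverse=True)
    return ranked[0] - ranked[1]


def top_vs_rest_margin(log_scores):
    """Best score minus the average of the remaining speakers' scores.

    Assumption: "average" excludes the best speaker; the notes leave this open.
    """
    ranked = sorted(log_scores.values(), reverse=True)
    return ranked[0] - mean(ranked[1:])


def z_score(score, background_scores):
    """z = (score - mean) / std, with mean and std from background scores."""
    return (score - mean(background_scores)) / stdev(background_scores)


def critical_value(confidence):
    """Two-sided standard normal critical value, e.g. 0.95 -> about 1.960."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def identify_or_reject(log_scores, background_scores, confidence=0.95):
    """Return the best-scoring speaker, or None if the z-test rejects it."""
    best = max(log_scores, key=log_scores.get)
    z = z_score(log_scores[best], background_scores)
    return best if z > critical_value(confidence) else None


# Made-up log-likelihood scores for three enrolled speakers.
scores = {"alice": -41.0, "bob": -47.5, "carol": -52.5}
# Made-up scores of the same utterance against background (impostor) models.
background = [-55.0, -53.0, -51.0, -49.0, -47.0]

print(top_two_margin(scores))
print(top_vs_rest_margin(scores))
print(identify_or_reject(scores, background))
for level in (0.90, 0.95, 0.99, 0.995):
    print(level, round(critical_value(level), 3))
```

A margin test and a z-test can be combined, for example by requiring both to pass before accepting a speaker; which thresholds work in practice would have to be tuned on held-out data.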