05/24/2018 Thu Lab Meeting

Post date: May 29, 2018 5:19:40 PM

Attendees: Zahra, 진석, 지호

  • Database

    • TIMIT of 430 speakers

    • Ignore the Train/Test/DR (dialect region) splits

    • Do cross-validation: split each speaker's 10 utterances into 2 test / 1 val / 7 training, and rotate the split.
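
A minimal sketch of the rotation, for reference. The notes only say "rotate it", so stepping the rotation by the test-set size (giving 5 folds, with each recording tested exactly once) is an assumption, and `rotate_splits` and `utterance_ids` are hypothetical names.

```python
def rotate_splits(utterances, n_test=2, n_val=1):
    """Yield (train, val, test) splits by rotating the utterance list."""
    n = len(utterances)
    # Assumption: step by n_test so every utterance is in the test set once.
    for start in range(0, n, n_test):
        rotated = utterances[start:] + utterances[:start]
        test = rotated[:n_test]
        val = rotated[n_test:n_test + n_val]
        train = rotated[n_test + n_val:]
        yield train, val, test


# Hypothetical IDs standing in for one speaker's 10 recordings.
utterance_ids = list(range(10))
folds = list(rotate_splits(utterance_ids))

for train, val, test in folds:
    print("train", train, "val", val, "test", test)
```

The same rotation would be applied to every speaker, so each fold contains all speakers in training, validation, and test.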

  • Zahra is working on implementing a CNN

    • 1-D CNN over short sliding windows (5-10 frames), each covering the whole frequency axis

    • Features: MFCC, spectrogram, spectral flux, spectral tilt

    • We found a Python version of Praat, which can provide various kinds of speech features
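
A rough sketch of the windowing step, assuming the features are stored as a list of frames with one list of frequency bins (or MFCC coefficients) per frame. The hop of 1 frame and the names `sliding_windows` and `frames` are assumptions, not decisions from the meeting.

```python
def sliding_windows(frames, win_len=5, hop=1):
    """Cut a [T][F] feature sequence into overlapping [win_len][F] windows.

    Each window keeps every frequency bin, so the network only slides
    along the time axis.
    """
    last_start = len(frames) - win_len
    return [frames[i:i + win_len] for i in range(0, last_start + 1, hop)]


# Hypothetical input: 12 frames with 3 frequency bins each.
frames = [[float(t), float(t) + 0.1, float(t) + 0.2] for t in range(12)]
windows = sliding_windows(frames, win_len=5, hop=1)

print(len(windows), "windows of", len(windows[0]), "frames x", len(windows[0][0]), "bins")
```

A window length between 5 and 10 frames matches the range in the notes; a sequence shorter than `win_len` yields no windows.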

  • RNN implementation

    • Use masking to define the loss function

    • RNN / LSTM / BLSTM
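
A small sketch of a masked loss, assuming the masking is meant to exclude zero-padded frames when variable-length sequences are batched to a common length. That reading, and the names `masked_cross_entropy`, `probs`, `targets`, and `mask`, are assumptions.

```python
import math


def masked_cross_entropy(probs, targets, mask):
    """Mean negative log-likelihood over the unmasked time steps.

    probs:   [T][C] predicted class probabilities per time step
    targets: [T] integer class labels
    mask:    [T] 1 for real frames, 0 for padding
    """
    total = 0.0
    count = 0
    for p_t, y_t, m_t in zip(probs, targets, mask):
        if m_t:  # padded steps contribute nothing to the loss
            total += -math.log(p_t[y_t])
            count += 1
    return total / count if count else 0.0


# Hypothetical 3-step sequence whose last step is padding.
probs = [[0.5, 0.5], [0.25, 0.75], [0.9, 0.1]]
targets = [0, 1, 0]
mask = [1, 1, 0]
loss = masked_cross_entropy(probs, targets, mask)
print(loss)
```

In a deep-learning framework the same idea is usually written as an element-wise product of the per-frame loss with the mask, divided by the mask sum.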

  • Method: Boosting

    • Find pairs of speakers who are hard to distinguish (determined by validation data)

    • Learn models and choose features to classify the hard cases

    • This can be cascaded over many levels, but try narrow ones first
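
A possible sketch of the first step (finding hard speaker pairs), assuming "hard to distinguish" is measured by how often two speakers are confused with each other on validation data. Counting confusions in both directions together, and the names `hardest_pairs`, `true_ids`, and `predicted_ids`, are assumptions.

```python
from collections import Counter


def hardest_pairs(true_ids, predicted_ids, top_k=3):
    """Return the top_k most-confused speaker pairs with their error counts."""
    confusions = Counter()
    for t, p in zip(true_ids, predicted_ids):
        if t != p:
            # Sort so that (A, B) and (B, A) count as the same pair.
            confusions[tuple(sorted((t, p)))] += 1
    return confusions.most_common(top_k)


# Hypothetical validation results for four speakers.
true_ids = ["A", "A", "B", "B", "C", "C", "D"]
predicted_ids = ["B", "A", "A", "A", "C", "D", "D"]
pairs = hardest_pairs(true_ids, predicted_ids, top_k=2)
print(pairs)
```

The returned pairs would then be the targets for the second-level models and feature selection described above.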