05/24/2018 Thu Lab Meeting

Post date: May 29, 2018 5:19:40 PM

Attendees: Zahra, 진석, 지호

Database
- TIMIT, 430 speakers
- Ignore the original Train/Test/DR splits
- Do cross-validation: each speaker has 10 utterances in total; split them 2 test / 1 validation / 7 training and rotate the split across folds.
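A minimal sketch of the rotation, assuming the 10 utterances of every speaker are indexed 0-9 and the same split is applied to all speakers, so every speaker appears in training, validation and test (closed-set speaker identification). Shifting the 2-utterance test window by 2 per fold gives 5 folds whose test sets are disjoint and together cover all 10 utterances. The function name `rotating_folds` is hypothetical.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int], List[int]]  # (train, val, test)


def rotating_folds(n_utts: int = 10, n_test: int = 2, n_val: int = 1) -> Iterator[Split]:
    """Yield (train, val, test) utterance indices for each fold.

    The test window moves by n_test utterances per fold and the validation
    utterance(s) follow it (wrapping around), so with the defaults there are
    5 folds with 7 training utterances each.
    """
    n_folds = n_utts // n_test
    for fold in range(n_folds):
        start = fold * n_test
        test = [(start + i) % n_utts for i in range(n_test)]
        val = [(start + n_test + i) % n_utts for i in range(n_val)]
        held_out = set(test) | set(val)
        train = [u for u in range(n_utts) if u not in held_out]
        yield train, val, test


# The same utterance split is applied to every speaker.
for train_idx, val_idx, test_idx in rotating_folds():
    print("train", train_idx, "val", val_idx, "test", test_idx)
```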
Zahra is working on implementing a CNN
- 1-D CNN over short sliding windows (5-10 frames) that span all frequency components
- Features: MFCC, spectrogram, spectral flux, spectral tilt
- We found a Python version of Praat, which can extract many kinds of speech features
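The windowing idea can be illustrated in plain Python. This is a sketch, not the actual implementation: `sliding_windows`, `conv1d_full_band` and `spectral_flux` are hypothetical helpers, the features are assumed to be a (frames x bins) matrix, and the hop size is an assumption. A kernel that spans every frequency bin can only slide along time, which is what a 1-D convolution with the bins as input channels does (e.g. PyTorch's `nn.Conv1d` with `in_channels` equal to the number of bins). Spectral flux here uses one common definition, the Euclidean distance between consecutive magnitude frames.

```python
import math
from typing import List

Matrix = List[List[float]]  # frames x frequency bins


def sliding_windows(features: Matrix, width: int = 5, hop: int = 1) -> List[Matrix]:
    """Cut a (frames x bins) matrix into windows of `width` consecutive frames.

    Each window keeps every frequency bin; only the time axis is windowed.
    """
    return [features[t:t + width]
            for t in range(0, len(features) - width + 1, hop)]


def conv1d_full_band(features: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """Apply one filter whose kernel is (width x bins).

    Because the kernel covers all bins, the filter can slide only along time:
    the output has one value per window position.
    """
    out = []
    for window in sliding_windows(features, width=len(kernel)):
        acc = bias
        for frame, kernel_frame in zip(window, kernel):
            acc += sum(x * w for x, w in zip(frame, kernel_frame))
        out.append(acc)
    return out


def spectral_flux(spectrogram: Matrix) -> List[float]:
    """Euclidean distance between consecutive magnitude frames (0 for frame 0)."""
    flux = [0.0]
    for prev, cur in zip(spectrogram, spectrogram[1:]):
        flux.append(math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur))))
    return flux
```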
RNN implementation
- Use masking in the loss function so that padded frames of variable-length utterances are ignored
- RNN / LSTM / BLSTM
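A minimal sketch of a masked loss, assuming utterances in a batch are padded to the longest one and the network outputs one loss value per frame; the function name is hypothetical. Padded frames get mask 0, so they neither add to the loss nor count toward the average. Without the mask, the loss would also be computed on padded frames that carry no speech.

```python
from typing import Sequence


def masked_mean_loss(per_frame_loss: Sequence[Sequence[float]],
                     lengths: Sequence[int]) -> float:
    """Mean loss over real frames of a padded (batch x max_frames) array.

    lengths[b] is the true number of frames of utterance b; positions at or
    beyond it are padding and receive mask 0.
    """
    total, count = 0.0, 0
    for losses, n in zip(per_frame_loss, lengths):
        mask = [1.0 if t < n else 0.0 for t in range(len(losses))]
        total += sum(loss * m for loss, m in zip(losses, mask))
        count += int(sum(mask))
    if count == 0:
        raise ValueError("batch contains no unpadded frames")
    return total / count
```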
Method: Boosting
- Find pairs of speakers that are hard to distinguish (determined on the validation data)
- Train models and select features specifically to classify the hard cases
- This can be cascaded over many levels, but try narrow ones first
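One possible reading of the plan, sketched under assumptions: hard pairs are ranked by symmetric confusion counts on the validation set, and a second-level classifier for each hard pair overrides the base model whenever it predicts either speaker of that pair. This is a cascade of pair-specific classifiers rather than AdaBoost-style sample reweighting. The function names, the confusion-matrix layout and the `top_k` cut-off are hypothetical.

```python
from itertools import combinations
from typing import Any, Callable, Dict, List, Tuple


def hardest_pairs(confusion: List[List[int]], top_k: int = 3) -> List[Tuple[int, int, int]]:
    """Rank speaker pairs (i, j), i < j, by how often they are confused.

    confusion[i][j] counts validation utterances of speaker i that were
    predicted as speaker j; the pair score is confusion[i][j] + confusion[j][i].
    Pairs that are never confused are dropped.
    """
    n = len(confusion)
    scored = [(i, j, confusion[i][j] + confusion[j][i])
              for i, j in combinations(range(n), 2)]
    scored = [s for s in scored if s[2] > 0]
    scored.sort(key=lambda s: s[2], reverse=True)  # stable: ties keep (i, j) order
    return scored[:top_k]


def cascade_predict(x: Any,
                    base_predict: Callable[[Any], int],
                    pair_experts: Dict[Tuple[int, int], Callable[[Any], int]]) -> int:
    """Second level of a hypothetical cascade: if the base model predicts a
    speaker that belongs to a hard pair with a dedicated expert, let that
    expert (trained on features chosen for this pair) make the final call."""
    pred = base_predict(x)
    for (i, j), expert in pair_experts.items():
        if pred in (i, j):
            return expert(x)  # first matching pair wins
    return pred
```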