05/24/2018 Thu Lab Meeting
Post date: May 29, 2018 5:19:40 PM
Attendees: Zahra, 진석, 지호
Database
TIMIT, 430 speakers
Ignore the official Train/Test and dialect region (DR) splits
Cross-validation: each speaker has 10 utterances in total; use 7 for training, 1 for validation and 2 for testing, and rotate the split across folds.
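One way to rotate the 7/1/2 split over each speaker's 10 utterances is sketched below. It assumes the 2 test utterances move in consecutive blocks (5 folds, so every utterance is tested exactly once) and that the validation utterance is simply the next index after the test block; the notes do not say how the validation utterance is chosen, and `rotating_splits` is a hypothetical helper name.

```python
def rotating_splits(n_utts=10, n_test=2, n_val=1):
    """Yield (train, val, test) utterance-index lists for each fold.

    Assumption: test blocks are consecutive and the validation utterance(s)
    follow the test block (wrapping around), so every utterance is tested once.
    """
    n_folds = n_utts // n_test  # 10 // 2 = 5 folds
    for k in range(n_folds):
        test = [(k * n_test + i) % n_utts for i in range(n_test)]
        val = [((k + 1) * n_test + i) % n_utts for i in range(n_val)]
        train = [u for u in range(n_utts) if u not in test and u not in val]
        yield train, val, test


# The same index split is applied to every speaker, so all 430 speakers
# appear in the training, validation and test sets of each fold.
for train, val, test in rotating_splits():
    print("train", train, "val", val, "test", test)
```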
Zahra is implementing a CNN
1-D CNN over short sliding windows (5-10 frames), each window covering all frequency components
Features: MFCC, spectrogram, spectral flux, spectral tilt
We found a Python version of Praat, which can extract many kinds of speech features
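The input preparation for the 1-D CNN and two of the listed features can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the spectrogram is a (frames, frequency bins) magnitude array, each window keeps every frequency bin (so a 1-D convolution along time could treat the bins as input channels), spectral flux is taken as the squared change between consecutive frames, and spectral tilt as the slope of a straight line fitted to the dB spectrum. Other definitions of flux and tilt exist, and `frame_windows`, `spectral_flux` and `spectral_tilt` are hypothetical names, not functions from Praat or any library.

```python
import numpy as np


def frame_windows(spec, win=7, hop=1):
    """Cut a (n_frames, n_freq) spectrogram into overlapping windows of `win` frames.

    Every window keeps all frequency bins; output shape is (n_windows, win, n_freq).
    """
    n_frames = spec.shape[0]
    if n_frames < win:
        raise ValueError("utterance is shorter than one window")
    starts = range(0, n_frames - win + 1, hop)
    return np.stack([spec[t:t + win] for t in starts])


def spectral_flux(mag):
    """Squared L2 change between consecutive magnitude spectra (n_frames - 1 values)."""
    return np.sum(np.diff(mag, axis=0) ** 2, axis=1)


def spectral_tilt(mag, freqs, eps=1e-10):
    """Per-frame slope (dB per Hz) of a least-squares line fitted to the log spectrum."""
    log_mag = 20.0 * np.log10(mag + eps)             # (n_frames, n_freq), in dB
    slope, _ = np.polyfit(freqs, log_mag.T, deg=1)   # polyfit fits each column of y
    return slope
```

With a 5-10 frame window and a hop of 1 frame, each utterance yields many overlapping training examples; utterance-level decisions would then have to be aggregated over windows, which the notes leave open.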
RNN implementation
Use masking in the loss function so that padded frames are ignored
RNN / LSTM / BLSTM
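The masking idea for the RNN loss can be illustrated framework-free in NumPy. The sketch assumes frame-level targets (the speaker label repeated on every frame) and zero-padded batches of variable-length utterances; the notes do not fix the target or the framework, and `sequence_mask` and `masked_cross_entropy` are hypothetical names. Padded frames get a mask of 0, so they contribute nothing to the sum, and the average is taken over real frames only.

```python
import numpy as np


def sequence_mask(lengths, max_len):
    """Boolean mask of shape (batch, max_len): True for real frames, False for padding."""
    return np.arange(max_len)[None, :] < np.asarray(lengths)[:, None]


def masked_cross_entropy(logits, targets, lengths):
    """Mean frame-level cross-entropy computed over real (unpadded) frames only.

    logits:  (batch, max_len, n_classes) unnormalized RNN outputs
    targets: (batch, max_len) integer class ids; values on padded frames are ignored
    lengths: (batch,) true number of frames in each sequence
    """
    _, max_len, _ = logits.shape
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-probability of the target class at every frame.
    nll = -np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    mask = sequence_mask(lengths, max_len)
    return (nll * mask).sum() / mask.sum()
```

The same masked average applies unchanged to RNN, LSTM and BLSTM outputs, since it only depends on the per-frame scores and the sequence lengths.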
Method: Boosting
Find pairs of speakers that are hard to distinguish (determined on the validation data)
Train additional models, with features selected specifically to classify these hard cases
This can be cascaded over many levels, but try narrow ones first
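The hard-pair step and a one-level cascade could look like the sketch below. It is an assumption-laden illustration, not the planned implementation: hard pairs are ranked by how often two speakers are confused in either direction on the validation set, and a sample is routed to a pair-specific expert whenever the base model predicts one of that pair's speakers. Note that this is a routing cascade over hard cases, not AdaBoost-style sample reweighting. All function names are hypothetical.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts validation samples of speaker i predicted as speaker j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def hardest_pairs(cm, top_k=5):
    """Rank unordered speaker pairs by confusions in either direction.

    Returns up to top_k tuples (i, j, count) with count > 0, hardest first.
    """
    sym = cm + cm.T              # confusions i->j plus j->i
    np.fill_diagonal(sym, 0)     # correct predictions are not confusions
    i, j = np.triu_indices_from(sym, k=1)
    counts = sym[i, j]
    order = np.argsort(-counts, kind="stable")[:top_k]
    return [(int(i[o]), int(j[o]), int(counts[o])) for o in order if counts[o] > 0]


def cascade_predict(x, base_predict, experts):
    """One cascade level: defer to a pair expert if the base label belongs to its pair.

    experts maps a speaker pair (a, b) to a classifier that returns a or b.
    """
    label = base_predict(x)
    for (a, b), expert in experts.items():
        if label in (a, b):
            return expert(x)
    return label
```

Deeper cascades would repeat the same procedure on the validation errors that remain after the first level, which matches the suggestion to start with a narrow cascade and grow it only if needed.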