Speech Analysis with the Google Speech Command Dataset

The Google Speech Command Dataset contains 35 common English commands. Download this dataset and use the Att-RNN model for speech analysis.

Reference material and source code: https://github.com/douglas125/SpeechCmdRecognition

Google Colab link for this example: https://drive.google.com/file/d/1D6iKCuuFFrQlxpX-H82vLgF-JCmm1L20/view?usp=sharing

Step1) Connect to Google Drive

Step2) Create and switch to a folder on your own Google Drive, so the dataset does not need to be downloaded again on later runs (provided your Drive has enough free space).
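A minimal sketch of this step in plain Python (the Colab-specific drive.mount call is omitted, and the folder name speech_data is an assumed example; in Colab it would live under /content/drive/MyDrive):

```python
import os

# Assumed example folder name; adjust to your own Drive path in Colab.
work_dir = "speech_data"

# Create the folder only if it does not exist yet, then switch into it,
# so a later run can reuse the previously downloaded dataset.
os.makedirs(work_dir, exist_ok=True)
os.chdir(work_dir)
print(os.getcwd())
```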

Step3) Download and install the program

Source code: https://github.com/douglas125/SpeechCmdRecognition

Step4) Check the hardware and confirm that a GPU is enabled

Step5) Import the libraries

Step6) Load the Google Speech Command Dataset and convert it into a Python dictionary
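In this dataset, each WAV file sits in a folder named after its label, so building the dictionary amounts to mapping file path to parent-folder name. A hedged sketch of that idea (the example paths are made up to mimic the dataset's layout):

```python
from pathlib import Path

def build_label_dict(wav_paths):
    """Map each WAV file path to its label, taken from the parent folder name."""
    return {p: Path(p).parent.name for p in wav_paths}

# Hypothetical example paths following the dataset's label/clip.wav layout.
paths = ["data/yes/0a7c2a8d_nohash_0.wav", "data/no/0b40aa8e_nohash_1.wav"]
labels = build_label_dict(paths)
print(labels)
```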

Step7) Generate the training, validation, and test datasets
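The dataset's documentation recommends a hash-based split so that all clips from one speaker always land in the same partition. A simplified sketch of that idea (not the repository's exact code):

```python
import hashlib
import re

def which_set(filename, validation_pct=10, testing_pct=10):
    """Deterministically assign a file to 'training', 'validation', or 'testing'.

    The '_nohash_' suffix is stripped first so that all clips from the same
    speaker hash to the same partition, as described in the dataset's README.
    """
    base = re.sub(r"_nohash_.*$", "", filename)
    bucket = int(hashlib.sha1(base.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < validation_pct:
        return "validation"
    elif bucket < validation_pct + testing_pct:
        return "testing"
    return "training"
```

Because the assignment depends only on the speaker prefix, re-running the split never moves a speaker's clips between partitions.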

Step8) Display the audio waveform and play the sound

Step9) Build the model
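The Att-RNN's distinguishing component is an attention layer that weights the RNN's per-timestep outputs before classification. A NumPy sketch of that mechanism (shapes and names are illustrative, not the repository's exact layer):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(rnn_outputs, query):
    """Collapse (timesteps, features) RNN outputs into one feature vector.

    Scores each timestep against a query vector, turns the scores into
    weights with softmax, and returns the weighted sum over timesteps.
    """
    scores = rnn_outputs @ query            # (timesteps,)
    weights = softmax(scores)               # attention weights, sum to 1
    return weights, weights @ rnn_outputs   # (features,)

rng = np.random.default_rng(0)
outputs = rng.standard_normal((50, 128))   # 50 timesteps, 128 features
query = rng.standard_normal(128)
w, context = attention_pool(outputs, query)
```

The resulting context vector feeds the dense classification layers; in the repository the query itself is derived from one of the RNN states.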

Step10) Use a drop-based learning rate schedule: after 15 training epochs, the learning rate becomes 0.4 times the initial learning rate (initial_lrate); after 30 epochs, it becomes 0.4 * 0.4 times the initial learning rate.

The formula is as follows: LearningRate = InitialLearningRate * DropRate^floor(Epoch / EpochDrop)
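The schedule above can be written as a small function. Note that Keras's LearningRateScheduler passes a 0-based epoch index, so the notebook offsets the exponent by one; the sketch below uses the 1-based epoch count from the text:

```python
import math

def step_decay(epoch, initial_lrate=0.001, drop=0.4, epochs_drop=15):
    """Drop-based schedule:
    LearningRate = InitialLearningRate * DropRate^floor(Epoch / EpochDrop)."""
    return initial_lrate * math.pow(drop, math.floor(epoch / epochs_drop))

print(step_decay(0))    # initial learning rate 0.001
print(step_decay(15))   # 0.001 * 0.4, matching the "0.0004" in the log below
print(step_decay(30))   # 0.001 * 0.4 * 0.4
```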

Step11) Train the model

The model parameters are saved to the file model-attRNN.h5, and the learning rate is controlled through the function reference lrate.

The execution output is as follows:

Changing learning rate to 0.001

Epoch 1/60


Epoch 00001: val_sparse_categorical_accuracy improved from -inf to 0.88424, saving model to model-attRNN.h5

2651/2651 - 398s - loss: 0.7463 - sparse_categorical_accuracy: 0.7886 - val_loss: 0.3968 - val_sparse_categorical_accuracy: 0.8842

Changing learning rate to 0.001

Epoch 2/60


Epoch 00002: val_sparse_categorical_accuracy improved from 0.88424 to 0.91891, saving model to model-attRNN.h5

2651/2651 - 408s - loss: 0.3368 - sparse_categorical_accuracy: 0.9047 - val_loss: 0.2898 - val_sparse_categorical_accuracy: 0.9189

Changing learning rate to 0.001

Epoch 3/60


Epoch 00003: val_sparse_categorical_accuracy improved from 0.91891 to 0.92846, saving model to model-attRNN.h5

2651/2651 - 405s - loss: 0.2605 - sparse_categorical_accuracy: 0.9261 - val_loss: 0.2586 - val_sparse_categorical_accuracy: 0.9285

Changing learning rate to 0.001

Epoch 4/60


Epoch 00004: val_sparse_categorical_accuracy did not improve from 0.92846

2651/2651 - 408s - loss: 0.2187 - sparse_categorical_accuracy: 0.9378 - val_loss: 0.2746 - val_sparse_categorical_accuracy: 0.9246

Changing learning rate to 0.001

Epoch 5/60


Epoch 00005: val_sparse_categorical_accuracy improved from 0.92846 to 0.92876, saving model to model-attRNN.h5

2651/2651 - 393s - loss: 0.1948 - sparse_categorical_accuracy: 0.9451 - val_loss: 0.2574 - val_sparse_categorical_accuracy: 0.9288

Changing learning rate to 0.001

Epoch 6/60


Epoch 00006: val_sparse_categorical_accuracy improved from 0.92876 to 0.93760, saving model to model-attRNN.h5

2651/2651 - 406s - loss: 0.1730 - sparse_categorical_accuracy: 0.9511 - val_loss: 0.2335 - val_sparse_categorical_accuracy: 0.9376

Changing learning rate to 0.001

Epoch 7/60


Epoch 00007: val_sparse_categorical_accuracy improved from 0.93760 to 0.94061, saving model to model-attRNN.h5

2651/2651 - 377s - loss: 0.1572 - sparse_categorical_accuracy: 0.9562 - val_loss: 0.2162 - val_sparse_categorical_accuracy: 0.9406

Changing learning rate to 0.001

Epoch 8/60


Epoch 00008: val_sparse_categorical_accuracy improved from 0.94061 to 0.94222, saving model to model-attRNN.h5

2651/2651 - 410s - loss: 0.1409 - sparse_categorical_accuracy: 0.9603 - val_loss: 0.2046 - val_sparse_categorical_accuracy: 0.9422

Changing learning rate to 0.001

Epoch 9/60


Epoch 00009: val_sparse_categorical_accuracy did not improve from 0.94222

2651/2651 - 387s - loss: 0.1311 - sparse_categorical_accuracy: 0.9634 - val_loss: 0.2223 - val_sparse_categorical_accuracy: 0.9412

Changing learning rate to 0.001

Epoch 10/60


Epoch 00010: val_sparse_categorical_accuracy did not improve from 0.94222

2651/2651 - 409s - loss: 0.1207 - sparse_categorical_accuracy: 0.9664 - val_loss: 0.2366 - val_sparse_categorical_accuracy: 0.9394

Changing learning rate to 0.001

Epoch 11/60


Epoch 00011: val_sparse_categorical_accuracy did not improve from 0.94222

2651/2651 - 377s - loss: 0.1132 - sparse_categorical_accuracy: 0.9685 - val_loss: 0.2424 - val_sparse_categorical_accuracy: 0.9401

Changing learning rate to 0.001

Epoch 12/60


Epoch 00012: val_sparse_categorical_accuracy did not improve from 0.94222

2651/2651 - 409s - loss: 0.1076 - sparse_categorical_accuracy: 0.9702 - val_loss: 0.2432 - val_sparse_categorical_accuracy: 0.9397

Changing learning rate to 0.001

Epoch 13/60


Epoch 00013: val_sparse_categorical_accuracy improved from 0.94222 to 0.94473, saving model to model-attRNN.h5

2651/2651 - 400s - loss: 0.0992 - sparse_categorical_accuracy: 0.9721 - val_loss: 0.2282 - val_sparse_categorical_accuracy: 0.9447

Changing learning rate to 0.001

Epoch 14/60


Epoch 00014: val_sparse_categorical_accuracy did not improve from 0.94473

2651/2651 - 400s - loss: 0.0965 - sparse_categorical_accuracy: 0.9738 - val_loss: 0.2508 - val_sparse_categorical_accuracy: 0.9395

Changing learning rate to 0.0004

Epoch 15/60


Epoch 00015: val_sparse_categorical_accuracy improved from 0.94473 to 0.94855, saving model to model-attRNN.h5

2651/2651 - 408s - loss: 0.0649 - sparse_categorical_accuracy: 0.9841 - val_loss: 0.2171 - val_sparse_categorical_accuracy: 0.9486

Changing learning rate to 0.0004

Epoch 16/60


Epoch 00016: val_sparse_categorical_accuracy did not improve from 0.94855

2651/2651 - 407s - loss: 0.0582 - sparse_categorical_accuracy: 0.9862 - val_loss: 0.2362 - val_sparse_categorical_accuracy: 0.9451

Changing learning rate to 0.0004

Epoch 17/60


Epoch 00017: val_sparse_categorical_accuracy did not improve from 0.94855

2651/2651 - 407s - loss: 0.0552 - sparse_categorical_accuracy: 0.9869 - val_loss: 0.2476 - val_sparse_categorical_accuracy: 0.9453

Changing learning rate to 0.0004

Epoch 18/60


Epoch 00018: val_sparse_categorical_accuracy did not improve from 0.94855

2651/2651 - 399s - loss: 0.0514 - sparse_categorical_accuracy: 0.9879 - val_loss: 0.2578 - val_sparse_categorical_accuracy: 0.9448

Changing learning rate to 0.0004

Epoch 19/60


Epoch 00019: val_sparse_categorical_accuracy did not improve from 0.94855

2651/2651 - 386s - loss: 0.0465 - sparse_categorical_accuracy: 0.9887 - val_loss: 0.2632 - val_sparse_categorical_accuracy: 0.9418

Changing learning rate to 0.0004

Epoch 20/60


Epoch 00020: val_sparse_categorical_accuracy did not improve from 0.94855

2651/2651 - 406s - loss: 0.0506 - sparse_categorical_accuracy: 0.9886 - val_loss: 0.2557 - val_sparse_categorical_accuracy: 0.9452

Changing learning rate to 0.0004

Epoch 21/60


Epoch 00021: val_sparse_categorical_accuracy did not improve from 0.94855

2651/2651 - 405s - loss: 0.0451 - sparse_categorical_accuracy: 0.9896 - val_loss: 0.2717 - val_sparse_categorical_accuracy: 0.9430

Changing learning rate to 0.0004

Epoch 22/60


Epoch 00022: val_sparse_categorical_accuracy did not improve from 0.94855

2651/2651 - 409s - loss: 0.0447 - sparse_categorical_accuracy: 0.9897 - val_loss: 0.2719 - val_sparse_categorical_accuracy: 0.9444

Changing learning rate to 0.0004

Epoch 23/60


Epoch 00023: val_sparse_categorical_accuracy did not improve from 0.94855

2651/2651 - 399s - loss: 0.0418 - sparse_categorical_accuracy: 0.9903 - val_loss: 0.2813 - val_sparse_categorical_accuracy: 0.9426

Changing learning rate to 0.0004

Epoch 24/60


Epoch 00024: val_sparse_categorical_accuracy did not improve from 0.94855

2651/2651 - 409s - loss: 0.0389 - sparse_categorical_accuracy: 0.9911 - val_loss: 0.3032 - val_sparse_categorical_accuracy: 0.9412

Changing learning rate to 0.0004

Epoch 25/60

Restoring model weights from the end of the best epoch.


Epoch 00025: val_sparse_categorical_accuracy did not improve from 0.94855

2651/2651 - 402s - loss: 0.0394 - sparse_categorical_accuracy: 0.9913 - val_loss: 0.3143 - val_sparse_categorical_accuracy: 0.9391

Epoch 00025: early stopping

Step12) Plot the accuracy and loss curves
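A hedged sketch of this plotting step. The history values below are stand-ins; in the notebook they come from the History object returned by model.fit, whose key names follow the metrics visible in the log above:

```python
import matplotlib
matplotlib.use("Agg")  # render to file, no display needed
import matplotlib.pyplot as plt

# Stand-in values; in the notebook use model.fit(...).history instead.
history = {
    "sparse_categorical_accuracy": [0.79, 0.90, 0.93],
    "val_sparse_categorical_accuracy": [0.88, 0.92, 0.93],
    "loss": [0.75, 0.34, 0.26],
    "val_loss": [0.40, 0.29, 0.26],
}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history["sparse_categorical_accuracy"], label="train")
ax1.plot(history["val_sparse_categorical_accuracy"], label="validation")
ax1.set_title("Accuracy")
ax1.legend()
ax2.plot(history["loss"], label="train")
ax2.plot(history["val_loss"], label="validation")
ax2.set_title("Loss")
ax2.legend()
fig.savefig("history.png")
```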

Step13) Display the loss and accuracy of the training, validation, and test sets

Step14) Use the trained model to re-predict a single training sample, and play its audio file

Step15) Convert the analysis result of a specified audio file into a chart

Step16) Generate the confusion matrix
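A minimal sketch of building a confusion matrix from true and predicted label indices (pure Python; the notebook may instead use a library helper such as sklearn.metrics.confusion_matrix):

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[i][j] counts samples whose true label is i and prediction is j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Tiny illustrative example with 3 classes.
cm = confusion_matrix([0, 1, 2, 2, 1], [0, 1, 2, 1, 1], num_classes=3)
for row in cm:
    print(row)
```

The diagonal holds the correct predictions, so per-class accuracy is the diagonal entry divided by its row sum.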