Speech Analysis with the Google Speech Command Dataset
The Google Speech Command Dataset contains 35 common English voice commands. Download the dataset and analyze the speech with an Att-RNN model.
Reference and source code: https://github.com/douglas125/SpeechCmdRecognition
Google Colab share link for this example: https://drive.google.com/file/d/1D6iKCuuFFrQlxpX-H82vLgF-JCmm1L20/view?usp=sharing
Step1) Connect to Google Drive
Step2) Create and switch to a working folder in your own Google Drive, so the dataset does not have to be downloaded again on later runs; make sure there is enough free space in your Google Drive.
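Step 2 can be sketched with the standard library; the folder name speech_cmd below is a placeholder, not necessarily the name used in the original notebook:

```python
import os

def ensure_workdir(path):
    # Create the working folder if it does not exist yet, then switch
    # into it; on a second run the folder (and the dataset already
    # downloaded into it) is simply reused.
    os.makedirs(path, exist_ok=True)
    os.chdir(path)
    return os.getcwd()

# In Colab the path would sit under the mounted Drive, e.g.
# ensure_workdir('/content/drive/MyDrive/speech_cmd')
```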
Step3) Download and install the code
Step4) Check the hardware and confirm that the GPU is enabled
Step5) Import the libraries
Step6) Load the Google Speech Command Dataset and convert it into a Python dictionary
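The dictionary conversion in Step 6 can be illustrated with a small sketch. The Speech Commands archive stores one folder per command word, so the parent folder of each WAV file gives its label; the sample paths and category list below are made up for illustration:

```python
def build_label_dict(wav_paths, categories):
    # Map each command word to an integer label, then build a
    # {file_path: label_index} dictionary from the file list.
    label_of = {word: idx for idx, word in enumerate(categories)}
    dataset = {}
    for path in wav_paths:
        word = path.split("/")[-2]  # parent folder name = command word
        dataset[path] = label_of[word]
    return dataset

paths = ["data/yes/0a1b.wav", "data/no/0c2d.wav", "data/stop/9f8e.wav"]
cats = ["yes", "no", "stop"]
print(build_label_dict(paths, cats))
# {'data/yes/0a1b.wav': 0, 'data/no/0c2d.wav': 1, 'data/stop/9f8e.wav': 2}
```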
Step7) Generate the training, test and validation sets
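A minimal splitting sketch is shown below; note that the referenced repository actually uses the dataset's official validation/testing file lists, so the random split here is only a simplified illustration with assumed ratios:

```python
import random

def split_dataset(items, val_ratio=0.1, test_ratio=0.1, seed=42):
    # Shuffle deterministically, then cut the list into
    # validation, test and training portions.
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_val, n_test = int(n * val_ratio), int(n * test_ratio)
    val = items[:n_val]
    test = items[n_val:n_val + n_test]
    train = items[n_val + n_test:]
    return train, val, test
```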
Step8) Display the sound waveform and play the audio
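Reading the raw waveform for Step 8 can be sketched with the standard wave module (the notebook itself presumably uses librosa and IPython.display for loading and playback). Speech Commands clips are mono 16-bit 16 kHz WAV files, which this sketch assumes:

```python
import struct
import wave

def read_waveform(path):
    # Read a mono 16-bit WAV file and return (sample_rate, samples),
    # where samples is a list of signed 16-bit integers ready to be
    # plotted, e.g. with matplotlib.pyplot.plot(samples).
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
        samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
    return rate, samples
```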
Step9) Build the model
Step10) Use a drop-based learning rate schedule: after 15 epochs the learning rate becomes 0.4 times the initial learning rate (initial_lrate), and after 30 epochs it becomes 0.4 * 0.4 times the initial learning rate.
The formula is: LearningRate = InitialLearningRate * DropRate^floor(Epoch / EpochDrop)
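The schedule can be written as a small function. The hyperparameter values (0.001, 0.4, 15) match the training log below, with Epoch counted from 1 as in the formula; note that Keras's LearningRateScheduler passes a 0-based epoch index, so inside the callback one would call it with epoch + 1:

```python
import math

def step_decay(epoch, initial_lrate=1e-3, drop_rate=0.4, epochs_drop=15):
    # LearningRate = InitialLearningRate * DropRate^floor(Epoch / EpochDrop)
    # Epochs 1-14 keep 0.001, epochs 15-29 use 0.0004, and so on.
    return initial_lrate * math.pow(drop_rate, math.floor(epoch / epochs_drop))
```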
Step11) Train the model
The model weights are saved to the file model-attRNN.h5, and the learning rate is controlled through the function reference lrate.
The output is as follows:
Changing learning rate to 0.001
Epoch 1/60
Epoch 00001: val_sparse_categorical_accuracy improved from -inf to 0.88424, saving model to model-attRNN.h5
2651/2651 - 398s - loss: 0.7463 - sparse_categorical_accuracy: 0.7886 - val_loss: 0.3968 - val_sparse_categorical_accuracy: 0.8842
Changing learning rate to 0.001
Epoch 2/60
Epoch 00002: val_sparse_categorical_accuracy improved from 0.88424 to 0.91891, saving model to model-attRNN.h5
2651/2651 - 408s - loss: 0.3368 - sparse_categorical_accuracy: 0.9047 - val_loss: 0.2898 - val_sparse_categorical_accuracy: 0.9189
Changing learning rate to 0.001
Epoch 3/60
Epoch 00003: val_sparse_categorical_accuracy improved from 0.91891 to 0.92846, saving model to model-attRNN.h5
2651/2651 - 405s - loss: 0.2605 - sparse_categorical_accuracy: 0.9261 - val_loss: 0.2586 - val_sparse_categorical_accuracy: 0.9285
Changing learning rate to 0.001
Epoch 4/60
Epoch 00004: val_sparse_categorical_accuracy did not improve from 0.92846
2651/2651 - 408s - loss: 0.2187 - sparse_categorical_accuracy: 0.9378 - val_loss: 0.2746 - val_sparse_categorical_accuracy: 0.9246
Changing learning rate to 0.001
Epoch 5/60
Epoch 00005: val_sparse_categorical_accuracy improved from 0.92846 to 0.92876, saving model to model-attRNN.h5
2651/2651 - 393s - loss: 0.1948 - sparse_categorical_accuracy: 0.9451 - val_loss: 0.2574 - val_sparse_categorical_accuracy: 0.9288
Changing learning rate to 0.001
Epoch 6/60
Epoch 00006: val_sparse_categorical_accuracy improved from 0.92876 to 0.93760, saving model to model-attRNN.h5
2651/2651 - 406s - loss: 0.1730 - sparse_categorical_accuracy: 0.9511 - val_loss: 0.2335 - val_sparse_categorical_accuracy: 0.9376
Changing learning rate to 0.001
Epoch 7/60
Epoch 00007: val_sparse_categorical_accuracy improved from 0.93760 to 0.94061, saving model to model-attRNN.h5
2651/2651 - 377s - loss: 0.1572 - sparse_categorical_accuracy: 0.9562 - val_loss: 0.2162 - val_sparse_categorical_accuracy: 0.9406
Changing learning rate to 0.001
Epoch 8/60
Epoch 00008: val_sparse_categorical_accuracy improved from 0.94061 to 0.94222, saving model to model-attRNN.h5
2651/2651 - 410s - loss: 0.1409 - sparse_categorical_accuracy: 0.9603 - val_loss: 0.2046 - val_sparse_categorical_accuracy: 0.9422
Changing learning rate to 0.001
Epoch 9/60
Epoch 00009: val_sparse_categorical_accuracy did not improve from 0.94222
2651/2651 - 387s - loss: 0.1311 - sparse_categorical_accuracy: 0.9634 - val_loss: 0.2223 - val_sparse_categorical_accuracy: 0.9412
Changing learning rate to 0.001
Epoch 10/60
Epoch 00010: val_sparse_categorical_accuracy did not improve from 0.94222
2651/2651 - 409s - loss: 0.1207 - sparse_categorical_accuracy: 0.9664 - val_loss: 0.2366 - val_sparse_categorical_accuracy: 0.9394
Changing learning rate to 0.001
Epoch 11/60
Epoch 00011: val_sparse_categorical_accuracy did not improve from 0.94222
2651/2651 - 377s - loss: 0.1132 - sparse_categorical_accuracy: 0.9685 - val_loss: 0.2424 - val_sparse_categorical_accuracy: 0.9401
Changing learning rate to 0.001
Epoch 12/60
Epoch 00012: val_sparse_categorical_accuracy did not improve from 0.94222
2651/2651 - 409s - loss: 0.1076 - sparse_categorical_accuracy: 0.9702 - val_loss: 0.2432 - val_sparse_categorical_accuracy: 0.9397
Changing learning rate to 0.001
Epoch 13/60
Epoch 00013: val_sparse_categorical_accuracy improved from 0.94222 to 0.94473, saving model to model-attRNN.h5
2651/2651 - 400s - loss: 0.0992 - sparse_categorical_accuracy: 0.9721 - val_loss: 0.2282 - val_sparse_categorical_accuracy: 0.9447
Changing learning rate to 0.001
Epoch 14/60
Epoch 00014: val_sparse_categorical_accuracy did not improve from 0.94473
2651/2651 - 400s - loss: 0.0965 - sparse_categorical_accuracy: 0.9738 - val_loss: 0.2508 - val_sparse_categorical_accuracy: 0.9395
Changing learning rate to 0.0004
Epoch 15/60
Epoch 00015: val_sparse_categorical_accuracy improved from 0.94473 to 0.94855, saving model to model-attRNN.h5
2651/2651 - 408s - loss: 0.0649 - sparse_categorical_accuracy: 0.9841 - val_loss: 0.2171 - val_sparse_categorical_accuracy: 0.9486
Changing learning rate to 0.0004
Epoch 16/60
Epoch 00016: val_sparse_categorical_accuracy did not improve from 0.94855
2651/2651 - 407s - loss: 0.0582 - sparse_categorical_accuracy: 0.9862 - val_loss: 0.2362 - val_sparse_categorical_accuracy: 0.9451
Changing learning rate to 0.0004
Epoch 17/60
Epoch 00017: val_sparse_categorical_accuracy did not improve from 0.94855
2651/2651 - 407s - loss: 0.0552 - sparse_categorical_accuracy: 0.9869 - val_loss: 0.2476 - val_sparse_categorical_accuracy: 0.9453
Changing learning rate to 0.0004
Epoch 18/60
Epoch 00018: val_sparse_categorical_accuracy did not improve from 0.94855
2651/2651 - 399s - loss: 0.0514 - sparse_categorical_accuracy: 0.9879 - val_loss: 0.2578 - val_sparse_categorical_accuracy: 0.9448
Changing learning rate to 0.0004
Epoch 19/60
Epoch 00019: val_sparse_categorical_accuracy did not improve from 0.94855
2651/2651 - 386s - loss: 0.0465 - sparse_categorical_accuracy: 0.9887 - val_loss: 0.2632 - val_sparse_categorical_accuracy: 0.9418
Changing learning rate to 0.0004
Epoch 20/60
Epoch 00020: val_sparse_categorical_accuracy did not improve from 0.94855
2651/2651 - 406s - loss: 0.0506 - sparse_categorical_accuracy: 0.9886 - val_loss: 0.2557 - val_sparse_categorical_accuracy: 0.9452
Changing learning rate to 0.0004
Epoch 21/60
Epoch 00021: val_sparse_categorical_accuracy did not improve from 0.94855
2651/2651 - 405s - loss: 0.0451 - sparse_categorical_accuracy: 0.9896 - val_loss: 0.2717 - val_sparse_categorical_accuracy: 0.9430
Changing learning rate to 0.0004
Epoch 22/60
Epoch 00022: val_sparse_categorical_accuracy did not improve from 0.94855
2651/2651 - 409s - loss: 0.0447 - sparse_categorical_accuracy: 0.9897 - val_loss: 0.2719 - val_sparse_categorical_accuracy: 0.9444
Changing learning rate to 0.0004
Epoch 23/60
Epoch 00023: val_sparse_categorical_accuracy did not improve from 0.94855
2651/2651 - 399s - loss: 0.0418 - sparse_categorical_accuracy: 0.9903 - val_loss: 0.2813 - val_sparse_categorical_accuracy: 0.9426
Changing learning rate to 0.0004
Epoch 24/60
Epoch 00024: val_sparse_categorical_accuracy did not improve from 0.94855
2651/2651 - 409s - loss: 0.0389 - sparse_categorical_accuracy: 0.9911 - val_loss: 0.3032 - val_sparse_categorical_accuracy: 0.9412
Changing learning rate to 0.0004
Epoch 25/60
Restoring model weights from the end of the best epoch.
Epoch 00025: val_sparse_categorical_accuracy did not improve from 0.94855
2651/2651 - 402s - loss: 0.0394 - sparse_categorical_accuracy: 0.9913 - val_loss: 0.3143 - val_sparse_categorical_accuracy: 0.9391
Epoch 00025: early stopping
Step12) Plot the accuracy and loss curves
Step13) Display the loss and accuracy of the training, validation and test sets
Step14) Use the trained model to re-predict a single training sample and play its audio file
Step15) Display the analysis result of a specified audio file as a chart
Step16) Generate the confusion matrix
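The confusion matrix of Step 16 can be illustrated in pure Python (the notebook presumably computes it from the test-set predictions, e.g. with sklearn.metrics.confusion_matrix); the toy labels below are made up for illustration:

```python
def confusion_matrix(y_true, y_pred, num_classes):
    # Rows are the true labels, columns the predicted labels;
    # cell [t][p] counts samples of class t predicted as class p.
    mat = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        mat[t][p] += 1
    return mat

print(confusion_matrix([0, 1, 2, 2], [0, 2, 2, 2], 3))
# [[1, 0, 0], [0, 0, 1], [0, 0, 2]]
```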