The collected data will be pre-processed (labeling, diversifying the data, cleaning such as noise removal and removal of silent segments, ...) and then converted into MFCC features, which are used to train two models.
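The MFCC step can be sketched in plain Python to make the feature pipeline concrete. This is a minimal sketch, assuming 16 kHz mono audio, 25 ms frames (400 samples) with a 10 ms hop (160 samples), a 512-point transform, 26 mel filters and 13 coefficients; these are common defaults, not parameters given in this document. A naive DFT stands in for the FFT that an on-device implementation would normally use.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale over n_fft // 2 + 1 bins."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    fb = [[0.0] * n_bins for _ in range(n_filters)]
    for j in range(n_filters):
        left, center, right = bins[j], bins[j + 1], bins[j + 2]
        for k in range(left, center):  # rising edge of the triangle
            fb[j][k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fb[j][k] = (right - k) / (right - center)
    return fb


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    """Return one list of n_ceps coefficients per frame (assumed default parameters)."""
    if frame_len > n_fft:
        raise ValueError("frame_len must not exceed n_fft")
    # Pre-emphasis: y[n] = x[n] - 0.97 * x[n-1]
    emph = [signal[0]] + [signal[i] - 0.97 * signal[i - 1] for i in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    cos_t = [math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]
    sin_t = [math.sin(2 * math.pi * i / n_fft) for i in range(n_fft)]
    fb = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(emph) - frame_len + 1, hop):
        # Hamming-windowed frame; samples past frame_len are implicit zero padding.
        frame = [emph[start + n] * window[n] for n in range(frame_len)]
        power = []
        for k in range(n_fft // 2 + 1):  # naive DFT, one bin at a time
            re = im = 0.0
            for n, x in enumerate(frame):
                idx = (k * n) % n_fft
                re += x * cos_t[idx]
                im -= x * sin_t[idx]
            power.append((re * re + im * im) / n_fft)
        # Log mel energies (small floor avoids log(0) on silence).
        log_e = [math.log(sum(w * p for w, p in zip(filt, power)) + 1e-10) for filt in fb]
        # Orthonormal DCT-II keeps the first n_ceps coefficients.
        ceps = []
        for i in range(n_ceps):
            scale = math.sqrt((1 if i == 0 else 2) / n_filters)
            ceps.append(scale * sum(e * math.cos(math.pi * i * (j + 0.5) / n_filters)
                                    for j, e in enumerate(log_e)))
        features.append(ceps)
    return features
```

With these defaults, 0.1 s of audio (1600 samples) yields 8 frames of 13 coefficients each; a one-second keyword clip would yield a 98 × 13 feature matrix for training.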
The dataset contains 8 keywords, and each keyword is recorded from more than 130 speakers.
During operation, the sensor continuously captures sound data in real time.
The incoming audio is split into fixed-length segments; feature extraction is performed directly on the MCU, and the features are fed into the model to make predictions. The first model looks for the wake-up keyword; once it is detected, the system moves on to action keyword detection with the second model.
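Splitting a continuous stream into fixed-length segments can be sketched as a small buffer that accumulates sensor chunks and emits every complete segment. `SegmentBuffer` is a hypothetical helper, and the segment length and hop are assumptions: a hop equal to the segment length gives back-to-back windows, while a shorter hop gives overlapping windows, which lowers the chance of a keyword being cut in half at a segment boundary.

```python
from collections import deque
from itertools import islice


class SegmentBuffer:
    """Hypothetical helper: turns streaming sample chunks into fixed-length segments."""

    def __init__(self, segment_len, hop):
        if not 0 < hop <= segment_len:
            raise ValueError("hop must be in (0, segment_len]")
        self.segment_len = segment_len
        self.hop = hop
        self._buf = deque()

    def push(self, samples):
        """Append a chunk from the sensor and return every segment now complete."""
        self._buf.extend(samples)
        segments = []
        while len(self._buf) >= self.segment_len:
            segments.append(list(islice(self._buf, self.segment_len)))
            for _ in range(self.hop):  # advance the window start by one hop
                self._buf.popleft()
        return segments
```

For example, `SegmentBuffer(16000, 8000)` would emit one-second segments every half second at 16 kHz. On the MCU itself the same idea is usually a fixed-size ring buffer in C, so no memory is allocated per chunk.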
If an action keyword is detected, the system carries out the corresponding action.
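The two-stage flow (wake-up keyword with the first model, then action keywords with the second) can be sketched as a small state machine. `TwoStageSpotter`, the `"wake"` label, the 0.8 confidence threshold and the timeout are all hypothetical; in particular, returning to wake-word listening after an action runs, or after a few segments with no action keyword, is an assumption rather than behavior stated in this document. Both models are passed in as plain callables that return `(label, confidence)` for one segment's features.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Features = Sequence[Sequence[float]]
Classifier = Callable[[Features], Tuple[str, float]]


class TwoStageSpotter:
    """Hypothetical controller: model 1 spots the wake word, model 2 the action keywords."""

    def __init__(self, wake_model: Classifier, action_model: Classifier,
                 actions: Dict[str, Callable[[], None]], wake_label: str = "wake",
                 threshold: float = 0.8, timeout_segments: int = 4):
        self.wake_model = wake_model
        self.action_model = action_model
        self.actions = actions
        self.wake_label = wake_label
        self.threshold = threshold
        self.timeout_segments = timeout_segments
        self.awake = False
        self._remaining = 0

    def process(self, features: Features) -> Optional[str]:
        """Feed one segment's features; return the action label that was run, if any."""
        if not self.awake:
            # Stage 1: only the wake-word model runs while the device is idle.
            label, conf = self.wake_model(features)
            if label == self.wake_label and conf >= self.threshold:
                self.awake = True
                self._remaining = self.timeout_segments
            return None
        # Stage 2: the action-keyword model runs after the wake word was heard.
        label, conf = self.action_model(features)
        if label in self.actions and conf >= self.threshold:
            self.actions[label]()
            self.awake = False  # assumption: go back to waiting for the wake word
            return label
        self._remaining -= 1
        if self._remaining <= 0:
            self.awake = False  # assumption: give up after timeout_segments segments
        return None
```

Keeping only the smaller wake-word model active most of the time is the usual motivation for this split on an MCU, since the second model runs only after a wake-up.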