The main goal of this project is to develop a virtual assistant that can be used for a variety of purposes, from helping a child with a speech delay to assisting a senior citizen with daily tasks.
Model2: Task Classification & Response
In this part, I moved model1 to the front end by extracting the learned model1 parameters to a file and writing a service in the front-end application that decides whether to call the backend. Essentially, I do not want ordinary room conversation to be sent to the server; it should be sent only when I am requesting something from the bot.
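One way to move a learned classifier to the front end is to export its parameters to a plain file. The sketch below assumes model1 is a scikit-learn CountVectorizer + LogisticRegression pair (the write-up does not show the actual code); it dumps the vocabulary and weights to JSON and then re-implements the score in pure Python, the way a front-end service could. The training sentences are invented stand-ins, not the project's data.

```python
# Sketch: export a trained "should the assistant respond?" classifier to JSON
# so a front-end service can score sentences without a server round-trip.
# Assumes a CountVectorizer + LogisticRegression model1 (hypothetical setup).
import json
import math

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for the real negative.txt / positive.txt contents.
sentences = ["tell me a story", "what time is it", "how was school", "dinner is ready"]
labels = [1, 1, 0, 0]  # 1 = directed at the assistant

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(sentences)
model1 = LogisticRegression().fit(X, labels)

# Everything the front end needs to reproduce the decision.
# (vocabulary_ values are numpy ints, so convert for JSON.)
params = {
    "vocabulary": {tok: int(idx) for tok, idx in vectorizer.vocabulary_.items()},
    "coef": model1.coef_[0].tolist(),       # one weight per token
    "intercept": float(model1.intercept_[0]),
}
with open("model1_params.json", "w") as f:
    json.dump(params, f)

def should_call_backend(sentence, params, threshold=0.5):
    """Front-end-style scoring: logistic regression by hand.

    Note: plain .split() only approximates CountVectorizer's tokenizer
    (which lowercases and drops single-character tokens)."""
    score = params["intercept"]
    for tok in sentence.lower().split():
        idx = params["vocabulary"].get(tok)
        if idx is not None:
            score += params["coef"][idx]
    return 1 / (1 + math.exp(-score)) >= threshold
```

A front-end service (in JavaScript, say) would load the same JSON and apply the same dot product before deciding to hit the server.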
I trained a new model for topic classification, covering shopping list, time, timer, and story-related queries to the bot. The server receives the sentence, predicts its topic, and then generates a response to send to the front end.
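A minimal sketch of such a topic classifier, assuming the same CountVectorizer + LogisticRegression approach as model1 (the write-up does not name model2's estimator); the training queries below are invented placeholders for the four topics.

```python
# Sketch of model2: a multiclass topic classifier over the four query types.
# Training sentences are illustrative, not the project's dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

queries = [
    "add milk to the shopping list", "put eggs on the list",
    "what time is it", "tell me the time",
    "set a timer for ten minutes", "start a five minute timer",
    "tell me a story", "read me a bedtime story",
]
topics = ["shopping", "shopping", "time", "time",
          "timer", "timer", "story", "story"]

topic_clf = make_pipeline(CountVectorizer(), LogisticRegression())
topic_clf.fit(queries, topics)

# The server would run this on each incoming sentence, then dispatch
# to the matching response generator.
print(topic_clf.predict(["add bread to the shopping list"])[0])
```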
I used different models to generate responses:
NLTK to parse out the shopping list item
An RNN to generate the story text
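For the shopping-list case, one simple way to pull the item out of a request is to tokenize with NLTK and strip the command words. This is only a sketch of that idea; the project's actual parsing rules are not shown, and the stop-word set here is an assumption.

```python
# Sketch: extract the shopping-list item from a request using NLTK's
# RegexpTokenizer and a hand-picked set of command words (illustrative).
from nltk.tokenize import RegexpTokenizer

TOKENIZER = RegexpTokenizer(r"\w+")
COMMAND_WORDS = {"add", "put", "buy", "to", "on", "the", "my",
                 "shopping", "list", "please"}

def extract_item(sentence):
    """Return whatever is left after removing known command words."""
    tokens = TOKENIZER.tokenize(sentence.lower())
    item = [w for w in tokens if w not in COMMAND_WORDS]
    return " ".join(item) or None

print(extract_item("add milk to the shopping list"))
```

A POS-tagging approach (keeping only nouns) would be more robust to unseen phrasings, at the cost of needing NLTK's tagger data.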
Data Collection & Model1: Predict if the assistant should respond
For data collection, I wrote custom code to listen to the environment and transcribe it to text. I left it running for about an hour while I was talking to my son, then moved the generated text to a file called negative.txt, since the entire conversation was between my son and myself. In another session I gave directions that I wanted the bot to respond to, and placed those results in a positive.txt file.
This is a supervised machine learning problem. I created a function to read both files (negative/positive) and load them into a pandas data frame. Initially I trained with a train/test split, but I later changed my strategy because I did not have enough data. Instead of a train/test split, I implemented KFold [1] so that the model can be validated on the entire dataset, holding out a different portion on each iteration.
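The KFold loop can be sketched as below, with toy sentences standing in for the negative.txt/positive.txt data. Note that the vectorizer is fit inside each fold, so no validation vocabulary leaks into training.

```python
# Sketch: KFold cross-validation so every row serves as validation once,
# instead of a single train/test split. Data is an invented stand-in.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

texts = np.array([
    "tell me a story", "set a timer", "what time is it", "add milk to the list",
    "how was your day", "dinner is ready", "clean your room", "we are leaving soon",
])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

kf = KFold(n_splits=4, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in kf.split(texts):
    vec = CountVectorizer()                      # refit per fold: no leakage
    X_train = vec.fit_transform(texts[train_idx])
    X_val = vec.transform(texts[val_idx])
    clf = LogisticRegression().fit(X_train, y[train_idx])
    scores.append(clf.score(X_val, y[val_idx]))

print(f"mean accuracy over {kf.get_n_splits()} folds: {np.mean(scores):.2f}")
```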
In addition, because the dataset was small and imbalanced, I replicated the positive cases until the target column was balanced. This way, I aimed to keep the model from simply choosing the majority class.
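Replicating the minority class can be done with a pandas resample; a minimal sketch, with illustrative column names and toy rows:

```python
# Sketch: naive oversampling of the positive class until the target
# column is balanced. Column names and rows are illustrative.
import pandas as pd

df = pd.DataFrame({
    "text": ["tell me a story", "set a timer",
             "how was school", "dinner is ready", "go to bed", "we are late"],
    "target": [1, 1, 0, 0, 0, 0],
})

pos = df[df["target"] == 1]
neg = df[df["target"] == 0]

# Repeat positive rows (with replacement) until they match the negative count.
pos_upsampled = pos.sample(n=len(neg), replace=True, random_state=0)
balanced = pd.concat([neg, pos_upsampled]).sample(frac=1, random_state=0)  # shuffle

print(balanced["target"].value_counts())
```

One caveat worth noting: oversampling must happen inside each training fold only, otherwise copies of the same positive row can appear in both the training and validation sides of a split.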
I decided to use only CountVectorizer as opposed to TfidfVectorizer. TfidfVectorizer may not help increase accuracy here, since the goal is binary classification rather than multiclass classification.
I tried multiple models with the help of scikit-learn's "Choosing the right estimator" page.
To retain locality, I used bigrams during vectorization; however, they performed very poorly.
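The poor bigram result is consistent with how sparse bigram features get on a small corpus: most two-word sequences occur only once, so they rarely generalize. A small illustration with invented sentences:

```python
# Sketch: unigrams vs. bigrams in CountVectorizer. On tiny corpora,
# bigram features are sparse and rarely repeat across sentences.
from sklearn.feature_extraction.text import CountVectorizer

texts = ["tell me a story", "tell me the time", "set a timer please"]

unigram = CountVectorizer(ngram_range=(1, 1)).fit(texts)
bigram = CountVectorizer(ngram_range=(2, 2)).fit(texts)

# Note: the default token pattern drops single-character tokens like "a".
print(len(unigram.vocabulary_), "unigram features")
print(len(bigram.vocabulary_), "bigram features")
print(sorted(bigram.vocabulary_))
```

Here only "tell me" repeats across sentences; every other bigram is unique to one sentence, so the classifier gets little signal from them.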
LogisticRegression was the winner in all areas.
I saved the model and deployed it to a local server as an endpoint.
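Saving and reloading can be sketched with joblib (one common choice for scikit-learn models; the write-up does not say which serializer or web framework was actually used). A small web framework such as Flask would then load the saved artifact at startup and expose a /predict route.

```python
# Sketch: persist a trained pipeline with joblib; a server endpoint would
# load the same file at startup. Training data is an invented stand-in.
import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(CountVectorizer(), LogisticRegression())
pipe.fit(
    ["tell me a story", "set a timer", "how was school", "dinner is ready"],
    [1, 1, 0, 0],
)

joblib.dump(pipe, "model1.joblib")        # artifact shipped to the server
restored = joblib.load("model1.joblib")   # what the endpoint loads once

print(restored.predict(["tell me a story"])[0])
```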
The models overall performed really well, which raised some suspicion. Possible reasons:
The results might be different if the text had been gathered from conversations between adults, where the model may not predict so easily.
While recording the negative cases we were facing each other (not the computer where the mic is), so the transcription was not very accurate. While recording the positive cases, I was facing the computer, which resulted in better transcription.
In this project I will be creating two datasets.
The first dataset will include directives/questions and general conversation text; the target is whether the text is directed at the assistant (0/1).
The second dataset will contain various texts and their corresponding tasks.
References:
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
Cute Cartoon Funny Character by bcogwene https://pixabay.com/illustrations/cute-cartoon-robot-funny-character-807306/
https://machinetalk.org/2019/02/08/text-generation-with-pytorch/