For the front end, we use HTML, CSS, and JavaScript to build the web pages, and Bootstrap to give them a consistent look.
For the back end, we use Python and its web framework, Flask, to develop the web application, and PyAudio to capture microphone audio for voice recognition.
For the database, we use SQLAlchemy as our database toolkit.
For machine learning and NLP, we use Python libraries such as scikit-learn, NLTK, and spaCy.
Our chatbot system is inspired by ELIZA, a program created by Joseph Weizenbaum at MIT in the 1960s to mimic a Rogerian psychotherapist. It uses pattern matching to generate responses. We built an agent system that works in a similar way, and here is how it works:
It first reads the user's input from left to right, collecting every keyword it finds.
It then sorts the keywords by descending weight; each keyword has a set of decomposition rules, and the first rule that matches the input is selected.
Finally, it selects one of that rule's reassembly patterns as the response.
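The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not our actual agent: the SCRIPT table, its weights, the regular expressions, the DEFAULT reply, and the respond() function are all hypothetical names and values chosen for the example.

```python
import re

# Hypothetical script: keyword -> (weight, [(decomposition regex, [reassembly templates])]).
SCRIPT = {
    "mother": (3, [
        (r".*\bmy mother (.*)", ["Tell me more about your mother.",
                                 "Why do you say your mother {0}?"]),
    ]),
    "sad": (2, [
        (r".*\bi am sad(.*)", ["I am sorry to hear you are sad."]),
        (r".*", ["What makes you feel sad?"]),
    ]),
    "always": (1, [
        (r".*", ["Can you think of a specific example?"]),
    ]),
}
DEFAULT = "Please go on."  # assumed fallback when no keyword is found


def respond(text, turn=0):
    """Return an ELIZA-style reply; `turn` rotates through reassembly templates."""
    cleaned = re.sub(r"[^a-z\s]", "", text.lower())
    # Step 1: scan left to right, collecting every keyword.
    found = [word for word in cleaned.split() if word in SCRIPT]
    # Step 2: sort by descending weight (the sort is stable, so ties keep input order).
    found.sort(key=lambda word: SCRIPT[word][0], reverse=True)
    for keyword in found:
        # The first decomposition rule that matches the input is selected.
        for pattern, templates in SCRIPT[keyword][1]:
            match = re.match(pattern, cleaned)
            if match:
                # Step 3: pick a reassembly template and fill in the captured text.
                template = templates[turn % len(templates)]
                return template.format(*[g.strip() for g in match.groups()])
    return DEFAULT
```

For example, respond("I am sad, and my mother always worries") finds three keywords, ranks "mother" first because it has the highest weight, and answers from the rule attached to "mother".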
We implemented a function nlp_analyze() that takes the user's input and outputs a more natural-sounding response using the NLTK and spaCy libraries. In the function, we set a threshold value of 0.75, which separates positive sentiment from negative sentiment. We also use NLTK's FreqDist to store the frequency distribution of the user's words, which lets the agent give a response centered on the topic that the user emphasizes.
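The two ideas in this paragraph, a sentiment threshold and a word-frequency table, can be sketched as follows. This is not the body of nlp_analyze(): the helper names, the tiny stopword list, the assumption that scores lie in [0, 1], and the choice that scores at or above the threshold count as positive are all ours for illustration. collections.Counter stands in for NLTK's FreqDist, which offers the same most_common() interface.

```python
from collections import Counter

THRESHOLD = 0.75  # value from the text; the [0, 1] score scale is an assumption
STOPWORDS = {"i", "the", "a", "my", "and", "is", "to", "about", "of", "at"}  # illustrative only


def label_sentiment(score, threshold=THRESHOLD):
    """Assumed rule: scores at or above the threshold are positive, the rest negative."""
    return "positive" if score >= threshold else "negative"


def top_topic_words(messages, n=3):
    """Count the user's non-stopword tokens and return the n most frequent ones."""
    counts = Counter()
    for message in messages:
        for word in message.lower().split():
            word = word.strip(".,!?")
            if word and word not in STOPWORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(n)]
```

The most frequent word can then be used to steer the agent's next reply toward the topic the user keeps returning to.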
At the end of each session, the results of the therapy are displayed. The features of our web application are described below.
At the top of the result page is a predefined suggestion or a comment given by a real therapist. Right now we randomly select one piece of advice out of 20 that we found online. In the future, we plan to add an interface where a real therapist can log in and give feedback to the patient here.
The word cloud displays the words that the user says most frequently during the session, sized from largest to smallest. This is useful when the feedback page is shown to a real therapist, because the therapist can quickly identify the topic of the conversation.
This donut chart represents the user's overall mood for the session. We first compute the polarity score of each user input to categorize it as positive, negative, or neutral. We then use the CanvasJS API to create an animated chart.
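A minimal sketch of the categorization step is shown below. It assumes polarity scores in the range [-1, 1] and a cutoff of 0.05 on either side of zero; both the cutoff and the function names are our assumptions for illustration. The result is a list of label/value pairs, a shape that can be serialized to JSON and passed to a charting library as data points.

```python
def categorize(polarity, cutoff=0.05):
    """Map a polarity score to a mood label (the cutoff value is an assumption)."""
    if polarity >= cutoff:
        return "positive"
    if polarity <= -cutoff:
        return "negative"
    return "neutral"


def mood_breakdown(polarities):
    """Return the percentage of inputs in each mood category."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for polarity in polarities:
        counts[categorize(polarity)] += 1
    total = len(polarities)
    return [
        {"label": label, "y": round(100 * count / total, 1) if total else 0.0}
        for label, count in counts.items()
    ]
```

Calling mood_breakdown() with one score per user message gives the three slices of the donut chart.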
This line graph shows the user's emotional state over the course of the session, calculated from the intensity of the words that the user uses. We use the Python library vaderSentiment to compute it.
This part shows the topic identified for the therapy session, as predicted by a Naive Bayes classifier. We found a public therapy dataset online that covers a total of 31 different topics, so we decided to use a multi-class classification model. We created the model using scikit-learn and load it into our web application using pickle.
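The train, save, and load cycle can be sketched as follows. The six training sentences and three topic labels are made up for the example and are not taken from the dataset we used; the choice of CountVectorizer with MultinomialNB is also an assumption about one reasonable way to set up a multi-class Naive Bayes text classifier in scikit-learn.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical); the real dataset has 31 topics.
texts = [
    "my parents argue with me every day",
    "I fight with my brother and my mother",
    "my boss gives me too much work",
    "I am stressed about deadlines at work",
    "I cannot sleep at night",
    "I wake up tired and cannot sleep",
]
labels = ["family", "family", "work", "work", "sleep", "sleep"]

# Bag-of-words features feeding a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

# Serialize the fitted pipeline; the web application would load these bytes at startup.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

predicted_topic = restored.predict(["I cannot sleep"])[0]
```

Pickling the whole pipeline keeps the vectorizer's vocabulary together with the classifier, so the web application does not need to rebuild the features separately.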
We store the result of each session in our database, so the user is able to browse through previous sessions.
On the top left is an overall word cloud, generated by combining all the chat logs and feeding them into the WordCloud library. This allows the user to quickly identify which topics they talk about the most.
On the top right is the overall percentage of each topic across all sessions. Right now the image is just a placeholder because only a small number of sessions have been conducted.
Below that is a list of buttons; clicking one takes the user to the result page of that particular session.
The user can also go back to the home page via the "Back to Home" button.
The image on the right is our database model. Each session has a unique ID (generated by uuid), a timestamp (in UTC), the identified topic, the scores stored as a string of floating-point numbers, the chat log, the generated images, and the comment.
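The fields listed above can be sketched as a plain Python dataclass. This is a stand-in for illustration, not our SQLAlchemy model: the class name, field names, the use of uuid4, the comma separator for the score string, and storing the images as a single string are all assumptions.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


def encode_scores(scores):
    """Pack a list of floats into one comma-separated string (separator is assumed)."""
    return ",".join(f"{score:.4f}" for score in scores)


def decode_scores(text):
    """Unpack the comma-separated string back into a list of floats."""
    return [float(part) for part in text.split(",") if part]


@dataclass
class SessionRecord:
    topic: str      # topic identified by the classifier
    scores: str     # floating-point scores packed into one string
    chat_log: str   # full conversation text
    images: str     # reference to the generated images (assumed to be a string)
    comment: str    # suggestion or therapist comment
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```

Storing the scores as one string keeps the record to a single row per session; decode_scores() recovers the list when the line graph needs to be redrawn.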