ARIA

Artificially Responsive Intelligent Assistant

2023-2024

Abstract

Among the many challenges the world faces today, mental health is a major concern. There is a pressing need for advanced technologies that can help address mental health disorders by providing accessible therapy and counselling. Many existing systems rely on text-based communication, but few offer voice-based counselling. Research in this field has shown that voice-based assistants can be of great help to people dealing with mild to moderate mental health issues. Our goal is to develop an embodied conversational agent that helps individuals overcome such disorders. By providing engaging and emotionally responsive communication, it aims to reduce feelings of loneliness, social disconnection, stress, depression, and procrastination, giving users a path to better well-being.

Introduction

Mental health is one of the most pressing challenges the world faces today, affecting people across all walks of life. Mental well-being comprises the emotional, psychological, and social dimensions of life, influencing our thoughts, emotions, and behaviors. Organizations such as the World Health Organization (WHO) have taken significant measures to raise awareness about mental health disorders. According to WHO's 2019 data [1], approximately 970 million individuals around the world were living with a mental health disorder. The year 2020 saw a sharp increase in these disorders because of the COVID-19 pandemic, a global outbreak that had a severe impact on the mental health and well-being of people worldwide.

Individuals often hesitate to seek assistance from a psychiatrist due to concerns about being judged. This fear keeps them from receiving the therapy they need, and as a result, mental health problems grow worse for lack of proper counselling. Research [2] has shown that artificially crafted chatbots can deliver effective psychotherapeutic interventions and improve people's mental well-being. Text-based chatbots such as Woebot [3] and Youper [4] have proven helpful for mild to moderate mental health problems. While this approach has worked so far, recent research indicates a shift in preference towards voice-based chatbots, which people now find more effective than simpler text-based ones.

Many earlier text-based chatbots lacked both avatars and voice capabilities. The current imperative is to develop chatbots that are not only user-friendly and practical but also visually engaging. Integrating an avatar with speech gives users a more authentic experience: the system does not replace a psychiatrist, but it simulates aspects of one.

Our desktop application incorporates an embodied conversational agent designed to assist individuals in overcoming their mental health issues. An avatar sourced from Ready Player Me serves as a virtual assistant, engaging users in voice conversations. To understand and measure users' personality traits, we employ the Big Five Factor Model [5], commonly known as the OCEAN traits: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. These traits are predicted through a weekly questionnaire filled out by the users. Each weekly assessment is compared with previously recorded ones so the system can track progress since the last assessment.

Our application determines the user's emotions through speech, text, and facial features. Speech emotions are analyzed using a convolutional neural network, evaluated on two datasets: the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [7] and the Toronto Emotional Speech Set (TESS) [8]. Facial emotions are determined through the Facial Action Coding System [9], which detects micro-expressions using action units; for this purpose, pre-trained libraries such as OpenFace [10] or Py-Feat [11] will be employed. Text emotions will be extracted using datasets like 'Emotions Dataset for NLP' [12], allowing us to identify seven basic emotions: Happy, Sad, Disgust, Neutral, Anger, Surprise, and Fear. A minimal sketch of the speech-emotion model follows.
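Below is a minimal sketch of how such a CNN could classify speech emotions from MFCC features, using librosa and Keras. The time-averaged MFCC input, layer sizes, sampling rate, and label order are our illustrative assumptions, not the final architecture.

    # Sketch: a 1-D CNN over time-averaged MFCC features for speech emotion
    # recognition. Layer sizes, the 22.05 kHz sampling rate, and the label
    # order are illustrative assumptions, not the final design.
    import librosa
    import tensorflow as tf

    EMOTIONS = ["happy", "sad", "disgust", "neutral", "anger", "surprise", "fear"]

    def extract_mfcc(path, n_mfcc=40):
        """Load a clip and return a fixed-size MFCC feature vector."""
        signal, sr = librosa.load(path, sr=22050)
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
        return mfcc.mean(axis=1)  # average over time -> shape (n_mfcc,)

    def build_model(n_mfcc=40, n_classes=len(EMOTIONS)):
        return tf.keras.Sequential([
            tf.keras.layers.Input(shape=(n_mfcc, 1)),
            tf.keras.layers.Conv1D(64, 5, activation="relu"),
            tf.keras.layers.MaxPooling1D(2),
            tf.keras.layers.Conv1D(128, 5, activation="relu"),
            tf.keras.layers.GlobalAveragePooling1D(),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(n_classes, activation="softmax"),
        ])

    model = build_model()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # X_train would be np.stack([extract_mfcc(p) for p in wav_paths]); then:
    # model.fit(X_train[..., None], y_train, epochs=30, validation_split=0.1)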

Furthermore, our application employs a pre-trained Llama 2 model, fine-tuned specifically to facilitate catharsis [13]. The model generates responses based on cathartic principles, with training data sourced from two datasets: 'Mental Health Conversational Data' [14] and 'Counsel Chat' [15]. A rough sketch of the fine-tuning setup follows.
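As a rough illustration, a parameter-efficient (LoRA) fine-tuning setup for Llama 2 might look like the sketch below. The checkpoint name, prompt format, dataset column names, and hyperparameters are assumptions for illustration only, and the Llama 2 weights are gated behind Meta's license.

    # Sketch: LoRA fine-tuning of Llama 2 on counselling dialogues.
    # Checkpoint name, prompt format, column names, and hyperparameters
    # are illustrative; the base weights require approved access from Meta.
    import torch
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                              TrainingArguments, DataCollatorForLanguageModeling)
    from peft import LoraConfig, get_peft_model

    BASE = "meta-llama/Llama-2-7b-chat-hf"  # gated checkpoint

    tokenizer = AutoTokenizer.from_pretrained(BASE)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(
        BASE, torch_dtype=torch.float16, device_map="auto")

    # Wrap the frozen base model with small trainable LoRA adapters.
    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)

    data = load_dataset("nbertagnolli/counsel-chat", split="train")

    def to_features(row):
        # Column names assumed from the hub dataset card; adjust if they differ.
        text = f"Client: {row['questionText']}\nTherapist: {row['answerText']}"
        return tokenizer(text, truncation=True, max_length=512)

    tokenized = data.map(to_features, remove_columns=data.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments("aria-llama2-lora", per_device_train_batch_size=1,
                               gradient_accumulation_steps=8, num_train_epochs=1,
                               learning_rate=2e-4, fp16=True, logging_steps=50),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()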

Lastly, our embodied agent will feature realistic facial expressions to enhance its lifelike appearance. Additionally, thorough voice tuning will be implemented with accurate lip-syncing using Oculus [16] to simulate the presence of a psychiatrist, providing users with a more authentic and engaging experience.

Datasets 

The datasets used in our project are listed below, followed by a small example of loading the RAVDESS corpus:

Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)

Link: https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio 

Toronto Emotional Speech Set (TESS)

Link: https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess 

Emotions Dataset for NLP

Link: https://www.kaggle.com/datasets/praveengovi/emotions-dataset-for-nlp

Mental Health Conversational Data

Link: https://www.kaggle.com/datasets/elvis23/mental-health-conversational-data

Counsel Chat

Link: https://huggingface.co/datasets/nbertagnolli/counsel-chat
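As a small example of working with these corpora: RAVDESS encodes its labels directly in each filename (seven hyphen-separated fields, the third being the emotion code), so labels can be recovered without a separate annotation file. The folder name below assumes the standard Kaggle download layout.

    # Build (filepath, emotion) pairs from RAVDESS filenames, which follow
    # modality-channel-emotion-intensity-statement-repetition-actor,
    # e.g. 03-01-06-01-02-01-12.wav -> emotion code 06 (fearful).
    from pathlib import Path

    RAVDESS_EMOTIONS = {
        "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
        "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
    }

    def load_ravdess_labels(root="ravdess-emotional-speech-audio"):
        samples = []
        for wav in Path(root).rglob("*.wav"):
            code = wav.stem.split("-")[2]  # third field is the emotion
            samples.append((str(wav), RAVDESS_EMOTIONS[code]))
        return samples

    if __name__ == "__main__":
        for path, emotion in load_ravdess_labels()[:5]:
            print(emotion, path)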

OCEAN Model

We will use the OCEAN model (also known as the Five Factor Model) to predict users' personalities; a sketch of how a questionnaire can be scored into these traits follows the descriptions below.

O (Openness) is characterized by a broad range of interests, curiosity, and a willingness to try new things. People who are high in openness are often creative, imaginative, and insightful. They may also be more adventurous and open to change than those who are low in openness. 

C (Conscientiousness) is characterized by thoughtfulness, impulse control, and goal-directedness. People who are high in conscientiousness tend to be organized, efficient, and reliable. They are also good at planning ahead and meeting deadlines. 

E (Extraversion) is characterized by sociability, talkativeness, assertiveness, and high emotional expressiveness. People who are high in extraversion are outgoing and enjoy being around people. They tend to be energized by social interaction and feel drained when they are alone.

A (Agreeableness) is characterized by trust, altruism, kindness, affection, and other prosocial behaviors. People who are high in agreeableness are more likely to be cooperative, helpful, and forgiving. They are also less likely to be competitive or manipulative.

N (Neuroticism) is characterized by a tendency to experience negative emotions, such as anxiety, sadness, irritability, and self-doubt. People who are high in neuroticism are more likely to be emotionally reactive and have difficulty coping with stress.
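As an illustration, Big Five questionnaires are typically scored by averaging Likert responses per trait, with some items reverse-keyed. The three-item scales and the reverse-keyed flags below are a hypothetical miniature inventory, not our actual questionnaire.

    # Score a 1-5 Likert questionnaire into OCEAN traits. The three-item
    # scales and reverse-keyed flags are hypothetical placeholders.
    TRAIT_ITEMS = {
        "O": [(1, False), (2, False), (3, True)],   # (item_id, reverse_keyed)
        "C": [(4, False), (5, True), (6, False)],
        "E": [(7, False), (8, False), (9, True)],
        "A": [(10, False), (11, True), (12, False)],
        "N": [(13, False), (14, False), (15, True)],
    }

    def score_ocean(answers, scale_max=5):
        """answers: dict mapping item_id -> Likert response (1..scale_max)."""
        scores = {}
        for trait, items in TRAIT_ITEMS.items():
            values = [(scale_max + 1 - answers[i]) if rev else answers[i]
                      for i, rev in items]
            scores[trait] = sum(values) / len(values)
        return scores

    # Weekly progress check: compare this week's profile with the stored one.
    def weekly_delta(current, previous):
        return {t: round(current[t] - previous[t], 2) for t in current}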

Proposed Methodology and Architecture

Our desktop application will require users to complete a personality-prediction questionnaire when they create an account, and this assessment will be repeated on a weekly basis. The questionnaire helps predict and track their evolving personalities over time, fostering self-awareness and personal growth. The user's emotions are predicted from their camera feed and voice/text input; speech is additionally converted to text. This text is fed into a model fine-tuned for catharsis [13]; for this purpose, we fine-tune Meta's Llama 2 model. The results from the fine-tuned model are presented to users conversationally: the model's responses are converted into lifelike voices using text-to-speech services, and the avatar adds facial expressions and voice modulation to simulate the presence of a real psychiatrist. A condensed sketch of this interaction loop appears below. Figure 1 shows the system architecture of our project.
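The sketch below shows one way the listen-respond loop could be wired with off-the-shelf Python libraries (SpeechRecognition for speech-to-text and pyttsx3 for offline text-to-speech). These particular libraries, and the generate_reply stub, are our assumptions for illustration; the deployed system may use different services.

    # Minimal conversational turn: microphone -> text -> model -> spoken reply.
    # SpeechRecognition and pyttsx3 stand in for whichever STT/TTS services
    # the final system uses; generate_reply is a placeholder for the
    # fine-tuned Llama 2 model.
    import speech_recognition as sr
    import pyttsx3

    recognizer = sr.Recognizer()
    tts = pyttsx3.init()

    def generate_reply(user_text: str) -> str:
        """Placeholder for the fine-tuned model's response."""
        return "I hear you. Tell me more about why you feel that way."

    def conversation_turn():
        with sr.Microphone() as source:
            recognizer.adjust_for_ambient_noise(source)
            audio = recognizer.listen(source)
        try:
            user_text = recognizer.recognize_google(audio)  # online STT
        except sr.UnknownValueError:
            user_text = ""
        reply = generate_reply(user_text)
        tts.say(reply)      # queue the spoken reply
        tts.runAndWait()    # block until playback finishes
        return user_text, reply

    if __name__ == "__main__":
        print(conversation_turn())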

Goals and Objectives

The major goals and objectives of the project are:

Scope 

Advancements in Artificial Intelligence and Machine Learning have led to the rising popularity of chatbots, especially in the field of mental health. However, few voice-based chatbots offer interactive environments, and those that do are often expensive. Our project therefore aims to create an engaging and responsive environment for mental health patients. The system will not replace a psychiatrist, but it can assist one by performing basic analysis and providing an initial assessment. The proposed application includes:

Dataset Collection

We are currently collecting our own speech dataset to evaluate the performance of our speech emotion detection model. Our primary focus is on local accents, with the aim of improving the model's accuracy by covering a wider range of speech patterns and language styles observed in human communication. If you are interested in participating in our data collection process, kindly visit Here.

Assumptions and Constraints

Assumptions

Constraints

Challenges

Currently, we are facing the following challenges:

Tools and Technologies

Tools and technologies that will be used for this project are:

For desktop application development:

Unity

Python

Ready Player Me

Google Colab

VSCode

MongoDB

Meta Llama 2

AWS

Future Work

Below are the ideas we plan to pursue to expand the project in the future:

Information Technology Symposium (iTS'24)

🏆We're thrilled to announce that our project, ARIA (Artificially Responsive Intelligent Assistant), has clinched the top prize of PKR 250,000/- as the winner of the Information Technology Symposium (iTS'24), hosted by Cogent Labs! 🥇 

It's been an incredible journey competing against over 150 projects from various universities. The competition was fierce, with three challenging phases: initial submission, mid-evaluation, and the final round. Out of that pool of talent, only 7 projects made it to the final stage, and our project ARIA emerged as the winner of the competition.

We had the honor of being judged by four distinguished individuals: the CEO of Cogent Labs; a co-founder of Arbisoft; the lead data scientist of xiQ, Inc.; and the COO of Codexia Technologies. They engaged us with thought-provoking questions and shared invaluable feedback. We're proud to have addressed each query with precision, reflecting our deep understanding of and commitment to our project.

We are truly grateful to Allah Almighty for this achievement. We want to extend our gratitude to our supervisor, Dr. Usama Ijaz Bajwa; none of this would've been possible without his support and guidance. His mentorship was truly invaluable.

We're deeply grateful to our parents, friends, and everyone who supported us along the way. And a big shoutout to Cogent Labs for organizing such a fantastic event, providing a platform for students to showcase their creativity and innovation. 

Below are some pictures from the prize distribution ceremony at Cogent Labs:

The Team

Project Supervisor

Dr. Usama Ijaz Bajwa

Co-PI, Video Analytics lab, National Centre in Big Data and Cloud Computing,

HEC Approved PhD Supervisor,

Tenured Associate Professor 

Department of Computer Science,

COMSATS University Islamabad, Lahore Campus, Pakistan

www.usamaijaz.com 

www.fit.edu.pk 

Job Profile 

Google Scholar Profile 

LinkedIn Profile

Tania Zaheer

Email: taniazaheer31@gmail.com

BS Student

(Computer Science, COMSATS Lahore)

LinkedIn Profile

Github Profile

Hamna Khawar

Email: hamnakhawar45@gmail.com

BS Student

(Computer Science, COMSATS Lahore)

LinkedIn Profile

Github Profile

Sania Sadaqat

Email: saniasadaqat3526@gmail.com 

BS Student 

(Computer Science, COMSATS Lahore)

LinkedIn Profile

Github Profile

References

[1] World Health Organization. [Online]. Available: https://www.who.int/. [Accessed 8 December 2023].

[2] J. Striegl, M. Gotthardt, C. Loitsch and G. Weber, “Investigating the usability of voice-assistant based CBT for age related depression,” Lecture Notes in Computer Science, 2022. [Online]. [Accessed 8 December 2023].

[3] “Woebot Health,” [Online]. Available: https://woebothealth.com/. [Accessed 8 December 2023].

[4] “Artificial Intelligence for Mental Health Care,” Youper, [Online]. Available: https://www.youper.ai/. [Accessed 8 December 2023].

[5] C. Soto and J. Jackson, “Five Factor Model of Personality,” Psychology, 2013.

[6] T. B., “Big Five Personality Test,” Kaggle, [Online]. Available: https://www.kaggle.com/datasets/tunguz/big-five-personality-test. [Accessed 8 December 2023].

[7] “RAVDESS Emotional Speech Audio,” Kaggle. [Online]. Available: https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio. [Accessed 8 December 2023].

[8] “Toronto Emotional Speech Set (TESS),” Kaggle. [Online]. Available: https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess. [Accessed 8 December 2023].

[9] “Facial Action Coding System,” Paul Ekman Group, 2020. [Online]. Available: https://www.paulekman.com/facial-action-coding-system/.

[10] “TadasBaltrusaitis/OpenFace: a state-of-the-art tool for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation,” GitHub. [Online]. Available: https://github.com/TadasBaltrusaitis/OpenFace.

[11] “Py-Feat: Python Facial Expression Analysis Toolbox.” [Online]. Available: https://py-feat.org/pages/intro.html. [Accessed 8 December 2023].

[12] “Emotions dataset for NLP,” Kaggle, 2020. [Online]. Available: https://www.kaggle.com/datasets/praveengovi/emotions-dataset-for-nlp?resource=download. [Accessed 8 December 2023].

[13] A. K., “Catharsis: A literature review,” Journal of Psychiatric and Mental Health Nursing. [Online]. Available: https://pubmed.ncbi.nlm.nih.gov/7655913/. [Accessed 8 December 2023].

[14] “Mental Health Conversational Data,” Kaggle, [Online]. Available: https://www.kaggle.com/datasets/elvis23/mental-health-conversational-data. [Accessed 8 December 2023].

[15] “Counsel Chat,” nbertagnolli/counsel-chat, Hugging Face Datasets. [Online]. Available: https://huggingface.co/datasets/nbertagnolli/counsel-chat. [Accessed 8 December 2023].

[16] “Oculus Lipsync for Unity Development: Unity | Oculus Developers,” [Online]. Available: https://developer.oculus.com/documentation/unity/audio-ovrlipsync-unity/. [Accessed 8 December 2023].