ARIA
Artificially Responsive Intelligent Assistant
2023-2024
Abstract
Among the many challenges the world faces today, mental health is a major concern. Advanced technologies are needed that can help individuals manage mental health disorders by providing accessible therapy and counselling. Many existing systems rely on text-based communication, but few offer voice-based communication for counselling. Research in this field has shown that voice-based assistants can be a great help for people dealing with mild to moderate mental health issues. Our goal is to develop an embodied conversational agent that helps individuals cope with mental health disorders. By providing engaging and emotionally responsive communication, it aims to reduce feelings of loneliness, social disconnection, stress, depression, and procrastination, offering users a path to better well-being.
Introduction
Mental health is one of the biggest challenges the world faces today, affecting people across all walks of life. Mental well-being comprises the emotional, psychological, and social dimensions of life, influencing our thoughts, emotions, and behaviors. Organizations such as the World Health Organization (WHO) have taken significant measures to raise awareness about mental health disorders. According to WHO's 2019 data [1], approximately 970 million individuals around the world were experiencing a mental health disorder. The year 2020 saw a sharp increase in these disorders because of the COVID-19 pandemic, a global outbreak that severely affected the mental health and well-being of people everywhere.
Individuals often hesitate to seek assistance from a psychiatrist due to concerns about being judged. This fear keeps them from receiving the therapy they need, and mental health problems grow worse without proper counselling. Research [2] has shown that people can receive effective counselling through artificially crafted chatbots, which can improve mental well-being by offering psychotherapeutic interventions. Text-based chatbots such as Woebot [3] and Youper [4] have been built using the latest technology and have proven helpful for mild to moderate mental health problems. While this approach has worked so far, recent research indicates a shift in preference towards voice-based chatbots, which users now find more effective than simpler text-based ones.
Many earlier text-based chatbots lacked both avatars and voice capabilities. The current imperative is to develop chatbots that are not only user-friendly and practical but also visually engaging. The motivation behind integrating avatars with speech is to give users a more authentic experience. Such an agent will not replace a psychiatrist, but it will provide a simulation of one.
Our desktop application incorporates an embodied conversational agent designed to assist individuals in overcoming their mental health issues. An avatar sourced from Ready Player Me serves as a virtual assistant, engaging users in voice conversations. To understand and measure users' personality traits, we employ the Big Five personality model [5], commonly known as the OCEAN traits. These traits, Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism, are predicted from a weekly questionnaire filled out by the users. Each weekly assessment is compared with previously recorded results so the system can check for any progress made since the last assessment.
Our application determines the user's emotions from speech, text, and facial features. Speech emotions are analyzed with a convolutional neural network, trained and evaluated on two datasets: the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [7] and the Toronto Emotional Speech Set (TESS) [8]. Facial emotions are determined through the Facial Action Coding System [9], which detects micro-expressions using action units; pre-trained libraries such as OpenFace [10] or PyFeat [11] are employed for this purpose. Text emotions are extracted using datasets like 'Emotions Dataset for NLP' [12]. Together, these models identify seven basic emotions: Happy, Sad, Disgust, Neutral, Anger, Surprise, and Fear.
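To make the speech branch concrete, here is a minimal sketch pairing MFCC feature extraction with a small CNN classifier. The sampling rate, fixed frame count, and layer sizes are illustrative assumptions, not the exact architecture used in the project:

```python
# Sketch of a speech-emotion CNN over fixed-size MFCC inputs (illustrative).
import librosa
import numpy as np
import torch
import torch.nn as nn

EMOTIONS = ["happy", "sad", "disgust", "neutral", "angry", "surprise", "fear"]

def mfcc_features(path, n_mfcc=40, max_frames=174):
    # Load a clip and compute MFCCs, padded/truncated to a fixed width.
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)[:, :max_frames]
    pad = max_frames - mfcc.shape[1]
    if pad > 0:
        mfcc = np.pad(mfcc, ((0, 0), (0, pad)))
    return torch.from_numpy(mfcc).float().unsqueeze(0)  # shape (1, 40, 174)

class SpeechEmotionCNN(nn.Module):
    def __init__(self, n_classes=len(EMOTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Dropout(0.3),
            nn.Linear(64 * 8 * 42, n_classes))  # 40x174 shrinks to 8x42 here

    def forward(self, x):  # x: (batch, 1, 40, 174)
        return self.net(x)
```

A clip from RAVDESS or TESS would pass through mfcc_features and then the network, which outputs one logit per emotion class.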
Furthermore, our application employs a pre-trained Llama 2 model that is fine-tuned specifically to facilitate catharsis [13]. This model generates responses based on cathartic principles. The training data for this model is sourced from two datasets: 'Mental Health Conversational Data' [14] and 'Counsel Chat' [15].
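As a rough illustration of how such fine-tuning could be set up with Hugging Face PEFT, the sketch below attaches LoRA adapters to a Llama 2 checkpoint. The checkpoint name, adapter hyperparameters, and target modules are assumptions for illustration, not the project's actual configuration:

```python
# Illustrative LoRA fine-tuning setup for a Llama 2 checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "meta-llama/Llama-2-7b-chat-hf"  # assumed checkpoint; gated, needs access
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Attach low-rank adapters to the attention projections; values are illustrative.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# Training would then proceed (e.g. with transformers.Trainer or trl's SFTTrainer)
# on prompt/response pairs built from the counseling datasets [14][15].
```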
Lastly, our embodied agent features realistic facial expressions to enhance its lifelike appearance. Additionally, the agent's voice is carefully tuned and accurately lip-synced using Oculus Lipsync [16] to simulate the presence of a psychiatrist, giving users a more authentic and engaging experience.
Datasets
The datasets used in our project are:
Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)
Link: https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio
Toronto emotional speech set (TESS)
Link: https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess
Emotions Dataset for NLP
Link: https://www.kaggle.com/datasets/praveengovi/emotions-dataset-for-nlp
Mental Health Conversational Data
Link: https://www.kaggle.com/datasets/elvis23/mental-health-conversational-data
Counsel-chat
Link: https://huggingface.co/datasets/nbertagnolli/counsel-chat
OCEAN Model
We use the OCEAN model (also known as the Five-Factor Model) to predict users' personalities; a small scoring sketch follows the trait descriptions below.
O (Openness) is characterized by a broad range of interests, curiosity, and a willingness to try new things. People who are high in openness are often creative, imaginative, and insightful. They may also be more adventurous and open to change than those who are low in openness.
C (Conscientiousness) is characterized by thoughtfulness, impulse control, and goal-directedness. People who are high in conscientiousness tend to be organized, efficient, and reliable. They are also good at planning ahead and meeting deadlines.
E (Extraversion) is characterized by sociability, talkativeness, assertiveness, and high emotional expressiveness. People who are high in extraversion are outgoing and enjoy being around people. They tend to be energized by social interaction and feel drained when they are alone.
A (Agreeableness) is characterized by trust, altruism, kindness, affection, and other prosocial behaviors. People who are high in agreeableness are more likely to be cooperative, helpful, and forgiving. They are also less likely to be competitive or manipulative.
N (Neuroticism) is characterized by a tendency to experience negative emotions, such as anxiety, sadness, irritability, and self-doubt. People who are high in neuroticism are more likely to be emotionally reactive and have difficulty coping with stress.
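To make the weekly assessment concrete, here is a small illustrative sketch of scoring Likert-style questionnaire responses into the five traits and comparing them week over week. The item indices and reverse-keying below are hypothetical placeholders, not the actual instrument used:

```python
# Likert responses are 1-5; reverse-keyed items score as 6 - response.
# The item-to-trait mapping here is a made-up example for illustration.
TRAIT_ITEMS = {
    "O": [(0, False), (5, True)],   # (question index, reverse-keyed?)
    "C": [(1, False), (6, True)],
    "E": [(2, False), (7, True)],
    "A": [(3, False), (8, True)],
    "N": [(4, False), (9, True)],
}

def score_ocean(responses):
    # Average the (possibly reverse-keyed) item scores for each trait.
    scores = {}
    for trait, items in TRAIT_ITEMS.items():
        vals = [(6 - responses[i]) if rev else responses[i] for i, rev in items]
        scores[trait] = sum(vals) / len(vals)
    return scores

def weekly_delta(current, previous):
    # Week-over-week change per trait, used to track progress.
    return {t: round(current[t] - previous[t], 2) for t in current}
```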
Proposed Methodology and Architecture
Our desktop application requires users to complete a questionnaire that predicts their personality when they create an account, and the assessment is then repeated on a weekly basis. The questionnaire helps predict and understand their evolving personalities over time, fostering self-awareness and personal growth. Through the camera and voice/text input, the user's emotions are predicted; speech is additionally converted to text. This text is fed into a model fine-tuned for catharsis [13]; for this purpose, we fine-tune Meta's Llama 2 model. The results from the fine-tuned model are presented to users in a conversational manner: the model's responses are converted into lifelike voices using text-to-speech services, and the avatar adds facial expressions and voice modulations to simulate the presence of a real psychiatrist. Figure 1 shows the system architecture of our project.
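The hypothetical sketch below summarizes one turn of this loop; every callable is a placeholder for the corresponding component described above, not an actual API:

```python
# One conversation turn of the pipeline (all components passed in as callables).
def conversation_turn(audio, frame, stt, speech_emo, face_emo, text_emo,
                      cathartic_llm, tts, avatar):
    text = stt(audio)                          # speech -> text
    emotions = {"speech": speech_emo(audio),   # CNN on the waveform
                "face": face_emo(frame),       # action-unit based (FACS)
                "text": text_emo(text)}        # NLP classifier
    reply = cathartic_llm(text, emotions)      # fine-tuned Llama 2 response
    avatar.speak(tts(reply))                   # TTS, lip-sync, expressions
    return reply
```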
Goals and Objectives
The major goals and objectives of the project are:
To develop a desktop application that provides therapy and counselling to people.
To predict the personality of users.
To predict the current emotional status of the user.
To create an environment that is less stressful and more engaging for the users.
To make sure that the system provides only ethical answers.
To provide real-time responses in both voice and text.
To track the progress of the users.
Scope
Advancements in Artificial Intelligence and Machine Learning have made chatbots increasingly popular, especially in the field of mental health. However, few voice-based chatbots offer interactive environments, and those that do are often expensive. Our project therefore aims to create an engaging and responsive environment for mental health patients. The system will not replace a psychiatrist, but it can assist one by performing basic analysis and providing an initial assessment. The proposed application includes:
A user-friendly interface that allows users to interact with the avatar easily.
Real-time conversations tailored to the user's personality.
Detection of the signs of mental health disorders through a questionnaire.
Generation of facial expressions based on the responses of the fine-tuned Llama model.
A progress tracker that will monitor the improvement in the mental health of the user.
Dataset Collection
We are currently collecting our own speech dataset to evaluate the performance of our speech emotion detection model. Our primary focus is on local accents, with the aim of improving the model's accuracy by covering a wider variety of speech patterns observed in human communication. If you are interested in participating in our data collection process, kindly visit Here.
Assumptions and Constraints
Assumptions
The user must have a laptop or a computer.
The user’s laptop or computer must have a functioning camera and microphone.
The user must have an internet connection to use our system.
The user must know basic English.
Constraints
Our desktop application cannot function properly without an active internet connection.
Our application supports the English language only; it won't work with other languages.
In certain situations, the response time may extend to approximately 6-7 seconds.
Challenges
Currently, we are facing the following challenges:
The avatar's realism needs improvement, particularly its expressions, which currently lack a convincing lifelike quality.
Gathering a dataset for speech emotion analysis has proven difficult: many individuals are unwilling to provide recordings expressing the targeted emotions required for accurate analysis.
Integrating our speech emotion model, face emotion model, and fine-tuned language model into Unity posed several challenges. We first had to convert these models to ONNX format to ensure compatibility within the Unity environment. We also found that older Unity versions are more stable and better documented.
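For reference, here is a minimal sketch of the kind of PyTorch-to-ONNX export this step involves; the stand-in model and input shape are illustrative only:

```python
import torch
import torch.nn as nn

# Stand-in model; in practice this would be the trained speech-emotion network.
model = nn.Sequential(nn.Conv2d(1, 8, 3), nn.Flatten(),
                      nn.Linear(8 * 38 * 172, 7))
model.eval()

dummy = torch.randn(1, 1, 40, 174)  # one MFCC "image", channels-first
torch.onnx.export(model, dummy, "speech_emotion.onnx",
                  input_names=["mfcc"], output_names=["emotion_logits"],
                  opset_version=11)
```

TensorFlow models would follow an analogous path via the tf2onnx converter.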
Tools and Technologies
Tools and technologies that will be used for this project are:
For desktop application development:
Unity
Python
Ready Player Me
Google Colab
VSCode
MongoDB
Meta Llama 2
AWS
Future Work
Below are ideas we plan to pursue to expand the project in the future:
Release ARIA as a mobile application.
Provide Cognitive Behavioral Therapy (CBT) for richer, more therapeutic answers.
Provide multilingual support to cater to a wider range of users.
Project Resources
Project Report
Project Presentation (PPT)
Project Poster
Information Technology Symposium (iTS'24)
🏆We're thrilled to announce that our project, ARIA (Artificially Responsive Intelligent Assistant), has clinched the top prize of PKR 250,000/- as the winner of the Information Technology Symposium (iTS'24), hosted by Cogent Labs! 🥇
It's been an incredible journey competing against over 150 projects from various universities. The competition was fierce, with three challenging phases: initial submission, mid-evaluation, and the final round. Out of this pool of talent, only 7 projects made it to the final stage, and ARIA emerged as the winner of the competition.
We had the honor of being judged by four distinguished individuals: the CEO of Cogent Labs; a co-founder of Arbisoft; the lead data scientist of xiQ, Inc.; and the COO of Codexia Technologies. They engaged us with thought-provoking questions and shared invaluable feedback. We're proud to have addressed each query with precision, reflecting our deep understanding of and commitment to our project.
We are truly grateful to Allah Almighty for this achievement. We also want to extend our gratitude to our supervisor, Dr. Usama Ijaz Bajwa. None of this would've been possible without his support and guidance; his mentorship was truly invaluable.
We're deeply grateful to our parents, friends, and everyone who supported us along the way. And a big shoutout to Cogent Labs for organizing such a fantastic event, providing a platform for students to showcase their creativity and innovation.
Below are some pictures from the prize distribution ceremony at Cogent Labs:
FYP Competition
We are thrilled that our Final Year Project: ARIA (Artificially Responsive Intelligent Assistant), has secured the first prize in the Computer Science department at COMSATS University Islamabad, Lahore Campus. This victory is a testament to the hard work, dedication, and innovative spirit of our team.
The event was a grand success, with our presentation and live demonstration attracting a full room of attendees. Our project showcased impressive responsiveness and versatility, demonstrating its potential to revolutionize various aspects of digital interaction. The judges were particularly impressed with the innovation and functionality embedded in ARIA. Their feedback highlighted the uniqueness of our project, and they commended our comprehensive presentation, the seamless live demonstration, and the robust technical foundations of ARIA.
We owe this success to our remarkable supervisor, Dr. Usama Ijaz Bajwa. His unwavering support kept us motivated and focused. This achievement wouldn't have been possible without his encouragement and invaluable guidance throughout the project.
The journey to this achievement was far from easy. We encountered numerous challenges throughout the project. Alhamdulillah, we managed to overcome all these hurdles and deliver a project that not only met but exceeded expectations. This victory is not just a win for our team but also a celebration of innovation and hard work.
Below are some pictures from the prize distribution ceremony:
AIEF International AI Championship
We are beyond grateful to share that our project ARIA secured 1st Position with a prize of PKR 300,000/- at the 2nd International AI Championship, organized by the Artificial Intelligence Education Foundation at FAST University and powered by Soliton Technologies.
Competing against over 200 projects from various universities, our project stood out for its innovation and impact. The experience of presenting live demos, engaging with esteemed evaluators, and receiving valuable feedback was incredibly rewarding.
This marks our third consecutive win! Our journey started with winning the first prize of PKR 250,000 at Cogent Labs, followed by securing first position at the COMSATS University FYP Exhibition. Now, the first position at AIEF with a prize of PKR 300,000 feels like a dream come true.
This achievement wouldn't have been possible without the exceptional guidance of our supervisor, Dr. Usama Ijaz Bajwa. His support, insights, and belief in us made all the difference, and we are forever grateful for his mentorship. This journey has been overwhelming, challenging, and inspiring all at once, and it's a memory that will stay with us forever.
Below are some pictures from the prize distribution ceremony:
The Team
Project Supervisor
Dr. Usama Ijaz Bajwa
Co-PI, Video Analytics Lab, National Centre in Big Data and Cloud Computing,
HEC Approved PhD Supervisor,
Tenured Associate Professor
Department of Computer Science,
COMSATS University Islamabad, Lahore Campus, Pakistan
References
[1] “World Health Organization,” 8 December 2023. [Online]. Available: https://www.who.int/.
[2] J. Striegl, M. Gotthardt, C. Loitsch and G. Weber, “Investigating the usability of voice-assistant based CBT for age related depression,” Lecture Notes in Computer Science, 2022. [Online]. [Accessed 8 December 2023].
[3] “Woebot Health,” [Online]. Available: https://woebothealth.com/. [Accessed 8 December 2023].
[4] “Artificial Intelligence for Mental Health Care,” Youper, [Online]. Available: https://www.youper.ai/. [Accessed 8 December 2023].
[5] C. Soto and J. Jackson, “Five-Factor Model of Personality,” Psychology, 2013.
[6] T. B., “Big Five Personality Test,” Kaggle, [Online]. Available: https://www.kaggle.com/datasets/tunguz/big-five-personality-test. [Accessed 8 December 2023].
[7] “RAVDESS Emotional Speech Audio,” Kaggle, [Online]. Available: https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio. [Accessed 8 December 2023].
[8] “Toronto Emotional Speech Set (TESS),” Kaggle, [Online]. Available: https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess. [Accessed 8 December 2023].
[9] “Facial Action Coding System,” Paul Ekman Group, 2020. [Online]. Available: https://www.paulekman.com/facial-action-coding-system/.
[10] “TadasBaltrusaitis/OpenFace: OpenFace – a state-of-the art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.,” Github, [Online]. Available: https://github.com/TadasBaltrusaitis/OpenFace.
[11] “Feat: Python Facial Expression Analysis Toolbox,” Py-Feat, [Online]. Available: https://py-feat.org/pages/intro.html. [Accessed 8 December 2023].
[12] “Emotions dataset for NLP,” Kaggle, 2020. [Online]. Available: https://www.kaggle.com/datasets/praveengovi/emotions-dataset-for-nlp?resource=download. [Accessed 8 December 2023].
[13] A. K., “Catharsis: A literature review,” Journal of Psychiatric and Mental Health Nursing, [Online]. Available: https://pubmed.ncbi.nlm.nih.gov/7655913/. [Accessed 8 December 2023].
[14] “Mental Health Conversational Data,” Kaggle, [Online]. Available: https://www.kaggle.com/datasets/elvis23/mental-health-conversational-data. [Accessed 8 December 2023].
[15] “Counsel Chat,” nbertagnolli/counsel-chat, Hugging Face, [Online]. Available: https://huggingface.co/datasets/nbertagnolli/counsel-chat. [Accessed 8 December 2023].
[16] “Oculus Lipsync for Unity Development: Unity | Oculus Developers,” [Online]. Available: https://developer.oculus.com/documentation/unity/audio-ovrlipsync-unity/. [Accessed 8 December 2023].