Spring 2022

Title: Artificial Gym Coach

Team members:

Ahmed Ibrahim Sharshar
Ahmed Fayez El-Gharib
Ahmed Hesham Abu-Etta

Institute:

Egypt-Japan University of Science and Technology

Abstract

In recent years, people have increasingly relied on AI-powered apps in place of relying on other people, and the COVID-19 pandemic significantly increased this dependence during the quarantine period. One such change is that people came to rely on exercising at home to stay healthy while gyms, playgrounds, and similar facilities were closed. However, these exercises can be dangerous if performed incorrectly, so there is a need for an intelligent trainer on the phone that monitors how the exercises are performed, evaluates them, and suggests improvements based on the proper standards for each exercise. We divided the project into two parts. Last semester, we built a proof of concept by collecting a dataset for the squat exercise. The same actors and activities were captured with different hardware systems in two capture rounds, including video from mobile cameras and inertial measurement units (IMUs). The IMU data and the videos are synchronized. The dataset features 24 male and 3 female participants performing squats. We then analyzed the data to demonstrate its quality. We also conducted a preliminary experiment to classify people according to their ability to perform the exercise, with results reaching around 80%.
This term, we built the full project. To do this, we compiled a larger dataset containing several data modalities from different sensors, including RGB videos, inertial motion data, depth, and thermal data, synchronized with respect to the activities performed. The dataset covers four workout exercises: free squats, shoulder presses, push-ups, and lunges, performed by 50 participants indoors at a fitness center. Professional trainers labeled the data with the activity type and the participant's mistakes. We used the IMU data, the videos, and the joints from the Kinect pose estimator separately to build different models.
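The abstract states that the IMU and video streams are synchronized but does not say how. One common approach, shown here only as a minimal sketch, assumes both streams carry timestamps on a shared clock and resamples each IMU channel at the video frame timestamps by linear interpolation. The function names `interp` and `align_imu_to_frames`, and the 100 Hz IMU and 30 fps video rates, are hypothetical.

```python
from bisect import bisect_right

def interp(t, ts, vs):
    """Linearly interpolate samples (ts, vs) at time t; clamps outside the sampled range."""
    if t <= ts[0]:
        return vs[0]
    if t >= ts[-1]:
        return vs[-1]
    i = bisect_right(ts, t)  # guarantees ts[i - 1] <= t < ts[i]
    t0, t1 = ts[i - 1], ts[i]
    w = (t - t0) / (t1 - t0)
    return vs[i - 1] + w * (vs[i] - vs[i - 1])

def align_imu_to_frames(imu_t, imu_rows, frame_t):
    """Resample multi-channel IMU rows so there is one row per video frame timestamp."""
    channels = list(zip(*imu_rows))  # transpose rows into per-channel sequences
    return [[interp(t, imu_t, ch) for ch in channels] for t in frame_t]

# Hypothetical rates: 2 seconds of a 100 Hz IMU and a 30 fps video on the same clock.
imu_t = [i / 100 for i in range(200)]
imu_rows = [(t, 2 * t, 1.0) for t in imu_t]  # synthetic 3-channel readings
frame_t = [i / 30 for i in range(60)]

aligned = align_imu_to_frames(imu_t, imu_rows, frame_t)
print(len(aligned), len(aligned[0]))  # 60 3
```

Interpolating the higher-rate IMU signal onto the video timeline keeps every video frame paired with one IMU row; if the clocks are not shared, an offset estimated from a sync event would have to be subtracted from one stream first.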
First, we built classification models for activity recognition, which reached an accuracy of around 95%. We then built regression models using the videos and the Kinect data to assess performance and give the performer a score out of 10; these reached an error of around ±1.5 out of 10. Next, we built separate models to assess each critical factor of an exercise, with an error of around ±1.2 out of 10 per factor. Finally, we used these models to build a simple mobile application that captures video, evaluates each workout factor, and gives an overall score.
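The abstract reports an activity-recognition accuracy of around 95% and a scoring error of around ±1.5 out of 10, but does not define the error measure. Assuming the ± figure is a mean absolute error (MAE) between the model's score and the trainers' score (it could also be, for example, an RMSE), both metrics reduce to the arithmetic below. The labels and scores are made up for illustration and are not project results.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted activity matches the true activity."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

def mean_absolute_error(true_scores, predicted_scores):
    """Average absolute gap between predicted and reference scores on the 0-10 scale."""
    return sum(abs(t - p) for t, p in zip(true_scores, predicted_scores)) / len(true_scores)

# Made-up examples, not results from the project.
labels = ["squat", "lunge", "push-up", "shoulder press"]
preds = ["squat", "lunge", "squat", "shoulder press"]
print(accuracy(labels, preds))  # 0.75

trainer_scores = [7.0, 5.5, 9.0, 3.0]
model_scores = [8.0, 5.0, 7.0, 3.5]
print(mean_absolute_error(trainer_scores, model_scores))  # 1.0
```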

Title: Egyptian Arabic Text To Speech Synthesis with Emotions

Team members:

Ahmed Nabil Ahmed
Hisham Mohamed Madcor
Magy Gamal Matta

Institute:

Egypt-Japan University of Science and Technology

Abstract

We successfully built the first-ever Egyptian Arabic Text-To-Speech (TTS) synthesizer, with state-of-the-art performance. In addition, we added a layer of emotion to the synthesized speech. Our contributions are:

A multi-speaker TTS corpus and an angry-emotion corpus.
State-of-the-art performance in synthesizing Egyptian Arabic dialect text.
A lexicon dictionary of 3000+ entries with their corresponding SAMPA phonetic transcriptions.
The first Montreal Forced Aligner (MFA) text-to-speech alignment model trained on Egyptian Arabic.
TextGrids for the male version of the dataset.
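The figure list compares synthesis from Buckwalter transliteration with synthesis from phonetic transcription. As a minimal sketch of the orthographic side of that comparison, the following maps Arabic script onto the standard Buckwalter table (letters, hamza forms, and common diacritics). It is not the team's pipeline, and it does not produce SAMPA phonemes: Buckwalter is a one-to-one letter mapping, so Egyptian pronunciations such as ق realized as a glottal stop in Cairene Arabic still require the lexicon and dialect rules. The names `BUCKWALTER` and `to_buckwalter` are hypothetical.

```python
# Standard Buckwalter transliteration table (Unicode escapes keep the source ASCII-safe).
BUCKWALTER = {
    "\u0621": "'",  # hamza
    "\u0622": "|",  # alef with madda above
    "\u0623": ">",  # alef with hamza above
    "\u0624": "&",  # waw with hamza above
    "\u0625": "<",  # alef with hamza below
    "\u0626": "}",  # yeh with hamza above
    "\u0627": "A",  # alef
    "\u0628": "b", "\u0629": "p", "\u062A": "t", "\u062B": "v",  # beh, teh marbuta, teh, theh
    "\u062C": "j", "\u062D": "H", "\u062E": "x", "\u062F": "d",  # jeem, hah, khah, dal
    "\u0630": "*", "\u0631": "r", "\u0632": "z", "\u0633": "s",  # thal, reh, zain, seen
    "\u0634": "$", "\u0635": "S", "\u0636": "D", "\u0637": "T",  # sheen, sad, dad, tah
    "\u0638": "Z", "\u0639": "E", "\u063A": "g", "\u0640": "_",  # zah, ain, ghain, tatweel
    "\u0641": "f", "\u0642": "q", "\u0643": "k", "\u0644": "l",  # feh, qaf, kaf, lam
    "\u0645": "m", "\u0646": "n", "\u0647": "h", "\u0648": "w",  # meem, noon, heh, waw
    "\u0649": "Y", "\u064A": "y",                                # alef maksura, yeh
    "\u064B": "F", "\u064C": "N", "\u064D": "K",                 # tanween: fath, damm, kasr
    "\u064E": "a", "\u064F": "u", "\u0650": "i",                 # fatha, damma, kasra
    "\u0651": "~", "\u0652": "o",                                # shadda, sukun
    "\u0670": "`", "\u0671": "{",                                # dagger alef, alef wasla
}
_TABLE = str.maketrans(BUCKWALTER)

def to_buckwalter(text):
    """Transliterate Arabic script to Buckwalter; other characters pass through unchanged."""
    return text.translate(_TABLE)

print(to_buckwalter("\u0645\u0635\u0631"))  # mSr  (the word for Egypt)
print(to_buckwalter("\u0643\u0650\u062A\u064E\u0627\u0628"))  # kitaAb  (diacritized "book")
```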

In this project, we also experimented with approaches that proved not to work for synthesizing Egyptian Arabic. In addition, our experiments with a signal-processing approach to adding emotion to the synthesized speech showed that it requires extensive inspection, and the results vary from one sentence to another and from one speaker to another. Our experiments led us to conclude that Egyptian Arabic, and Arabic in general, needs much more research in NLP and speech synthesis applications. We hope that this thesis helps the research community learn more about the dialect and how to handle it.
In future work, we will use the Tacotron 2 architecture for synthesis and experiment with two different vocoders: WaveNet and WaveGlow. We will also continue researching how to automatically transcribe Egyptian Arabic phonetically with high accuracy, and we will ask Egyptian linguistics specialists for help in developing methods and rules at the language level, as this is the core of an Egyptian TTS system. Afterwards, we will add more emotions to the TTS system by recording additional emotions and increasing the amount of data for each emotion in the dataset. We also plan to experiment with more techniques in the multi-speaker pipeline, such as applying deepfake algorithms to the synthesized speech, and to introduce GAN modules to study their effect on the quality and naturalness of the synthesized speech.

Stages of speech synthesis.

Egyptian Arabic Consonants and their corresponding Arabic SAMPA phonemes.

Egyptian Arabic Vowels and their corresponding Arabic SAMPA phonemes.

Egyptian Arabic Vowels and their corresponding Arabic Buckwalter transliteration.

Evolution of the learning process in male-neutral model from step 3K to step 202K.

Evolution of the learning process in female-neutral model from step 3K to step 202K.

Evaluation sentences for Neutral and Angry models.

The expert who evaluated our results.

Mean Opinion Score and Mean Square Error for our models.

Difference between the spectrograms of a sentence synthesized from Buckwalter transliteration and from phonetic transcription in the male and female neutral models.

Evolution of the learning process in male-angry model from step 2K to step 286K.
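The figure list above includes a table of Mean Opinion Scores (MOS). In the standard absolute-category-rating setup of ITU-T P.800, listeners rate each sample from 1 (bad) to 5 (excellent), and the MOS is the arithmetic mean of those ratings. The sketch below computes a per-model MOS with a normal-approximation 95% confidence interval; the ratings are invented, and whether the team used exactly this protocol is an assumption.

```python
import math
from statistics import mean, stdev

def mos_with_ci(ratings, z=1.96):
    """Mean Opinion Score plus a normal-approximation 95% confidence interval."""
    m = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))  # needs at least 2 ratings
    return m, (m - half_width, m + half_width)

# Invented 1-5 ratings, not the project's evaluation data.
ratings = {
    "male-neutral": [4, 5, 4, 3, 4, 5, 4, 4],
    "male-angry": [3, 4, 3, 3, 4, 2, 3, 4],
}
for model, scores in ratings.items():
    m, (lo, hi) = mos_with_ci(scores)
    print(f"{model}: MOS = {m:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Reporting the interval alongside the mean makes clear how much two models' MOS values could differ by chance, which matters with a small number of listeners.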

Title: Emotion Evolution from Arabic Tweets

Team members:

Sayed Omar Sayed Reyad

Institute:

Egypt-Japan University of Science and Technology

Abstract

Social media plays an important role in the development of national identity in any country, especially in the Arab world, where the social media revolution has helped reawaken the consciousness of the region. The massive wave of protests that swept across the Middle East in early 2011 highlighted the role of the latest information and communication technologies and of digital social media tools and networks, which have had a huge impact worldwide. The "Arab Spring" has also been called the "Twitter Revolution", the "Facebook Revolution", or the "Internet Revolution", because these platforms played a vital role in communication between people. This project investigates and analyzes how emotions evolved in Arabic tweets during two important Egyptian events, the January 25th and June 30th revolutions, using emotion classification to extract emotions from the tweets.

Emotion classification in Arabic text is an emerging research area that aims to narrow the communication gap between highly emotional humans and emotionally challenged computers by developing computational systems that recognize the affective state of the user. It can also be used to automatically analyze the massive amount of user-generated data on social media and blogs. Most approaches to emotion classification of Arabic text assign each text a single emotion (single-label emotion classification), without considering that other emotions may also be present (multi-label emotion classification).

Multi-label emotion classification (MEC) captures all emotions present in a text, which better reflects the author's emotional state. The scarcity of MEC studies in Arabic stems from the scarcity of multi-label Arabic emotion datasets. Therefore, in this project we tackle MEC in Arabic with a lexicon-based technique that uses two different Arabic emotion lexicons and an emoji lexicon, and we evaluate it on the available multi-label dataset of SemEval-2018 Task 1, Subtask 5. We then apply our lexicon-based technique to two tweet datasets collected during two Egyptian revolutions, January 25th 2011 and June 30th 2013, to extract the emotions from those tweets and investigate how emotions evolved during these events.
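The lexicon-based technique is described only at a high level, so the following is a minimal sketch of how lexicon-based MEC can work under simple assumptions: each lexicon maps a word or emoji to one or more emotions, tweets are tokenized on whitespace, matches are counted per emotion, and every emotion matched at least `min_hits` times is assigned, so a tweet can receive several labels or none. The tiny lexicons are hypothetical English stand-ins for the Arabic word and emoji lexicons, and real Arabic tweets would also need normalization (for example, of hamza forms and elongation). Evaluation uses the per-tweet Jaccard index averaged over tweets, the multi-label accuracy used as the official SemEval-2018 Task 1 Subtask 5 metric; treating two empty label sets as a perfect match is our assumption.

```python
from collections import Counter

# Hypothetical toy lexicons (English stand-ins for the Arabic word and emoji lexicons).
WORD_LEXICON = {
    "happy": {"joy"},
    "hope": {"optimism", "anticipation"},
    "angry": {"anger"},
    "afraid": {"fear"},
}
EMOJI_LEXICON = {"😂": {"joy"}, "😡": {"anger"}}

def classify(tweet, lexicons, min_hits=1):
    """Return every emotion triggered at least `min_hits` times by any lexicon entry."""
    hits = Counter()
    for token in tweet.split():
        for lexicon in lexicons:
            for emotion in lexicon.get(token, ()):
                hits[emotion] += 1
    return {emotion for emotion, count in hits.items() if count >= min_hits}

def jaccard_accuracy(gold, predicted):
    """Mean per-tweet Jaccard index |G & P| / |G | P|; two empty sets count as 1.0."""
    total = 0.0
    for g, p in zip(gold, predicted):
        union = g | p
        total += 1.0 if not union else len(g & p) / len(union)
    return total / len(gold)

lexicons = [WORD_LEXICON, EMOJI_LEXICON]
tweets = ["so happy and full of hope 😂", "I am angry and afraid 😡"]
predicted = [classify(t, lexicons) for t in tweets]
gold = [{"joy", "optimism"}, {"anger", "fear", "sadness"}]
print(round(jaccard_accuracy(gold, predicted), 3))  # 0.667
```

Because each emotion is thresholded independently, the same procedure yields single-label, multi-label, or empty outputs without any change, which is what distinguishes it from a single-label classifier that always picks one emotion.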