Charades

Voice for deaf

Abstract:

According to the World Health Organization (WHO), over 466 million people around the globe suffer from disabling hearing loss. Some of them are completely deaf, with little or no functional hearing at all, while the others, called "hard of hearing", have mild-to-moderate hearing loss. Deaf people use sign language as their first language, whereas hard-of-hearing people can use both sign language and, with hearing aids, spoken language. Hearing people often have little or no knowledge of sign language, so deaf people face difficulty in day-to-day communication. A sign language interpreter can bridge this gap, but an interpreter is expensive to keep available at all times and is often inconvenient. A solution is needed that is economical and easy to use. Our final year project proposes such a solution: an Android application that works both ways. First, as a listener, it detects the signs performed by the deaf person using a smartphone/tablet camera or a device such as the Kinect and translates them into the native spoken language, shown as text on the screen. Second, as a speaker, it takes text in the native spoken language typed by the hearing person on the keyboard and converts it into a series of signs performed by an avatar on the screen of the smartphone/tablet.

Introduction

Deaf people all around the world face issues in daily life because of the communication gap between them and the hearing world. Due to this disability, many people are deprived of experiencing life to its fullest, and the deaf community has become isolated around the globe. It is their right to enjoy every aspect of life like everyone else.

Our project is an answer to this communication barrier: Voice for Deaf. The application was built using Android Studio. The project has two modules:

In the first module, the application converts sign language to text in the native language. The smartphone camera captures the deaf person's actions, which are the signs. The detected signs are recognized and converted into a text version of the native language.

In the second module, the input is text typed on the phone's keyboard. The text is converted into the corresponding sign language, which is then performed by an avatar/figure on the tablet/smartphone screen.

The proposed system provides a direct communication link between deaf people and others: it makes communication in sign language possible even for a person who does not know sign language. The idea is to make the application portable and easy to use, so that more people feel comfortable using it without any extensive training.

Objectives

The project involves the development of software that acts as a voice for the deaf community, who face many challenges in daily communication. The objectives of the application are:

·       Using the smartphone camera to detect the signs performed by the deaf person.

·       Translating those signs into English text for the hearing person on the smartphone screen.

·       Letting the hearing person type an answer to the former query on the phone's keyboard.

·       Translating the typed answer into sign language.

·       Displaying the resulting series of sign gestures and motions on the screen.

Keeping in mind that, in all normal cases, every human hand has the same shape of four fingers and a thumb, the project aims to create a real-time system that recognizes the meaningful shapes made with the hands.

Methodology:

1.1.1  Sign-to-text:

Datasets for the sign-to-text module are hard to find. We used the ASLLVD dataset, which contains 9734 sign videos across around 2745 different word categories. It was downloaded from the following link.

https://drive.google.com/file/d/13Zia41BJn2ZBiduRyZxvQ1KvRdaZ8hjO/view 

For the sign-to-text module, a spatio-temporal graph convolutional network (ST-GCN) was trained. It takes a video and identifies the sign using OpenPose, which extracts a total of 27 key points from both hands, the arms, and the shoulders.
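The keypoint selection step can be sketched as follows. Note that the exact joint indices are assumptions for illustration (OpenPose's BODY_25 body model plus its two 21-point hand models); the project's actual 27-joint subset may differ.

```python
import numpy as np

# Hypothetical index sets: 5 upper-body joints from OpenPose's BODY_25
# model plus 11 keypoints per hand, giving 5 + 11 + 11 = 27 joints per
# frame. The real subset used by the project may differ.
BODY_IDX = [0, 2, 3, 5, 6]                          # nose, shoulders, elbows
HAND_IDX = [0, 2, 4, 5, 8, 9, 12, 13, 16, 17, 20]  # wrist, knuckles, fingertips

def select_keypoints(body, left_hand, right_hand):
    """Build the (27, 2) joint matrix for one frame.

    body:       (25, 2) array of BODY_25 (x, y) coordinates
    left_hand:  (21, 2) array of left-hand keypoints
    right_hand: (21, 2) array of right-hand keypoints
    """
    return np.concatenate([body[BODY_IDX],
                           left_hand[HAND_IDX],
                           right_hand[HAND_IDX]])

def to_stgcn_input(frames):
    """Stack per-frame joints into the (C, T, V) tensor an ST-GCN expects:
    C = 2 coordinate channels, T = number of frames, V = 27 joints."""
    joints = np.stack([select_keypoints(*f) for f in frames])  # (T, 27, 2)
    return joints.transpose(2, 0, 1)                           # (2, T, 27)
```

The resulting tensor is what a skeleton-based action recognizer consumes: the graph convolution runs over the 27 joints (V) while temporal convolutions run over the frame axis (T).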

1.1.2  Text-to-Sign:

For the text-to-sign module, the dataset used is ASLG-PC12, obtained from the GitHub project linked below. It contains parallel corpora with 83916 sentence pairs for training and 2046 for validation.
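ASLG-PC12 is distributed as parallel text files: each English sentence on one side is paired line-by-line with its ASL gloss on the other. A minimal loader might look like this (the sample pair is illustrative of the gloss style, not a verbatim line from the dataset):

```python
def load_parallel(src_lines, tgt_lines):
    """Pair English sentences with their ASL glosses line-by-line,
    dropping any pair where either side is empty."""
    return [(s.strip(), t.strip())
            for s, t in zip(src_lines, tgt_lines)
            if s.strip() and t.strip()]

# Illustrative pair in the style of ASL gloss (upper-case, with some
# function words dropped) -- not an actual line from ASLG-PC12.
english = ["membership of parliament see minutes\n"]
gloss   = ["MEMBERSHIP PARLIAMENT SEE MINUTES\n"]
pairs = load_parallel(english, gloss)
```

Each (English, gloss) pair then becomes one source/target training example for the translation model.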

https://github.com/imatge-upc/speech2signs-2017-nmt/tree/master/ASLG-PC12 

The OpenNMT-py library was used to train a transformer model.
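OpenNMT-py training is driven by a YAML configuration plus its `onmt_build_vocab` and `onmt_train` commands. A minimal transformer setup could look roughly like the fragment below; all paths and hyperparameter values here are illustrative assumptions, not the project's actual configuration.

```yaml
# Build vocab: onmt_build_vocab -config config.yaml -n_sample -1
# Train:       onmt_train -config config.yaml
save_data: run/asl
src_vocab: run/asl.vocab.src
tgt_vocab: run/asl.vocab.tgt
data:
  corpus_1:
    path_src: data/train.en       # English side of ASLG-PC12
    path_tgt: data/train.gloss    # ASL gloss side
  valid:
    path_src: data/valid.en
    path_tgt: data/valid.gloss
# Transformer encoder-decoder
encoder_type: transformer
decoder_type: transformer
layers: 6
heads: 8
word_vec_size: 512
transformer_ff: 2048
optim: adam
learning_rate: 2.0
decay_method: noam
warmup_steps: 8000
```

The trained model is then queried at inference time with `onmt_translate` to turn an English sentence into its gloss.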

A video dataset was obtained from :

https://www.cin.ufpe.br/~cca5/asllvd-skeleton/index.html 

An Android application was then developed. It takes an English sentence and, using the neural machine translation model, translates it to ASL gloss. The output string is tokenized, and the tokens are used to look up the corresponding video clip IDs in a lookup table that maps words to video IDs. The IDs are then used to fetch the corresponding videos, which are merged together using the moviepy library. This part is implemented in Java (Android Studio) and Python, with the video dataset and model hosted on a Flask server. The video dataset contains a total of 9743 sign videos. It was originally in .mov format, which is incompatible with Android, so the videos were converted to .mp4 and their playback speed was slowed down using FFmpeg.
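The gloss-to-video step on the server side can be sketched as follows. The lookup-table contents and file-naming scheme are assumptions for illustration (the real table is built from the ASLLVD metadata), and the moviepy merge is shown only as a comment since it needs the actual clips on disk.

```python
# Hypothetical lookup table mapping gloss tokens to video clip IDs;
# the real table is derived from the ASLLVD dataset's metadata.
GLOSS_TO_VIDEO_ID = {
    "HELLO": "vid_0412",
    "YOU": "vid_1187",
    "NAME": "vid_0903",
    "WHAT": "vid_0050",
}

def gloss_to_clip_paths(gloss, video_dir="videos"):
    """Tokenize an ASL gloss string and resolve each token to the path
    of its sign clip, skipping tokens with no entry in the table."""
    tokens = gloss.upper().split()
    ids = [GLOSS_TO_VIDEO_ID[t] for t in tokens if t in GLOSS_TO_VIDEO_ID]
    return [f"{video_dir}/{vid}.mp4" for vid in ids]

# The resolved clips would then be merged into one answer video, e.g.:
#   from moviepy.editor import VideoFileClip, concatenate_videoclips
#   clips = [VideoFileClip(p) for p in paths]
#   concatenate_videoclips(clips).write_videofile("answer.mp4")

paths = gloss_to_clip_paths("HELLO WHAT YOU NAME")
```

Skipping out-of-vocabulary tokens (rather than failing) is one reasonable design choice here; another would be to fall back to fingerspelling the unknown word clip by clip.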


Supervisor

HEC Approved PhD Supervisor,

Assistant Professor & Associate Head of Department

Department of Computer Science,

COMSATS University Islamabad, Lahore Campus, Pakistan

Ramsha Rasheed

BS Computer Science

ramsha4rasheed@gmail.com

COMSATS University Islamabad, Lahore Campus

Aisha

BS Computer Science

aishaqureshi39@gmail.com

COMSATS University Islamabad, Lahore Campus

Bilal Ahmed

BS Computer Science

bilal11.official@gmail.com

COMSATS University Islamabad, Lahore Campus