Milestone 3

Implementation

The team has accomplished a lot since the last milestone. We now have a working user interface, including user settings, and screenshots of the current product are included below. The team settled on a design that is simple and effective, so users can learn it and start using it quickly, while following a consistent scheme for the brand. The design includes the main page as well as a settings menu, both of which can be seen below.

The team can now recognize and transcribe three signed words/phrases, and has also created training software that allows new signed words and phrases to be recorded quickly and easily and fed into the machine learning algorithm, also pictured below.
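
To give a sense of how a capture tool like this might work, the following is a minimal sketch using OpenCV and MediaPipe Holistic. The sign labels, sequence length, repetition count, and the hand_features helper are all illustrative assumptions for this sketch, not the team's actual tool.

    import os
    import cv2
    import numpy as np
    import mediapipe as mp

    # Hypothetical capture settings; the report does not give exact values.
    SIGNS = ["hello", "thanks", "name"]   # placeholder labels for the three signs
    SEQ_LEN = 30                          # frames recorded per example
    REPS = 30                             # examples recorded per sign

    def hand_features(results):
        # Flatten both hands (21 landmarks x 3 coords each), zero-filled when a
        # hand is not detected, so every frame yields a same-length vector.
        feats = []
        for hand in (results.left_hand_landmarks, results.right_hand_landmarks):
            if hand:
                feats.extend(c for lm in hand.landmark for c in (lm.x, lm.y, lm.z))
            else:
                feats.extend([0.0] * 21 * 3)
        return np.array(feats)

    cap = cv2.VideoCapture(0)
    with mp.solutions.holistic.Holistic() as holistic:
        for sign in SIGNS:
            os.makedirs(os.path.join("data", sign), exist_ok=True)
            for rep in range(REPS):
                frames = []
                while len(frames) < SEQ_LEN:
                    ok, frame = cap.read()
                    if not ok:
                        continue
                    # MediaPipe expects RGB input; OpenCV captures BGR.
                    results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                    frames.append(hand_features(results))
                np.save(os.path.join("data", sign, f"{rep}.npy"), np.array(frames))
    cap.release()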

The team has also started creating a platform through which customers can access and download the product, and decided that a website download would be the best solution.

The team has also accomplished a lot on the back end. Not only have they found a way to incorporate the landmark data produced by the MediaPipe software, they have also found a way to train on that data and detect patterns in it, so the system can distinguish between the trained signed words.
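
As a sketch of what incorporating that data might look like, the sequences saved by the hypothetical capture loop above can be stacked into labeled training tensors. The directory layout and sign labels are the same assumptions as before.

    import os
    import numpy as np
    from sklearn.model_selection import train_test_split
    from tensorflow.keras.utils import to_categorical

    SIGNS = ["hello", "thanks", "name"]   # same placeholder labels as above

    # Stack every saved landmark sequence into one training tensor of shape
    # (num_examples, SEQ_LEN, num_features), with one integer label per sequence.
    X, y = [], []
    for label, sign in enumerate(SIGNS):
        folder = os.path.join("data", sign)
        for fname in sorted(os.listdir(folder)):
            X.append(np.load(os.path.join(folder, fname)))
            y.append(label)

    X = np.array(X)
    y = to_categorical(y)                 # one-hot targets for categorical cross-entropy
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)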

Sign language is not the same as spoken English, so there are instances where someone may sign "what your name" while the correct English transcription is "what is your name." The team has built a tool that instantly predicts the words it believes belong between the signed words, producing a clearer and more direct transcription.
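
To make the tool's contract concrete, here is a toy stand-in built around the one example pair above: a literal ASL gloss goes in and grammatical English comes out. The real tool is the learned Machine Translation Model described under Test, not a lookup table.

    # Toy stand-in illustrating the input/output contract only. The dictionary
    # holds the single example pair from the report; the actual tool is a
    # trained machine translation model, not a lookup table.
    EXAMPLES = {"what your name": "what is your name"}

    def translate_gloss(gloss: str) -> str:
        return EXAMPLES.get(gloss, gloss)

    print(translate_gloss("what your name"))   # -> "what is your name"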

Current User Interface Design:

Working Transcription:

Testing Interface:

Machine Translation:

Before model optimizations (350 ms)

After model optimizations (86.6 ms)

Test

The backbone of our project consists of two machine learning models: the Prediction Model and the Machine Translation Model. Neither model has been trained on a full data set yet; however, both have been trained on small sample sets for testing purposes.


The Prediction Model's purpose is to categorize body landmark coordinates into their respective words using a long short-term memory (LSTM) neural network architecture. Using TensorBoard, we can visualize the loss function and categorical accuracy of our model throughout its training epochs. The graph is quite noisy due to the small amount of data and some mistakes made while capturing it. Even with these shortcomings, the model finished with a categorical accuracy of 97.65% and a confusion matrix showing no false positives or false negatives on the sample data.
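
The report does not list the exact network configuration, so the following is a generic sketch of an LSTM classifier over landmark sequences, with TensorBoard attached to produce loss and accuracy curves like those discussed above. The layer sizes, feature count, and epoch count are assumptions, and X_train and y_train come from the data-loading sketch earlier.

    from tensorflow.keras.callbacks import TensorBoard
    from tensorflow.keras.layers import LSTM, Dense
    from tensorflow.keras.models import Sequential

    NUM_FEATURES = 126                    # 2 hands x 21 landmarks x 3 coords (assumption)
    NUM_SIGNS = 3                         # the three trained words/phrases

    model = Sequential([
        LSTM(64, return_sequences=True, activation="relu",
             input_shape=(30, NUM_FEATURES)),
        LSTM(128, return_sequences=True, activation="relu"),
        LSTM(64, activation="relu"),
        Dense(64, activation="relu"),
        Dense(NUM_SIGNS, activation="softmax"),   # one probability per sign
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["categorical_accuracy"])

    # The TensorBoard callback writes per-epoch loss and categorical-accuracy
    # curves to the log directory for visualization.
    model.fit(X_train, y_train, epochs=200,
              callbacks=[TensorBoard(log_dir="logs")])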

The Machine Translation Model's purpose is to transform words and phrases of literal American Sign Language into grammatically correct English. We plotted the loss function during training to visualize how well training progressed across the epochs. For this model, our data set was not very large but was hyper-focused on producing the results we wanted, so the loss function (on the left) is very smooth aside from a small amount of noise, possibly caused by over-training around the 30th epoch. Nevertheless, the model performs quite well and has been optimized to run very quickly; the testing of three translations can be seen in the image on the right. The commented phrases in green represent the expected output, the strings in red represent phrases translated literally from ASL, and the output below them shows the model producing three accurate translations in under 90 ms.
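
The report does not specify the translation architecture, so here is a generic encoder-decoder (sequence-to-sequence) sketch of the kind commonly used to map ASL gloss to English; the vocabulary size and layer widths are illustrative assumptions.

    from tensorflow.keras.layers import Dense, Embedding, Input, LSTM
    from tensorflow.keras.models import Model

    VOCAB = 200        # shared token vocabulary size (assumption)
    EMBED = 64         # embedding dimension (assumption)
    UNITS = 128        # LSTM width (assumption)

    # Encoder: read the literal ASL gloss and summarize it into final LSTM states.
    enc_in = Input(shape=(None,))
    enc_emb = Embedding(VOCAB, EMBED)(enc_in)
    _, state_h, state_c = LSTM(UNITS, return_state=True)(enc_emb)

    # Decoder: generate grammatical English conditioned on those states, which is
    # where the model learns to insert function words such as "is".
    dec_in = Input(shape=(None,))
    dec_emb = Embedding(VOCAB, EMBED)(dec_in)
    dec_seq, _, _ = LSTM(UNITS, return_sequences=True, return_state=True)(
        dec_emb, initial_state=[state_h, state_c])
    out = Dense(VOCAB, activation="softmax")(dec_seq)

    model = Model([enc_in, dec_in], out)
    model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")

At inference time the encoder runs once per phrase and the decoder emits one token at a time from the encoder's states; latency figures like the 86.6 ms above can be collected by wrapping those calls in time.perf_counter().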

Teamwork

The teamwork in this project thus far has been cohesive and coordinated, with each member contributing an integral part of the project. Brianna and Chloe have contributed to frontend functionality. Brianna worked with the frontend framework and created the base frontend application, which includes all display options as well as the video interface and the settings menu. She also worked on the overall design of the application based on the team's vision. Chloe worked on the user elements and features that make up the functionality of the UI and overall user accessibility, including a low-light filter and text-size accessibility functions. Chloe also handled research and development, coordinating with research institutions to locate a training dataset, and project management, ensuring the team achieves its goals and meets its deadlines. Jayden has contributed to backend functionality, including our machine learning algorithms, automating the training process, and creating an innovative and elegant website on which to deploy our product. Each team member collaborated on decisions regarding the direction of the project, the principles the project is based on, and training our machine learning algorithms. The team has a collective vision for the project that is demonstrated by our deliverables and future goals.