Home

This site details the progress and results of our project for the Pattern Recognition class (EEL6865).


Motivation

Sign language is a form of communication that uses the hands, limbs, head and facial expressions in a visual and spatial context to communicate without sound. There are many variants of sign language, used mainly by hearing- and speech-impaired individuals; chief among them are American Sign Language, British Sign Language and Australian Sign Language (AusLan).

While the number of American Sign Language users has been estimated at anywhere from 500,000 to 2.5 million, this is still negligible compared to the estimated US population of 312 million. It is this imbalance we seek to address through technology, primarily by means of an automatic American Sign Language recognition system.


Proposed Application

Since American Sign Language has a vast vocabulary with a complex system of facial expressions and gestures for each word, we intend to work initially with the individual letters used to spell out words, a system called fingerspelling. As a proof of concept, we will work with only a few letters at first. We propose to develop this application in stages:

  • Stage 1 - This will deal primarily with an offline, image-based recognition system. The system will be trained on images from an American Sign Language database training set, and the accuracy of the classification will be evaluated on the testing set.
  • Stage 2 - We will then modify the system to work with input from a webcam. This will test the robustness of our system against variations in background, size and angle.
  • Stage 3 - We will modify this system to work with speech as a reverse classification, where the speech input will produce the corresponding letter as the output.

Algorithm

The feature-extraction algorithms we have chosen are Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). The basic idea behind PCA is dimensionality reduction: it transforms the images into a subspace whose components are uncorrelated. Linear Discriminant Analysis, by contrast, minimizes the within-class scatter while maximizing the between-class separation. We use skin-color detection to preprocess the images. The training and testing images were obtained from the Marcel database. We have used a variety of classification schemes, such as k-Nearest Neighbors, Support Vector Machines and Neural Networks, to test which feature-extraction scheme performs better.
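
As a rough illustration of this pipeline, the Python sketch below masks skin-colored pixels, projects the flattened images onto a PCA subspace, and classifies them with k-Nearest Neighbors. It is a minimal sketch rather than our exact implementation: the skin thresholds, image size, number of components and the random stand-in for the Marcel images are all assumptions made for illustration.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def skin_mask(img):
    # Crude RGB skin-color rule; True where a pixel is likely skin.
    r, g, b = (img[..., c].astype(float) for c in range(3))
    return (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b) & (abs(r - g) > 15)

def features(img):
    # Keep only the skin pixels of the grayscale image, then flatten to a vector.
    gray = img.astype(float).mean(axis=2)
    return (gray * skin_mask(img)).ravel()

# Random stand-in for the Marcel training/testing images (32x32 RGB frames).
rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
labels = rng.integers(0, 5, size=100)            # five letters in the initial subset
X = np.array([features(im) for im in imgs])
X_train, X_test = X[:80], X[80:]
y_train, y_test = labels[:80], labels[80:]

# PCA: project onto a lower-dimensional, uncorrelated subspace.
pca = PCA(n_components=20).fit(X_train)
Z_train, Z_test = pca.transform(X_train), pca.transform(X_test)

# k-NN in the reduced space; an SVM or an LDA projection would slot in the same way.
knn = KNeighborsClassifier(n_neighbors=3).fit(Z_train, y_train)
print("test accuracy:", knn.score(Z_test, y_test))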


While speech recognition is inherently more complicated due to accent, tonal and intensity variations, our system initially uses only a few letters, so we are motivated to attempt it. The algorithms we use are Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC). MFCC performs feature extraction based on the cosine transform of the short-time power spectrum of the sound, called the cepstrum; the coefficients are calculated on a non-linear Mel scale that corresponds to the response of the human auditory system. LPC models the human vocal tract during voice production as a filter and finds the filter coefficients corresponding to the sound. Both schemes work to identify the peak frequencies, called formants. We use Gaussian Mixture Models and Neural Networks for classification.
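
The speech side can be sketched in the same spirit: extract MFCC frames from each utterance and train one Gaussian Mixture Model per letter, then assign a test utterance to the letter whose model gives the highest log-likelihood. The sampling rate, number of mixture components and the synthetic signals below are placeholders, not our recorded data.

import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

SR = 16000                                       # assumed sampling rate

def mfcc_frames(signal, sr=SR):
    # 13 MFCCs per short-time frame; rows are frames, columns are coefficients.
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13).T

# Synthetic one-second "utterances" standing in for recorded letters.
rng = np.random.default_rng(0)
letters = ["A", "B", "C"]
train = {c: [rng.standard_normal(SR) for _ in range(5)] for c in letters}

# One GMM per letter, trained on the pooled MFCC frames of its utterances.
models = {}
for c, utterances in train.items():
    frames = np.vstack([mfcc_frames(u) for u in utterances])
    models[c] = GaussianMixture(n_components=4, covariance_type="diag").fit(frames)

def classify(signal):
    # Assign the utterance to the letter whose model best explains its frames.
    frames = mfcc_frames(signal)
    return max(letters, key=lambda c: models[c].score(frames))

print("predicted letter:", classify(rng.standard_normal(SR)))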


More details about our algorithms are given here.


Results

The results we obtained are tabulated and analyzed in detail here.


Tools

The code we used, as well as details about the tools we used, is provided here.


Conclusions

Our findings for the project as well as future plans are outlined here.