This site details the progress and results of our project for the Pattern Recognition class (EEL6865).
Sign language is a form of communication that uses the hands, limbs, and head, as well as facial expressions, in a visual and spatial context to communicate without sound. There are many variants of sign language, used mainly by hearing- and speech-impaired individuals; chief among them are American Sign Language, British Sign Language, and Australian Sign Language (AusLan).
While the number of American Sign Language users has been estimated at anywhere from 500,000 to 2.5 million, this is still small compared to the estimated US population of 312 million. It is this imbalance we seek to address through technology, primarily by means of an automatic American Sign Language recognition system.
Since American Sign Language has a vast vocabulary, with a complex system of facial expressions and gestures for each word, we intend to work initially with individual letters used to spell out words, a system called fingerspelling. As a proof of concept, we will start with only a few letters. We propose to develop this application in stages.
The algorithms we have chosen for feature extraction are Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). The basic idea behind PCA is dimensionality reduction: the images are projected onto a subspace whose dimensions are uncorrelated. LDA, by contrast, minimizes the within-class scatter while maximizing the between-class separation. We used skin color detection to preprocess the images. The training and testing images were obtained from the Marcel database. We tried a variety of classification schemes, including k-Nearest Neighbors, Support Vector Machines, and Neural Networks, to test which feature extraction scheme performs better.
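To make the PCA-plus-classifier pipeline concrete, here is a minimal sketch in NumPy. This is not our project code: the "images" are synthetic random vectors standing in for flattened hand images, the class separation is artificial, and the k-NN classifier is a bare-bones majority vote. It only illustrates the two stages described above (project onto the principal subspace, then classify in that subspace).

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA on X, an (n_samples, n_pixels) matrix of flattened images."""
    mean = X.mean(axis=0)
    # SVD of the centered data gives the principal axes in Vt's rows
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project images onto the uncorrelated principal subspace."""
    return (X - mean) @ components.T

def knn_predict(train_feats, train_labels, test_feats, k=3):
    """Classify each test vector by majority vote of its k nearest neighbors."""
    preds = []
    for f in test_feats:
        dists = np.linalg.norm(train_feats - f, axis=1)
        nearest = train_labels[np.argsort(dists)[:k]]
        preds.append(np.bincount(nearest).argmax())
    return np.array(preds)

# Synthetic stand-in for flattened hand images of two letter classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(3.0, 1.0, (40, 256)),    # class 0
               rng.normal(-3.0, 1.0, (40, 256))])  # class 1
y = np.array([0] * 40 + [1] * 40)

mean, comps = pca_fit(X, n_components=10)
feats = pca_transform(X, mean, comps)
preds = knn_predict(feats, y, feats, k=3)
print("accuracy:", (preds == y).mean())
```

An LDA variant would replace `pca_fit` with a projection that maximizes between-class over within-class scatter; the classification stage is unchanged.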
While speech recognition is inherently more complicated due to accent, tonal, and intensity variations, our system initially uses only a few letters, which makes the task tractable. The algorithms we use are Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC). MFCC extracts features from the cosine transform of the short-time power spectrum of the sound, called the cepstrum; the coefficients are calculated on a non-linear mel scale, which approximates the response of the human auditory system. LPC models the vocal tract at the point of voice production as a filter and estimates the filter coefficients corresponding to the sound. Both schemes work to identify the peak frequencies, called formants. We use Gaussian Mixture Models and Neural Networks for classification.
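The MFCC steps described above (short-time power spectrum, mel-scale filterbank, log, cosine transform) can be sketched in NumPy as follows. This is an illustrative toy, not our implementation: the frame sizes, filter count, and the synthetic test tone are all arbitrary choices made for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    # 1. Split the signal into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # 2. Short-time power spectrum of each frame
    n_fft = 512
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2

    # 3. Triangular filters spaced evenly on the mel scale (non-linear in Hz)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[i, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)

    # 4. Log of the filterbank energies
    log_e = np.log(power @ fbank.T + 1e-10)

    # 5. DCT-II of the log energies gives the cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_e @ dct.T

# Example: one second of a synthetic two-harmonic tone as a stand-in for speech
sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
coeffs = mfcc(sig, sr)
print(coeffs.shape)  # one row of 13 coefficients per frame
```

LPC would instead fit an all-pole filter to each frame (e.g. by solving the autocorrelation normal equations) and use the filter coefficients as the feature vector; both feature sets then feed the GMM or neural-network classifier.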
More details about our algorithms are given here.
The results we obtained are tabulated and analyzed in detail here.
The code we used, as well as details about our tools, is provided here.
Our findings for the project as well as future plans are outlined here.