Speaker and Gender Recognition Systems Using Speech Processing

Abstract

Most voice-recognition systems are classified as isolated or continuous. Isolated-word recognition requires a brief pause between spoken words, whereas continuous voice recognition does not. Speech-recognition systems can be further classified as speaker-dependent or speaker-independent. A speaker-dependent system recognises speech from only one particular speaker, whereas a speaker-independent system can recognise speech from anybody.

There are two major applications of speaker recognition technologies and methodologies. If the speaker claims a certain identity and the voice is used to verify this claim, the task is called verification or authentication. Identification, on the other hand, is the task of determining an unknown speaker's identity. In a sense, speaker verification is a 1:1 match, where one speaker's voice is matched to one template (also called a "voice print" or "voice model"), whereas speaker identification is a 1:N match, where the voice is compared against N templates. The two also differ from a security perspective. For example, presenting your passport at border control is a verification process: the agent compares your face to the picture in the document. Conversely, a police officer comparing a sketch of an assailant against a database of previously documented criminals to find the closest match(es) is an identification process.

The main aim of this project is to separate the noise from the information signal and to put that information signal to use. We used MATLAB to write a program that differentiates between male and female voices (gender recognition using voice). Two main algorithms are used: correlation and MFCC (mel-frequency cepstral coefficients). They sample the user's input voice and then process it to create an index unique to each user, based on the properties of that user's sampled voice files. The index is then used to differentiate between the two users.

Recognition happens in two stages. In the first stage, the voice samples are recorded with a recorder on the system and stored as .WAV files in the MATLAB directory of the recognition folder. In the second stage, MATLAB accesses these files for further processing of the voice signal.
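As a rough illustration of how correlation can help separate male from female voices, the sketch below estimates the pitch of a signal from the peak of its autocorrelation and compares it with a threshold. It is written in Python rather than MATLAB and is not the project's code. The write-up does not say how the project applies correlation, so the pitch-based approach is an assumption, as are the 165 Hz threshold, the 60 to 400 Hz search range, and the function names. The MFCC step is left out, and a synthetic sine tone stands in for a recorded voice sample. The 0 and 1 labels follow the convention of the Simulink model (0 for male, 1 for female).

```python
import math

# Assumed decision boundary between typical adult male and female pitch (Hz).
# Illustrative value only; it is not taken from the project.
PITCH_THRESHOLD_HZ = 165.0


def estimate_pitch_hz(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency from the autocorrelation peak."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove any DC offset

    # Only search lags that correspond to a plausible voice pitch.
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation of the signal with itself at this lag.
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score

    # The lag with the strongest self-similarity is taken as the pitch period.
    return sample_rate / best_lag


def classify_gender(samples, sample_rate):
    """Return 0 for a male voice and 1 for a female voice (assumed threshold)."""
    pitch = estimate_pitch_hz(samples, sample_rate)
    return 0 if pitch < PITCH_THRESHOLD_HZ else 1


def synthetic_tone(freq_hz, sample_rate=8000, n_samples=1600):
    """Hypothetical stand-in for a recorded voice sample: a pure sine tone."""
    return [
        math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n_samples)
    ]


if __name__ == "__main__":
    for freq in (125.0, 200.0):
        tone = synthetic_tone(freq)
        pitch = estimate_pitch_hz(tone, 8000)
        print(freq, pitch, classify_gender(tone, 8000))
```

Real speech is far less regular than a sine tone, so a practical version would likely work on short voiced frames and combine the pitch estimate with spectral features such as MFCCs. Recorded .WAV samples could be read with Python's standard `wave` module in place of `synthetic_tone`.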

Sequence diagram

The sequence diagram of the program has one input sequence and three output sequences. Communication between the software and the database takes place in three sequences.

SIMULINK Model

A Simulink model was also developed alongside the MATLAB code. The Simulink model works the same way as the MATLAB code: it recognises a voice in the database and then classifies it as male or female. The final display of the Simulink model shows 0 or 1 as the output, where 0 corresponds to a male voice and 1 corresponds to a female voice.

The project is finished and the outputs were verified with high accuracy. If you have any questions, you can email me directly or check my GitHub for the code.

Github link