Team: Sefa Alp and Muhammed Atakan Pehlivanoğlu
Year: Spring 2020
Description: We occasionally hold important meetings with our managers, teachers, or project partners, whether to solve specific problems or to draw a new roadmap for the project we are working on. However, important details of these meetings are forgotten within a few days, and progress can stall because of the missing information. In this project we focused on this problem and tried to find a method that is easier and more effective than taking notes; Earwig emerged from this effort. In its simplest form, Earwig first records each speaker's voice separately to create training data, and the program uses these recordings to train a model for each speaker. A dialogue or meeting recording is then given to the program, and the output is the set of sentences spoken by each participant, as separate voice recordings, in the order in which they were spoken.
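The described pipeline (per-speaker enrollment followed by attribution of meeting segments) resembles a simple speaker-identification scheme. Below is a minimal sketch along those lines, assuming MFCC features and one Gaussian mixture model per speaker; the file names, window length, and model sizes are illustrative assumptions, not details taken from the project report:

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path, sr=16000):
    """Load audio and return a (frames, 13) matrix of MFCC features."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

# Enrollment: train one GMM per speaker from their separate voice recordings.
# File names are placeholders for the per-speaker training records.
speakers = {"alice": "alice_enroll.wav", "bob": "bob_enroll.wav"}
models = {name: GaussianMixture(n_components=8, covariance_type="diag")
                .fit(mfcc_features(path))
          for name, path in speakers.items()}

# Attribution: split the meeting into short windows and assign each window
# to the speaker whose model gives the highest average log-likelihood.
meeting = mfcc_features("meeting.wav")
window = 100  # roughly 1 s of frames at the default hop length
for start in range(0, len(meeting) - window, window):
    chunk = meeting[start:start + window]
    best = max(models, key=lambda name: models[name].score(chunk))
    print(f"frames {start}-{start + window}: {best}")
```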
More Information: Project report
Team: Göktuğ Yıldırım
Year: Spring 2020
Description: The goal of this project is to develop a GUI (Graphical User Interface) that can analyze given speech signals using digital signal processing techniques. Signal processing is one of the most important topics in systems and electronics engineering. In general, the analysis of analog and digital signals can be described as detecting temporal and spatial changes and applying the results to various systems; speech processing is one of the subfields of signal processing. Although MATLAB is one of the most popular signal analysis tools available today and provides a detailed environment for general signal processing tasks, it does not offer a user-friendly interface for sound analysis because it is not customized for speech processing. On the other hand, WaveSurfer is a widely used audio editor for acoustic-phonetic studies, but according to my observations it does not provide a sufficiently user-friendly interface for basic speech analysis tasks: its large number of settings can be overwhelming to a novice user. In this project, loudness, pitch, and formant frequencies were used as the acoustic features of the tool. With this tool, novice users will be able to analyze speech signals easily by observing the waveform, Fast Fourier Transform (FFT), short-time log energy, short-time zero-crossing rate, Short-Time Fourier Transform, short-time real cepstrum, spectrogram, formant frequencies, and pitch contour. As a result, the developed tool provides an efficient speech analysis environment for novice users, especially undergraduate students in electrical and electronics engineering.
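As a rough illustration of two of the listed short-time features, the sketch below computes short-time log energy and the short-time zero-crossing count over fixed-length frames in Python. The frame length, hop size, and floor constant are illustrative assumptions; the project's GUI and its exact parameter choices are not reproduced here:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Slice a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def short_time_log_energy(x, frame_len=400, hop=160, eps=1e-10):
    """Log of the summed squared samples in each frame."""
    frames = frame_signal(x, frame_len, hop)
    return np.log(np.sum(frames ** 2, axis=1) + eps)

def short_time_zero_crossings(x, frame_len=400, hop=160):
    """Number of sign changes per frame; tends to be high for unvoiced sounds."""
    frames = frame_signal(x, frame_len, hop)
    return np.sum(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

# Example with a synthetic signal: 1 s of silence followed by a 200 Hz tone.
sr = 16000
x = np.concatenate([np.zeros(sr), np.sin(2 * np.pi * 200 * np.arange(sr) / sr)])
print(short_time_log_energy(x)[:3], short_time_zero_crossings(x)[-3:])
```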
More Information: Project report
Team: Göktuğ Kayacan, Remzi Orak and Sefa Alp
Year: Spring 2019
Description: This project’s main goal is to implement a phoneme recognition system that uses Mel-Frequency Cepstral Coefficients (MFCCs) to distinguish between different phonemes with a parametric classification model such as a feedforward neural network or a recurrent neural network. The system should be able to produce an output for a given set of MFCC features of a frame, where the frame size matches that of the frames used to train the model. The result from the network should be a list of label probabilities for the given features.
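A feedforward classifier of this kind can be sketched as a small multilayer perceptron that maps one frame of MFCC features to a probability distribution over phoneme labels. The feature dimension, hidden size, and number of phoneme classes below are illustrative assumptions rather than values from the report:

```python
import torch
import torch.nn as nn

N_MFCC = 13        # MFCC coefficients per frame (assumed)
N_PHONEMES = 40    # size of the phoneme label set (assumed)

class PhonemeMLP(nn.Module):
    """Feedforward network: one MFCC frame in, phoneme label probabilities out."""
    def __init__(self, n_in=N_MFCC, n_hidden=64, n_out=N_PHONEMES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_out),
        )

    def forward(self, x):
        # Softmax turns the raw scores into a list of label probabilities.
        return torch.softmax(self.net(x), dim=-1)

model = PhonemeMLP()
frame = torch.randn(1, N_MFCC)   # one frame of MFCC features
probs = model(frame)             # shape (1, N_PHONEMES), rows sum to 1
print(probs.argmax(dim=-1))      # index of the most likely phoneme label
```

During training the softmax would usually be folded into the loss (for example, nn.CrossEntropyLoss applied to the raw scores) rather than applied in the forward pass.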
More Information: Project report
Team: Sena Koyuncu, Mustafa Can Gülbaş and Mehmet Taylan Eğer
Year: Spring 2019
Description: In multimedia applications, the discrimination of silence, music, and speech plays an important role. This project discriminates the regions that contain silence, music, and speech in a given audio file. The project is implemented in MATLAB using temporal features such as short-term energy and log energy, which are simpler to obtain than spectral features. First, silent regions are detected using an end-point detection algorithm. Then, examining the log energy, it can be seen that the signal stays above a certain threshold in regions containing music, whereas regions with speech do not show such regularity due to breathing pauses. Using this observation, the speech and music regions are discriminated as well, and the region classifications are shown on a graph.
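The classification rule described above can be sketched as follows: frames whose log energy falls below a silence threshold are labelled silence, and the remaining regions are labelled music when nearly all of their frames stay above the threshold (regular energy) and speech otherwise (frequent dips from breathing pauses). The thresholds, frame sizes, and block length below are illustrative assumptions; the project itself is implemented in MATLAB, so this Python version only mirrors the idea:

```python
import numpy as np

def log_energy(x, frame_len=400, hop=160, eps=1e-10):
    """Short-time log energy of a 1-D signal."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])
    return np.log(np.sum(frames ** 2, axis=1) + eps)

def classify_regions(x, silence_thresh=-5.0, block=200, music_ratio=0.95):
    """Label blocks of ~2 s (at a 10 ms hop) as silence, music, or speech."""
    e = log_energy(x)
    labels = []
    for start in range(0, len(e), block):
        above = np.mean(e[start:start + block] > silence_thresh)
        if above < 0.1:
            labels.append("silence")   # almost no frames carry energy
        elif above > music_ratio:
            labels.append("music")     # energy stays above the threshold
        else:
            labels.append("speech")    # dips caused by breathing pauses
    return labels

# Example: 2 s of silence followed by 2 s of a steady tone at 16 kHz.
sr = 16000
x = np.concatenate([np.zeros(2 * sr),
                    np.sin(2 * np.pi * 440 * np.arange(2 * sr) / sr)])
print(classify_regions(x))   # expected: ['silence', 'music']
```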
More Information: Project report