Neural Network Classifier For Emergency Siren Sounds:
I coded this fully connected neural network classifier without the aid of any deep learning libraries. I based my code on Michael Nielsen's implementations in his wonderful book "Neural Networks and Deep Learning," available for free online. The network classifies incoming data into one of the following labels:
Ambulance [0]
Firetruck [1]
Traffic [2]
I reached an accuracy of 96.67% on test data (unseen in training).
To implement this classifier, I came up with original algorithms for data preprocessing, data augmentation (I coded my own audio plugins to generate transformations of the vanilla dataset), and acoustic feature extraction. All relevant documentation can be found in my GitHub repository (embedded in the icon in this section). I also added several optimization features to my neural network, such as L2 regularization, a varying learning-rate schedule, and flexible layer sizing, among others.
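A minimal Python sketch of how two of those features, L2 regularization and a varying learning rate, might look in a from-scratch network. The update rule follows the weight-decay form described in Nielsen's book; the function names, the decay formula, and the parameter values are assumptions for illustration and are not taken from the repository.

```python
import numpy as np

def decayed_learning_rate(eta0, epoch, decay=0.05):
    """Hypothetical schedule: shrink the learning rate as epochs pass."""
    return eta0 / (1.0 + decay * epoch)

def sgd_step_l2(weights, grad, eta, lmbda, n_train):
    """One gradient step with L2 regularization (weight decay).

    weights, grad: arrays of the same shape (grad is the mini-batch average)
    eta: learning rate, lmbda: regularization strength
    n_train: number of training examples
    """
    # Shrink the weights slightly, then move against the gradient.
    return (1.0 - eta * lmbda / n_train) * weights - eta * grad

# Example: one update at epoch 20 with a decayed learning rate.
w = np.array([1.0, -2.0])
g = np.array([0.5, 0.5])
eta = decayed_learning_rate(0.5, epoch=20)
w_next = sgd_step_l2(w, g, eta, lmbda=5.0, n_train=100)
```

Biases are usually left out of the regularization term, so a step like this would apply to the weight matrices only.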
All relevant documentation and code are available here:
Machine Learning: KNN Algorithm For Vowel Classification from WAV files
This is a classifier capable of identifying five vowel sounds:
AAA (as in father)
EEE (as in bet)
III (as in wind)
OOO (as in lot)
UUU (as in mood)
The classifier was trained only on my voice, but it can still predict vowel sounds from other people's voices with roughly 90% accuracy. This version considers a single feature: the energy band ratios. Different vowels have different frequency profiles, even when the note produced is the same. The classifier leverages this fact by computing the ratio of each of 14 energy bands in the frequency domain to the first energy band (0-1000 Hz), which contains the fundamental frequency. All files are normalized before processing to ensure a useful comparison.
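The feature described above can be sketched in Python with NumPy as follows. The text only specifies the first band (0-1000 Hz) and the count of 14 further bands. The equal 1000 Hz band width, the peak normalization, the value of k, and the function names are assumptions made for this illustration.

```python
import numpy as np

def band_energy_ratios(signal, sample_rate, n_bands=15, band_hz=1000.0):
    """Energy of bands 2..n_bands divided by the energy of band 1."""
    signal = np.asarray(signal, dtype=float)
    peak = np.max(np.abs(signal))
    if peak > 0:
        signal = signal / peak  # peak-normalize (assumed normalization)
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    energies = np.array([
        power[(freqs >= k * band_hz) & (freqs < (k + 1) * band_hz)].sum()
        for k in range(n_bands)
    ])
    # Guard against a silent first band to avoid dividing by zero.
    return energies[1:] / max(energies[0], 1e-12)

def knn_predict(train_features, train_labels, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean)."""
    dists = np.linalg.norm(np.asarray(train_features) - np.asarray(query), axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_labels)[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

Each WAV file would be reduced to one 14-element ratio vector, and a new recording would be labeled by comparing its vector against the stored training vectors.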
Please feel free to run the code on Google Colab:
I coded functions that perform Fourier transforms on vectors in C++ without the aid of libraries. I built a direct DFT function as a computation-time reference, along with the recursive Cooley-Tukey FFT algorithm. The tests show the difference in computation time and include helper functions for visualizing data. There is also a zero-padding FFT function that takes a vector of any length, pads it with zeros at the end, and applies the Cooley-Tukey FFT algorithm.
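The three functions described above can be sketched compactly. The project itself is written in C++; this version uses Python only for brevity, and the function names and the choice to pad up to the next power of two are assumptions rather than a copy of the original code.

```python
import cmath

def direct_dft(x):
    """O(n^2) reference DFT, straight from the definition."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft_recursive(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft_recursive(x[0::2])  # transform of even-indexed samples
    odd = fft_recursive(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft_zero_padded(x):
    """Pad x with trailing zeros to the next power of two, then run the FFT."""
    n = 1
    while n < len(x):
        n *= 2
    return fft_recursive(list(x) + [0.0] * (n - len(x)))
```

The recursive version does O(n log n) work against O(n^2) for the direct sum, which is the gap the timing tests measure.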
Above: The spectrogram of a voice singing a major scale, up and down.
Below: Time-domain view of PSOLA
Autotune in MATLAB
This was the final project of an introductory DSP course. Our team chose to build an autotune plugin in MATLAB. This plugin analyses monophonic audio samples, detects the main pitches, and corrects them using PSOLA (Pitch Synchronous Overlap and Add).
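One step of such a pipeline, choosing the target pitch for a detected note, can be illustrated in a few lines of Python. This sketch assumes the correction snaps each detected frequency to the nearest equal-tempered semitone with A4 = 440 Hz. The project description does not state the tuning rule, the function names are hypothetical, and the PSOLA resynthesis itself is not shown.

```python
import math

def nearest_semitone_hz(freq_hz, a4_hz=440.0):
    """Frequency of the equal-tempered note closest to freq_hz."""
    semitones = round(12.0 * math.log2(freq_hz / a4_hz))
    return a4_hz * 2.0 ** (semitones / 12.0)

def pitch_shift_ratio(freq_hz, a4_hz=440.0):
    """Factor by which the detected pitch must be scaled to land on a note."""
    return nearest_semitone_hz(freq_hz, a4_hz) / freq_hz

# A slightly sharp A4 is pulled back down to 440 Hz.
target = nearest_semitone_hz(450.0)
ratio = pitch_shift_ratio(450.0)
```

A PSOLA stage would then use a ratio like this to change the spacing between pitch-synchronous grains before overlapping and adding them.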
All relevant documentation can be found HERE.
Pong In Processing
This game was a project for a class at the University of Michigan, PAT 204 - Creative Coding. The visual aspects were programmed in Java in Processing, while the audio effects were coded in Max/MSP. It is a slightly more sophisticated version of the classic game Pong.
Piano Teacher: Processing and Max/MSP
This game is also a PAT 204 project. The visual aspects were also programmed in Java in Processing, and the audio effects were coded in Max/MSP. This app is a simple piano teacher that teaches a few songs and scales. It is designed to have the user play on a MIDI controller, which Max converts to audio with synthesizers.