These notes describe, at a basic level, how to process audio files using R, and one approach to predictive modeling with audio files.
By the end of 2022, hundreds of applications of machine learning to audio files had been built. Familiar examples include voice assistants such as Amazon's 'Alexa', Google's 'Ok Google' search tool, and Apple's 'Siri'.
What these tools have in common is that they take spoken input from a human voice, search on that input, and return useful information.
To begin, we must define what audio is.
A simple Google search defines audio as:
sound, especially when recorded, transmitted, or reproduced.
Sound, in turn, is defined as:
vibrations that travel through the air or another medium and can be heard when they reach a person's ear.
continuous and regular vibrations.
These vibrations are measurable because they change the pressure of the air molecules they pass through.
You have probably heard of sound waves or audio waves. This is exactly that concept. A wave is essentially how sound propagates. This wave is mechanical and can be visualized. The sound wave moves through the particles of a medium, such as air.
<Insert visualization of wavelength here>
There are two ways to describe sound waves:
The length of each wave (its wavelength).
The number of waves occurring in a time period (usually one second).
Frequency - Number of waves per second
Wavelength - Horizontal distance between any two successive high or low points on a wave.
Period - time required for one complete cycle of the wave. It is the inverse of the frequency.
Frequency of Sound = Velocity of Propagation / Wavelength
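Amplitude - height of the wave measured vertically from its resting position to its highest point (the distance from the highest point to the lowest point is the peak-to-peak amplitude). This establishes loudness and softness. The higher the amplitude, the louder the sound and the more energy it carries.
Pitch - how high or low a sound is perceived to be. It is determined by the frequency, that is, the number of wave cycles fitting into a time period.
Formant - the peaks observed in the frequency spectrum of a sound.
Bandwidth - the range of frequencies present in a sound.
Sampling Rate - number of samples recorded per second when a sound is digitized. Measured in hertz (Hz).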
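As a worked example of how these definitions relate, take a 440 Hz tone travelling through air at roughly 343 m/s and recorded at a sampling rate of 44100 Hz. All three numbers are illustrative assumptions and do not come from these notes. Here v is the velocity of propagation, f the frequency, \lambda the wavelength, T the period, and f_s the sampling rate:

```latex
\begin{align*}
\lambda &= \frac{v}{f} = \frac{343\ \text{m/s}}{440\ \text{Hz}} \approx 0.78\ \text{m} \\
T &= \frac{1}{f} = \frac{1}{440\ \text{Hz}} \approx 2.27\ \text{ms} \\
\frac{f_s}{f} &= \frac{44100\ \text{Hz}}{440\ \text{Hz}} \approx 100\ \text{samples per cycle}
\end{align*}
```

The first line is the frequency formula above rearranged for wavelength. The last line shows roughly how many samples describe one cycle of the wave once it is digitized.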
Let's dive into using R.
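Before reaching for any package, a pure tone can be sketched in base R to make the terms above concrete. This is a minimal sketch, not a definitive implementation: the variable names and the chosen values (8000 Hz sampling rate, 440 Hz tone, 0.5 amplitude, 0.01 seconds) are assumptions made up for illustration.

```r
# Illustrative values (assumptions, not taken from the notes)
sampling_rate <- 8000   # samples per second (Hz)
frequency     <- 440    # wave cycles per second (Hz)
amplitude     <- 0.5    # peak height measured from the resting position
duration      <- 0.01   # length of the signal in seconds

# Time stamp of each sample: 0, 1/8000, 2/8000, ...
n_samples <- round(sampling_rate * duration)
time <- (seq_len(n_samples) - 1) / sampling_rate

# A pure tone is a sine wave: amplitude * sin(2 * pi * frequency * time)
wave <- amplitude * sin(2 * pi * frequency * time)

# Period: seconds needed for one complete cycle
period <- 1 / frequency

# Draw the waveform only when running interactively
if (interactive()) {
  plot(time, wave, type = "l", xlab = "Time (s)", ylab = "Amplitude")
}
```

Raising `frequency` packs more cycles into the same time window, and raising `amplitude` makes the peaks taller, which corresponds to a louder sound.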
The packages tuneR and readR are good starter packages for audio.
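As a first taste of tuneR, the sketch below generates a tone, writes it to a WAV file, and reads it back. It assumes tuneR is installed. The 440 Hz tone, the 44100 Hz sampling rate, the 16 bit depth, and the temporary file name are illustrative choices, used so the example does not depend on any existing audio file.

```r
library(tuneR)

# Generate 44100 samples (one second at 44100 Hz) of a 440 Hz tone.
# All values here are illustrative assumptions.
tone <- sine(440, duration = 44100, samp.rate = 44100, bit = 16)

# Write the tone to a temporary WAV file, then read it back
wav_path <- file.path(tempdir(), "tone.wav")
writeWave(tone, filename = wav_path)
audio <- readWave(wav_path)

# A Wave object stores the samples together with their metadata
samp_rate <- audio@samp.rate          # sampling rate in Hz
n_samples <- length(audio@left)       # number of samples in the left channel
seconds   <- n_samples / samp_rate    # duration in seconds
```

With your own recording, the generation and writing steps drop away and `readWave()` is called directly on the file path.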