Main Objective
The purpose of the pitch modulation function is to let the DJ vary the pitch of the music in real time. This can be done by shifting the pitch by a number of semitones or by applying vibrato, a rapid periodic variation in pitch. Pitch modulation lets the music convey more emotion and expression, which adds to the ambience of a live performance. The user can apply pitch modulation to any WAV file of their choice.
Function Algorithm
A phase vocoder is used to manipulate phase information in order to alter the pitch.
Image 1: Phase vocoder algorithm; details are outlined below.
First, the short-time Fourier transform (STFT) is applied to the signal in order to access its frequency components. To apply the STFT, the signal is broken up into short segments of a specified length, and each segment is multiplied by a windowing function, such as the rectangular or Hamming window, to produce a windowed segment. Consecutive windowed segments overlap. The spacing between the starts of consecutive windows is called the hop length, so the overlap is the window length minus the hop length. Overlapping the segments prevents information loss between frames.
Image 2: STFT mathematical representation. Here x[m] is the input signal and w[n-m] is the window function.
Image 3: A visual representation of the STFT.
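The framing and windowing steps described above can be sketched as follows. This is a minimal illustration using NumPy, not the project's own code: the function name stft, the choice of a Hamming window, the 44.1 kHz sample rate, and the 440 Hz test tone are all assumptions made for the example. The window and hop lengths default to the values used in this project.

```python
import numpy as np

def stft(signal, window_length=1024, hop_length=256):
    """Return an array of complex spectra, one row per windowed segment.

    Assumes len(signal) >= window_length. Trailing samples that do not
    fill a whole window are ignored in this sketch.
    """
    window = np.hamming(window_length)  # assumed window choice
    num_frames = 1 + (len(signal) - window_length) // hop_length
    frames = np.stack([
        signal[i * hop_length : i * hop_length + window_length] * window
        for i in range(num_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real signal.
    return np.fft.rfft(frames, axis=1)

# Hypothetical test input: one second of a 440 Hz tone.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spectra = stft(tone)
peak_bin = int(np.argmax(np.abs(spectra[0])))
peak_hz = peak_bin * sample_rate / 1024  # centre frequency of the strongest bin
```

Each row of spectra is the spectrum of one windowed segment, and the strongest bin of the first row lies within one bin spacing of the 440 Hz test tone.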
The phase of the audio can then be modified using a sinusoidal wave. The parameters are modulation frequency and modulation depth, which correspond to the frequency and amplitude of the sinusoid, respectively. The modulation frequency controls how fast the pitch oscillates, which produces vibrato, while the modulation depth controls how far the pitch fluctuates by setting the peak amplitude of the sinusoid. The depth is measured in semitones. A semitone (also known as a half step) is the smallest interval between notes in the standard 12-note Western scale, and 12 semitones make up one octave.
Once the phase has been modulated, the signal is reconstructed by combining the original magnitude with the newly modulated phase. The inverse STFT is then applied to return to the time domain, yielding a pitch-modulated audio signal.
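To make the two parameters concrete, the sketch below models the pitch offset as a sinusoid whose amplitude is the modulation depth in semitones, and converts a semitone offset into a frequency ratio using the equal-temperament relation ratio = 2^(semitones / 12). The helper names semitones_to_ratio and vibrato_offset are hypothetical, and this is only one way to interpret the parameters; it does not reproduce how the project applies them to the phase.

```python
import math

def semitones_to_ratio(semitones):
    """Frequency ratio for a pitch shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)

def vibrato_offset(t, mod_freq_hz, mod_depth_semitones):
    """Pitch offset in semitones at time t (seconds) for a sinusoidal modulator."""
    return mod_depth_semitones * math.sin(2.0 * math.pi * mod_freq_hz * t)

# With a 4 Hz modulator, the first peak occurs a quarter period in, at t = 1/16 s.
peak_offset = vibrato_offset(1.0 / 16.0, mod_freq_hz=4.0, mod_depth_semitones=6.0)
peak_ratio = semitones_to_ratio(peak_offset)
octave_ratio = semitones_to_ratio(12)
```

With a depth of 6 semitones the pitch swings up to half an octave above and below the original, and a 12-semitone shift doubles the frequency.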
Image 4: Code of phase vocoder function.
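Separately from the project code in Image 4, the reconstruction step can be illustrated with a small NumPy round trip: split each spectrum into magnitude and phase, recombine them, and invert the STFT by overlap-add. The function names, the Hamming window, and the random test signal are assumptions for this sketch. The phase is deliberately left unchanged so that the output can be checked against the input; an actual effect would alter the phase at the marked line.

```python
import numpy as np

WINDOW_LENGTH = 1024  # window length stated in the text
HOP_LENGTH = 256      # hop length stated in the text
WINDOW = np.hamming(WINDOW_LENGTH)  # assumed window; nonzero at its edges

def stft(signal):
    """Windowed FFT of each frame (assumes the frames tile the signal exactly)."""
    num_frames = 1 + (len(signal) - WINDOW_LENGTH) // HOP_LENGTH
    frames = np.stack([
        signal[i * HOP_LENGTH : i * HOP_LENGTH + WINDOW_LENGTH] * WINDOW
        for i in range(num_frames)
    ])
    return np.fft.rfft(frames, axis=1)

def recombine(magnitude, phase):
    """Rebuild complex spectra from magnitude and (possibly modified) phase."""
    return magnitude * np.exp(1j * phase)

def istft(spectra):
    """Inverse STFT by weighted overlap-add."""
    num_frames = spectra.shape[0]
    length = WINDOW_LENGTH + HOP_LENGTH * (num_frames - 1)
    output = np.zeros(length)
    weight = np.zeros(length)
    for i in range(num_frames):
        start = i * HOP_LENGTH
        frame = np.fft.irfft(spectra[i], n=WINDOW_LENGTH)
        output[start:start + WINDOW_LENGTH] += frame * WINDOW
        weight[start:start + WINDOW_LENGTH] += WINDOW ** 2
    # Dividing by the summed squared window undoes the double windowing.
    return output / weight

# Hypothetical test signal whose length is an exact number of hops past one window.
rng = np.random.default_rng(0)
signal = rng.standard_normal(WINDOW_LENGTH + HOP_LENGTH * 60)

spectra = stft(signal)
magnitude, phase = np.abs(spectra), np.angle(spectra)
modulated_phase = phase + 0.0  # placeholder: an effect would modify the phase here
rebuilt = istft(recombine(magnitude, modulated_phase))
max_error = float(np.max(np.abs(rebuilt - signal)))
```

When the phase is untouched, the rebuilt signal matches the input to within floating-point error, which is a useful sanity check before introducing any phase modification.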
The window length and the hop length are hard-coded and not set by the user. A window length of 1024 samples and a hop length of 256 samples were used, which gives 75% overlap between consecutive windows. A larger window would improve frequency resolution while sacrificing time resolution. Typically, a window size of 2048 would be used for balanced time and frequency resolution; however, we use a smaller window size to account for possible rapid changes in the audio signal. A smaller hop length would improve temporal smoothness, whereas a larger hop length would reduce overlap, which may cause "choppiness" in the signal.
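The trade-off between the two window sizes can be put into numbers. The sketch below assumes a 44.1 kHz sample rate, which is not stated in this report, and uses a hypothetical helper named stft_resolution.

```python
def stft_resolution(window_length, hop_length, sample_rate):
    """Basic time and frequency resolution figures for an STFT configuration."""
    return {
        "bin_spacing_hz": sample_rate / window_length,
        "window_ms": 1000.0 * window_length / sample_rate,
        "hop_ms": 1000.0 * hop_length / sample_rate,
        "overlap_fraction": 1.0 - hop_length / window_length,
    }

SAMPLE_RATE = 44100  # assumed; the report does not state the sample rate

used = stft_resolution(1024, 256, SAMPLE_RATE)    # values used in this project
larger = stft_resolution(2048, 256, SAMPLE_RATE)  # the typical size mentioned above
```

Under this assumption, the 1024-sample window spans roughly 23 ms with frequency bins about 43 Hz apart, while a 2048-sample window would span roughly 46 ms with bins about 21.5 Hz apart: twice the frequency resolution at half the time resolution.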
Pitch Modulation Effect on Signal Waveform
Below are the signal waveforms before and after pitch modulation. Increasing the modulation frequency makes the signal oscillate faster, as shown by the more rapid variation near the peaks of the waveform. Increasing the modulation depth increases how much the magnitude of the signal fluctuates, as shown by the larger maximum amplitude.
Image 5: Effect of adjusting the modulation frequency/vibrato rate.
Image 6: Effect of adjusting the modulation depth.
Image 7: Comparing the audio waveforms before and after pitch modulation (modulation frequency = 4 Hz, modulation depth = 6 semitones).
Demonstration
Above is a demonstration of the pitch modulation function. By adjusting each slider, the user is able to change the modulation frequency and depth. The user is also able to start and stop the pitch modulation with the buttons.
A higher modulation frequency creates the vibrato effect, in which the pitch of the audio changes rapidly. A higher modulation depth widens the range over which the pitch varies. The upper limits were set to 24 semitones for depth and 20 Hz for frequency, as larger values would result in audio that is too distorted for practical purposes.
Changing the modulation depth on its own produces an audible change. Changing the modulation frequency, by contrast, is only noticeable when it is applied together with a modulation depth, as shown by the demonstration.
Although the audio is modulated, there is unwanted noise and static in the audio output, which may be caused mainly by the bass in the song. With proper filtering to reduce this noise, the pitch modulation function should be usable in a DJing environment.
GUI
Sliders are used in order to adjust the modulation frequency and modulation depth, while buttons are used to start and stop the pitch modulation.
Image 10: Pitch Modulation Function GUI
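One way the sliders and buttons could hand their values to an audio thread is through a small thread-safe holder, sketched below. The class name ModulationControls and its methods are hypothetical and are not taken from the project code. The limits are the ones given in the Demonstration section. The setters convert their argument with float() because a tkinter Scale passes the new value to its command callback as a string.

```python
import threading

MAX_FREQ_HZ = 20.0          # upper limit stated in the text
MAX_DEPTH_SEMITONES = 24.0  # upper limit stated in the text

class ModulationControls:
    """Thread-safe holder for the two slider values and the on/off state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._freq_hz = 0.0
        self._depth = 0.0
        self.running = threading.Event()

    def set_frequency(self, value):
        # Suitable as a Scale callback: convert, then clamp to the allowed range.
        with self._lock:
            self._freq_hz = min(max(float(value), 0.0), MAX_FREQ_HZ)

    def set_depth(self, value):
        with self._lock:
            self._depth = min(max(float(value), 0.0), MAX_DEPTH_SEMITONES)

    def snapshot(self):
        """Return (frequency, depth) as one consistent pair for the audio thread."""
        with self._lock:
            return self._freq_hz, self._depth

    def start(self):
        self.running.set()    # the start button would call this

    def stop(self):
        self.running.clear()  # the stop button would call this

controls = ModulationControls()
controls.set_frequency("25")  # above the limit, so it is clamped
controls.set_depth("6")
controls.start()
```

The audio thread would call snapshot() once per processed block and check running.is_set() to decide whether to apply the effect, so the GUI never blocks on audio processing.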
DSP Tools Utilized in the Pitch Modulation Function
Short-time Fourier Transform: This is the Fourier transform performed on short segments of the signal. These segments are created by sliding a windowing function across the signal in steps of a fixed hop length, so that consecutive segments overlap. Further details on how this is applied can be found in the above section, Function Algorithm.
Results and Challenges
Overall, the pitch modulation function successfully modified audio in real time. A phase vocoder was used to enable pitch modulation by manipulating phase information. By changing the modulation frequency and modulation depth, we were able to achieve a pitch-modulated output using this function. However, the audio output contained extra noise and static, which may be due mainly to impulses in the audio (such as bass). To make the output more robust, a low-pass filter could be applied to the modulated output before it is written to the output stream. This should reduce the noise while leaving the modulated output largely intact, making the function more promising for use in a DJing environment.
One of the biggest challenges was learning how to use the Python libraries while also learning Python itself, as each library had many available functions to choose from. Another challenge was learning how to build the GUI and manipulate audio in real time, which was implemented using libraries such as tkinter and threading.
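As a rough illustration of the proposed filtering step, the sketch below builds a windowed-sinc FIR low-pass filter with NumPy and applies it to a block of samples. The report does not specify a filter design, so everything here is an assumption: the function names, the 4 kHz cutoff, the 101 taps, the 44.1 kHz sample rate, and the two test tones. Whether a low-pass filter removes the noise actually heard in the demonstration would need to be verified on the real output.

```python
import numpy as np

def lowpass_fir(cutoff_hz, sample_rate, num_taps=101):
    """Design a windowed-sinc low-pass FIR filter with unity gain at DC."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / sample_rate           # cutoff in cycles per sample
    taps = 2 * fc * np.sinc(2 * fc * n)    # ideal low-pass impulse response
    taps *= np.hamming(num_taps)           # taper to reduce ripple
    return taps / taps.sum()

def apply_lowpass(signal, taps):
    """Filter a block of samples, keeping the output the same length."""
    return np.convolve(signal, taps, mode="same")

def rms(x):
    return float(np.sqrt(np.mean(x ** 2)))

SAMPLE_RATE = 44100  # assumed sample rate
taps = lowpass_fir(cutoff_hz=4000.0, sample_rate=SAMPLE_RATE)  # hypothetical cutoff

# Hypothetical test tones: one well below and one well above the cutoff.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
low_tone = np.sin(2 * np.pi * 200.0 * t)
high_tone = np.sin(2 * np.pi * 10000.0 * t)

# Compare levels away from the block edges, where the convolution is incomplete.
middle = slice(200, -200)
low_gain = rms(apply_lowpass(low_tone, taps)[middle]) / rms(low_tone[middle])
high_gain = rms(apply_lowpass(high_tone, taps)[middle]) / rms(high_tone[middle])
```

In this sketch the 200 Hz tone passes almost unchanged while the 10 kHz tone is strongly attenuated. In a real-time setting the filter state would also need to be carried across block boundaries, which this block-at-a-time example does not do.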
Images adapted from:
https://sethares.engr.wisc.edu/vocoders/phasevocoder.html (Image 1)
https://course.ece.cmu.edu/~ece491/lectures/L25/STFT_Notes_ADSP.pdf (Images 2 and 3)