Exploring Quality and Generalizability for Parameterized Neural Audio Effects

Companion Audio Examples and Descriptions: All files are WAV files at a 44.1 kHz sampling rate and 16-bit depth*

Best experienced with headphones! Please allow a minute or two for the audio files to load.

*With the exception of the acapella example at a 16 kHz sampling rate
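For readers who download the files, the stated format can be checked locally. The following is a minimal sketch using Python's standard `wave` module; the helper name `wav_format` is hypothetical and not part of the project.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, bit_depth, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        sample_rate = wf.getframerate()
        bit_depth = wf.getsampwidth() * 8  # sample width is reported in bytes
        channels = wf.getnchannels()
    return sample_rate, bit_depth, channels
```

Per the description above, most files should report 44100 Hz and 16 bits, and the 16 kHz acapella example should report 16000 Hz.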

Effects

Compressors:

A compressor reduces a sound's dynamic range, bringing the louder parts of the audio closer in amplitude to the quieter parts. Without make-up gain, this yields audio that is quieter overall and more constant in amplitude from beginning to end. There are both digital and analog compressors, and the compressors used to generate these examples have four variable settings: threshold, ratio, attack time, and release time. Threshold determines the amplitude at which the compressor begins compressing: the lower the threshold, the less amplitude is needed for the compressor to engage. Ratio determines how much the compressor reduces the amplitude of audio above the threshold. A ratio of 2:1 means that for every 2 decibels the original audio would rise above the threshold, the compressor allows it to rise only 1 decibel above it. Attack time is how quickly the compressor engages when the audio crosses the threshold, and release time is how quickly it disengages after the audio dips back below the threshold; both are often measured in milliseconds.
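The four settings can be illustrated with a small sketch. This is not the comp4c or comp_one implementation; it assumes a textbook hard-knee gain curve and a one-pole smoother for attack and release, and the function names are hypothetical.

```python
import math


def static_output_db(level_db, threshold_db, ratio):
    """Hard-knee curve: levels above the threshold are scaled down by the ratio."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def compress(samples, sample_rate, threshold_db, ratio, attack_ms, release_ms):
    """Apply compression without make-up gain to a list of float samples."""
    # Assumed smoothing: one-pole filter with separate attack/release time constants.
    attack_coeff = math.exp(-1.0 / (attack_ms * 1e-3 * sample_rate))
    release_coeff = math.exp(-1.0 / (release_ms * 1e-3 * sample_rate))
    gain_db = 0.0
    out = []
    for x in samples:
        level_db = 20.0 * math.log10(max(abs(x), 1e-10))
        target_gain_db = static_output_db(level_db, threshold_db, ratio) - level_db
        # More gain reduction requested: engage at the attack rate; otherwise release.
        coeff = attack_coeff if target_gain_db < gain_db else release_coeff
        gain_db = coeff * gain_db + (1.0 - coeff) * target_gain_db
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out
```

With a 2:1 ratio and a threshold of -15, an input 2 dB over the threshold (-13) maps to 1 dB over it (-14), matching the ratio description above.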

Comp4c:

Comp4c is a digital compressor effect with four variable "knobs": threshold, ratio, attack time, and release time. The knob ranges are: Threshold (-30 to 0), Ratio (1:1 to 5:1), Attack Time (1ms to 40ms), Release Time (1ms to 40ms). Each audio example below that uses the comp4c effect was output by a model trained on the full range of the knobs, but the example itself is output audio from a single setting. For ease of comparison, each example using the comp4c effect was generated with the following settings:

Threshold = -15, Ratio = 3, Attack Time = 20.5ms, Release Time = 20.5ms
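These comparison settings coincide with the midpoint of each comp4c knob range listed above. A quick check, with dictionary keys and the helper name chosen here purely for illustration:

```python
# Knob ranges as listed for comp4c (key names are hypothetical).
KNOB_RANGES = {
    "threshold": (-30.0, 0.0),
    "ratio": (1.0, 5.0),
    "attack_ms": (1.0, 40.0),
    "release_ms": (1.0, 40.0),
}


def midpoint_settings(ranges):
    """Return the midpoint of every (low, high) knob range."""
    return {name: (low + high) / 2.0 for name, (low, high) in ranges.items()}


settings = midpoint_settings(KNOB_RANGES)
# settings == {"threshold": -15.0, "ratio": 3.0, "attack_ms": 20.5, "release_ms": 20.5}
```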

Comp_one:

Comp_one is also a digital compressor effect, but its four "knobs" are not variable. The four knob settings are again threshold, ratio, attack time, and release time. The comp_one effect locks the knobs to a single setting, with the expectation that learning this one setting would be an easier task than learning the full range of the comp4c effect, and would thus lead to higher-quality results. Each audio example below using the comp_one effect was generated with the following settings:

Threshold = -25, Ratio = 4, Attack Time = 5ms, Release Time = 20ms

Leslie Cabinet:

The Leslie speaker is a combined amplifier and loudspeaker that projects the signal from an instrument and modifies the sound by rotating a baffle chamber ("drum") in front of the loudspeakers. A similar effect is provided by a rotating system of horns in front of the treble driver. The Leslie cabinet used to generate the dataset for these examples contained two speakers, a horn and a woofer. The horn speaker is responsible for amplifying the high frequencies, and the woofer is responsible for amplifying the low frequencies. A musician can control the rotation speed using either a pedal or an external switch that alternates between a slow and a fast setting. The slow rotational setting is commonly referred to as the "chorale" or "chorus" effect, and the fast rotational setting is referred to as the "tremolo" effect.

Raw Audio Samples - Inputs

Example #1: baseline.wav
Example #2: second_example.wav
Acapella Example: vocal_example.wav

Speed Results: Baseline, Frozen Layers, Removed Skip Connection

Example #1

Baseline: Comp4c
Predicted: pred_baseline_1000epoch_comp4c.wav
Target: FIXED_target_baseline_1000epoch_comp4c (1).wav

Frozen Layers: Comp4c
Predicted: pred_baseline_frozen_comp4c.wav
Target: FIXED_target_baseline_frozen_comp4c.wav

No Skip Connections: Comp4c
Predicted: pred_baseline_noskip_comp4c.wav
Target: FIXED_target_baseline_noskip_comp4c.wav

Example #2

Baseline: Comp4c
Predicted: pred_example2_comp4c.wav
Target: FIXED_target_example2_comp4c.wav

Frozen Layers: Comp4c
Predicted: pred_example2_frozen_comp4c.wav
Target: FIXED_target_example2_frozen_comp4c.wav

No Skip Connections: Comp4c
Predicted: pred_example2_noskip_comp4c.wav
Target: FIXED_target_example2_noskip_comp4c.wav

Accuracy Results: Baseline, 10,000 Epochs, Comp_one Effect

Example #1

Baseline: Comp4c
Predicted: pred_baseline_1000epoch_comp4c.wav
Target: FIXED_target_baseline_1000epoch_comp4c (1).wav

Comp_one Effect: Comp_one
Predicted: pred_baseline_comp1.wav
Target: FIXED_target_baseline_comp1.wav

10,000 Epochs: Comp_one
Predicted: pred_baseline_10000epoch_comp1.wav
Target: FIXED_target_baseline_10000epoch_comp1.wav

Example #2

Baseline: Comp4c
Predicted: pred_example2_comp4c.wav
Target: FIXED_target_example2_comp4c.wav

Comp_one Effect: Comp_one
Predicted: pred_example2_comp1.wav
Target: FIXED_target_example2_comp1.wav

10,000 Epochs: Comp_one
Predicted: pred_example2_10000epoch_comp1.wav
Target: FIXED_target_example2_10000epoch_comp1.wav

Acapella Audio

Acapella Example

44.1kHz Sampling Rate: Comp_one
Predicted: pred_vocal44_comp1.wav
Target: FIXED_target_vocal44_comp1.wav

16kHz Sampling Rate: Comp_one
Predicted: pred_vocal16_comp1.wav
Target: FIXED_target_vocal16_comp1.wav

Dataset Manipulation: Guitar Note Trained Model - Comp_one Effect

Guitar Note
Predicted: pred_guitar_note1_new.wav
Target: FIXED_target_guitar_note1_new.wav
Predicted: pred_guitar_note2_new.wav
Target: FIXED_target_guitar_note2_new.wav

Full Instrumentation
Predicted: pred_baseline_fullsong_guitar_model.wav
Target: FIXED_target_baseline_fullsong_guitarmodel_comp1.wav

Leslie Cabinet Effects: Chorus and Tremolo

Horn Chorus
Predicted: pred_LHC_note1.wav
Target: target_LHC_note1.wav
Predicted: pred_LHC_note2.wav
Target: target_LHC_note2.wav

Woofer Chorus
Predicted: pred_LWC_note1.wav
Target: target_LWC_note1.wav
Predicted: pred_LWC_note2.wav
Target: target_LWC_note2.wav

Horn Tremolo
Predicted: pred_LHT_note1.wav
Target: target_LHT_note1.wav
Predicted: pred_LHT_note2.wav
Target: target_LHT_note2.wav

Woofer Tremolo
Predicted: pred_LWT_note1.wav
Target: target_LWT_note1.wav
Predicted: pred_LWT_note2.wav
Target: target_LWT_note2.wav