Audio Processing

This site collects instructions for general audio data processing.

ffmpeg is a command-line tool that converts audio files from one format to another. Its documentation is at: https://www.ffmpeg.org/ffmpeg.html#Synopsis

To convert all files in a folder (here, from .opus to .mp3), the following command can be used (Linux/macOS):

for i in *.opus; do ffmpeg -i "$i" "${i%.*}.mp3"; done
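The same batch conversion can be sketched in Python, which may be convenient on systems without a Unix shell. This is a minimal sketch that assumes ffmpeg is installed and on the PATH; the function names build_ffmpeg_command and convert_folder are hypothetical helpers, not part of ffmpeg.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: Path, target_ext: str = ".mp3") -> list[str]:
    """Return the ffmpeg argument list that converts src to target_ext."""
    # with_suffix swaps the last extension, like "${i%.*}.mp3" in the shell loop
    return ["ffmpeg", "-i", str(src), str(src.with_suffix(target_ext))]


def convert_folder(folder: str, source_ext: str = ".opus",
                   target_ext: str = ".mp3") -> list[Path]:
    """Convert every source_ext file in folder; return the output paths."""
    outputs = []
    for src in sorted(Path(folder).glob(f"*{source_ext}")):
        # check=True raises CalledProcessError if ffmpeg exits with an error
        subprocess.run(build_ffmpeg_command(src, target_ext), check=True)
        outputs.append(src.with_suffix(target_ext))
    return outputs
```

Calling convert_folder('.') would mirror the shell loop for the current folder.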

Audacity is an open-source tool for editing audio: https://www.audacityteam.org/

Python libraries: librosa (audio loading and analysis) and PyAudio (audio recording and playback).

Voice activity detection (VAD) detects when voice/speech occurs in an audio signal. Tools include Silero VAD: https://thegradient.pub/one-voice-detector-to-rule-them-all/ and webrtcvad: https://github.com/wiseman/py-webrtcvad/
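To illustrate the idea behind VAD, here is a minimal energy-threshold sketch. It is not how Silero or webrtcvad work internally (those use a trained model and a more elaborate classifier, respectively); it only marks frames whose RMS energy exceeds a threshold. The function name detect_speech, the 30 ms frame length, and the 0.02 threshold are assumptions chosen for illustration and are untuned.

```python
import math


def detect_speech(samples, sample_rate, frame_ms=30, threshold=0.02):
    """Return (start_sec, end_sec) spans whose frame RMS energy exceeds threshold.

    samples: sequence of floats in [-1, 1].
    threshold: assumed, untuned energy cutoff.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len  # a trailing partial frame is ignored
    spans, start = [], None
    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        t = k * frame_len / sample_rate  # start time of this frame in seconds
        if rms > threshold and start is None:
            start = t  # speech begins
        elif rms <= threshold and start is not None:
            spans.append((start, t))  # speech ended at the start of this frame
            start = None
    if start is not None:
        # the signal ended while still in speech
        spans.append((start, n_frames * frame_len / sample_rate))
    return spans
```

A simple threshold like this is easily fooled by background noise, which is the main reason to prefer the dedicated tools above for real recordings.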

MFCC (Mel-frequency cepstral coefficients):

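Whisper can be downloaded from here: https://github.com/openai/whisper

You can install it via pip (pip install -U openai-whisper). Once installed, import it and load a model:

import whisper, librosa

import pandas as pd  # used below to tabulate word timestamps

import streamlit as st  # used in Method 2

whisper_model = whisper.load_model('small.en')  # 'base' or 'small.en' works reasonably well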

Method 1: Working with mp3/wav files

result = whisper_model.transcribe('my_audio.wav', word_timestamps=True)

pd.DataFrame(result['segments'][0].get('words'))  # word-level timestamps for the first segment (pd is pandas)

result['text']  # the transcribed text
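The DataFrame line above only covers the first segment. To collect word timestamps from every segment, something like the following sketch may help. The helper name words_from_result is hypothetical; it relies only on the 'segments' and 'words' keys that Whisper returns when word_timestamps=True.

```python
def words_from_result(result):
    """Collect word-level timestamps from every segment of a Whisper result dict."""
    rows = []
    for seg_index, segment in enumerate(result.get("segments", [])):
        # "words" is present only when transcribe() was called with word_timestamps=True
        for w in segment.get("words", []):
            rows.append({
                "segment": seg_index,
                "word": w["word"].strip(),  # Whisper words usually carry a leading space
                "start": w["start"],
                "end": w["end"],
            })
    return rows
```

The returned list of dicts can be passed straight to pd.DataFrame(...) to get one row per word.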

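Method 2: Convert a recording from Streamlit's st.audio_input directly to a NumPy array

audio_data = st.audio_input('Record audio')  # a label is required; this label text is arbitrary

if audio_data:

    audio_array, _ = librosa.load(audio_data, sr=16000)  # resample to 16 kHz, the rate Whisper expects

    result = whisper_model.transcribe(audio_array, word_timestamps=True)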
