3Deoskill's LipTalkAudioConverter is free software written in Python that converts speech audio files into the special transcripts required by the LipTalk plugin's Audio Mode.
Since its latest update, the AudioConverter can also convert MP3 files.
The new compression mode lets you export transcripts as space-saving gzip files. This is useful, for example, when converting in Advanced Mode, where the amplitudes are also written into the transcript: compression makes the file significantly smaller. With compression disabled, standard .json transcripts are exported.
LipTalk will detect the file type automatically starting with the upcoming 2026.0 update.
The transcript files can be loaded into the LipTalk plugin in Cinema 4D to create lip sync animations in Audio Mode.
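The export step can be pictured with Python's standard json and gzip modules. This is a minimal sketch, assuming that a compressed transcript is simply the same JSON data run through gzip; the function name export_transcript and the transcript layout are hypothetical and are not taken from the converter's source code.

```python
import gzip
import json
from pathlib import Path


def export_transcript(transcript: dict, path: str, compressed: bool = False) -> Path:
    """Write a transcript as plain JSON or as gzip-compressed JSON.

    Hypothetical helper: the transcript layout used here is invented for
    illustration and is not LipTalk's real schema.
    """
    data = json.dumps(transcript).encode("utf-8")
    target = Path(path)
    if compressed:
        # Same JSON bytes, gzip-compressed to save disk space.
        target.write_bytes(gzip.compress(data))
    else:
        target.write_bytes(data)
    return target
```

Large, repetitive lists such as per-frame amplitude values usually compress very well, which is why compression pays off most when converting in Advanced Mode.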
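One way a loader could tell the two file types apart is to inspect the content instead of the file extension: every gzip stream starts with the two bytes 0x1f 0x8b. The sketch below assumes the transcript is JSON in both cases; load_transcript is a hypothetical helper, not LipTalk's actual detection code.

```python
import gzip
import json
from pathlib import Path

# Every gzip stream begins with these two "magic" bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def load_transcript(path: str) -> dict:
    """Load a transcript, decompressing it first if it is gzip data."""
    raw = Path(path).read_bytes()
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # In both cases the payload is assumed to be UTF-8 encoded JSON.
    return json.loads(raw.decode("utf-8"))
```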
The audio converter supports up to 2500 languages.
I am planning an extra feature for the AudioConverter: Voice Training, which will let you train and improve speech recognition specifically for your voice.
The LipTalkAudioConverter was written in Python and is licensed under the GNU General Public License 3.0.
Supported audio formats:
WAV files (mono and stereo)
MP3 files (mono and stereo)
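Stereo input is typically reduced to a single channel before analysis. The following sketch, using only Python's standard wave and array modules, shows one common approach (averaging the channels) under the assumption of 16-bit PCM WAV data. The converter's actual handling is not documented here, MP3 files would first need a separate decoder that this sketch does not cover, and read_wav_mono is a hypothetical name.

```python
import array
import sys
import wave


def read_wav_mono(path: str) -> list[float]:
    """Read a 16-bit PCM WAV file and return mono samples in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    mono = []
    for i in range(0, len(samples), channels):
        frame = samples[i:i + channels]
        # Average all channels of one frame, then scale to floating point.
        mono.append(sum(frame) / channels / 32768.0)
    return mono
```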
The LipTalk Converter 2026 Final package includes the standalone Windows application and the full source code for macOS and Linux users. Download the package and unzip it.
Action (Windows): Open the folder liptalk_converter_2026_Final and run liptalk_converter.exe.
Note: No Python installation or setup required.
macOS and Linux: To run the converter from source, use the files in the liptalk_converter_resource_files folder.
Full Guide: For step-by-step terminal commands and environment setup, refer to the 👉 View Installation Guide (HTML) or to readme_installation.html in the package.
Prerequisite: The AudioConverter was developed with Python 3.11.5. Install Python 3.11.5 (Download here).
Setup: Create a virtual environment and follow the instructions in the installation guide.
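For readers who prefer to script this step, the following Python sketch checks the interpreter version and creates a virtual environment with the standard venv module. It is a hypothetical helper, not part of the package; matching only the 3.11 minor version is an assumption, and the installation guide remains the reference for which dependencies to install afterwards.

```python
import os
import sys
import venv
from pathlib import Path

# The AudioConverter was developed with Python 3.11.5; comparing only the
# major.minor version (3.11) is an assumption made for this sketch.
REQUIRED_VERSION = (3, 11)


def python_version_ok(current=None, required=REQUIRED_VERSION) -> bool:
    """Return True if the (given or running) major.minor version matches."""
    current = tuple(current or sys.version_info[:2])
    return current == tuple(required)


def create_environment(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment, similar to `python -m venv <env_dir>`."""
    venv.create(
        env_dir,
        with_pip=with_pip,
        symlinks=(os.name != "nt"),  # same default as the venv command line
    )
    return Path(env_dir)
```

Calling create_environment(".venv") in the unpacked source folder gives you an environment to activate before following the remaining steps of the guide.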
The converter window contains the following elements:
Link to the online help
Language
Phoneme Emitter Strength
Advanced Mode (for speech files with longer than average vowels)
Compressed (exports space-saving .gzip files, which can now be loaded into LipTalk)
Path to the audio file
Load an audio file (WAV or MP3, mono or stereo)
Path to the destination file
Select where you want to save the generated transcript file
Convert button to convert the audio file
Status / progress bar
From this menu you can choose the language of the audio file.
International is the default setting and supports up to 2500 languages.
The menu also offers a small selection of languages with special phonetic features that improve recognition accuracy. The best approach is to test which setting gives the best results for your audio.
The Phoneme Emitter Strength is an important parameter for generating phonemes from speech signals: it sets the sensitivity of phoneme detection.
The default value is 1.0. If you are not satisfied with the phoneme recognition, you can increase the value. If a high value has negative effects, for example if too many phonemes produce unnatural, irregular, or shaky lip animations, reduce the value. Extensive testing has shown that values below 0.7 are not recommended. The default value is a good starting point.
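The effect of raising or lowering the value can be illustrated with a toy detector in which the detection threshold is divided by the strength, so a higher strength produces more detections. This is a hypothetical model chosen for illustration only; it is not the converter's actual phoneme detection, and count_detections and base_threshold are invented names.

```python
import warnings


def count_detections(envelope, strength=1.0, base_threshold=0.5):
    """Count rising crossings of a strength-dependent threshold (toy model)."""
    if strength < 0.7:
        warnings.warn("Phoneme Emitter Strength below 0.7 is not recommended")
    # Higher strength -> lower threshold -> more sensitive detection.
    threshold = base_threshold / strength
    count = 0
    above = False
    for level in envelope:
        if level >= threshold and not above:
            count += 1  # new event: level has just crossed the threshold
            above = True
        elif level < threshold:
            above = False  # re-arm once the level drops again
    return count
```

With a strength of 2.0, the quieter bumps in an envelope such as [0.1, 0.6, 0.2, 0.45, 0.1, 0.3, 0.1] also trigger, giving three detections instead of one; in an animation, that kind of extra triggering is what shows up as shaky lips.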
The Advanced Mode checkbox is intended for speech files that contain long vowels. It enables an internal algorithm in the LipTalk plugin called the "Frame-Range-Method", developed by 3Deoskill.
In Normal Mode, the lips close right after a phoneme is detected, using the Release parameter. In Advanced Mode, the lips stay in position as long as the internal algorithm still associates the amplitude with the phoneme. When the signal falls below a certain level, the Release parameter kicks in and closes the lips. The Threshold parameter adjusts this level, similar to a noise gate. It works dynamically and adapts automatically to the detected original level. Internally, the audio signal is smoothed to avoid irregularities.
If you enable this option, the amplitude values of the audio file are written into the transcript file, which makes it considerably larger. For example, the transcript for a 10-second audio file can be 20 MB in size.
In return, you can switch between Normal and Advanced Mode in the LipTalk plugin. If you do not convert in Advanced Mode, only Normal Mode is available.
LipTalk can now also read gzip files, which are highly compressed. When Compressed is checked, the converter exports this file type.
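The difference between the two modes can be sketched as a simple noise-gate-style envelope follower. This is an illustration under stated assumptions, not the Frame-Range-Method itself: the smoothing is a plain moving average, the gate level is a fraction (threshold) of the envelope's peak to mimic the dynamic adaptation, and release is a per-frame decay factor. All names are hypothetical.

```python
def smooth(samples, window=3):
    """Moving average of absolute amplitudes (simple smoothing)."""
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        out.append(sum(abs(s) for s in chunk) / len(chunk))
    return out


def lip_openness(envelope, advanced=False, threshold=0.3, release=0.5):
    """Per-frame lip openness (0.0 closed, 1.0 open) for a smoothed envelope."""
    peak = max(envelope, default=0.0)
    gate = threshold * peak  # dynamic: the gate scales with the detected level
    openness, current, above = [], 0.0, False
    for level in envelope:
        is_open = peak > 0 and level >= gate
        if is_open and (advanced or not above):
            # Normal: open only at the onset. Advanced: hold while above gate.
            current = 1.0
        else:
            current *= release  # Release: close the lips gradually
        above = is_open
        openness.append(current)
    return openness
```

For the envelope [0.0, 1.0, 0.9, 0.8, 0.1, 0.05], Normal Mode opens the lips at the onset and starts closing right away ([0.0, 1.0, 0.5, 0.25, 0.125, 0.0625]), while Advanced Mode holds them open across the long vowel and releases only when the level drops below the gate ([0.0, 1.0, 1.0, 1.0, 0.5, 0.25]).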
Here you can see the path to the audio file.
To convert an audio file (WAV or MP3), select it here. The file can be stereo or mono, but the recording quality should be high: if the level is too low, noise becomes more prominent. In addition, there should be no disturbing background noise that interferes with the recognition of the phonemes. The audio signal should ideally be normalized, but this is not mandatory, as the AudioConverter adjusts the amplitudes internally.
Here you can see the path to the destination file (transcript file, .json or .gzip).
If an audio file is already selected, the program automatically uses the same name and location as the audio file for the transcript file. You can also browse your computer and define a custom name for the transcript file.
To start the conversion, click the Convert button. Make sure you have an active internet connection: the first time you convert, the program downloads a pretrained model from a remote server.
The program displays a progress bar during the conversion. When the calculation is complete, the status shows Finished.
Tip: To speed up conversion, keep the converter running. The pretrained model has already been downloaded, so it does not need to be fetched again. This is especially convenient when you convert several audio files.
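Peak normalization, one common way to bring a recording to a consistent level, scales every sample so that the loudest one reaches a target value. The sketch below assumes float samples in the range -1.0 to 1.0; how the AudioConverter adjusts amplitudes internally is not documented here, so treat normalize_peak as a hypothetical illustration.

```python
def normalize_peak(samples, target=1.0):
    """Scale samples so the largest absolute value equals `target`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target / peak
    return [s * gain for s in samples]
```

Because the same gain is applied to the noise floor, normalizing a quiet, noisy recording also makes the noise louder, which is why a clean recording at a good level matters more than normalization.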
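The default naming rule can be expressed with Python's pathlib: keep the folder and base name of the audio file and swap the extension. This sketch assumes the .json and .gzip extensions mentioned in this help; default_transcript_path is a hypothetical helper.

```python
from pathlib import Path


def default_transcript_path(audio_path: str, compressed: bool = False) -> Path:
    """Same folder and base name as the audio file, transcript extension."""
    suffix = ".gzip" if compressed else ".json"
    return Path(audio_path).with_suffix(suffix)
```

For example, takes/line01.wav maps to takes/line01.json, or to takes/line01.gzip when compression is enabled.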
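The first-run download follows a common download-once pattern: check a local cache and fetch the file only if it is missing. In the sketch below, ensure_model, the cache location, and the download callback are hypothetical; the converter's actual model source and storage location are not documented here.

```python
from pathlib import Path
from typing import Callable


def ensure_model(cache_dir: str, filename: str,
                 download: Callable[[Path], None]) -> Path:
    """Return the path of a cached model file, downloading it only if missing."""
    target = Path(cache_dir) / filename
    if not target.exists():
        # First conversion: the file is not cached yet, so it must be
        # fetched (this is the step that needs an internet connection).
        target.parent.mkdir(parents=True, exist_ok=True)
        download(target)
    # Later conversions reuse the file that is already on disk.
    return target
```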
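One plausible reason that keeping the program running helps is that a loaded model can stay in memory between conversions. In Python, this kind of per-process reuse is often written with functools.lru_cache, as in the sketch below; load_model and convert are hypothetical stand-ins, not functions from the converter.

```python
import functools

# Counter that only exists to show how often the expensive step runs.
LOAD_CALLS = {"count": 0}


@functools.lru_cache(maxsize=None)
def load_model(name: str) -> dict:
    """Hypothetical model loader: the expensive work runs once per process."""
    LOAD_CALLS["count"] += 1
    return {"name": name}  # stands in for a real pretrained model object


def convert(audio_path: str, model_name: str = "pretrained") -> str:
    model = load_model(model_name)  # served from the cache after the first call
    return f"{audio_path} converted with {model['name']}"
```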