How to Use Voice Recognition

CasualTranscriber 2.7 includes an experimental feature that uses the built-in speech recognition of macOS for automatic transcription.

I'm still testing this functionality, but here are the known limitations and other information about the feature:


This new version is compatible with macOS 11.3 Big Sur or later. However, speech recognition seems to be handled differently on Big Sur than on Monterey and later. On Monterey and later, you need to enable Dictation and download the language data for the language you want to recognize before using the feature. On macOS versions earlier than Big Sur, the system speech recognition feature is not available, so this function cannot be used in CasualTranscriber.
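If you are not sure which macOS version you are running, Apple menu > About This Mac shows it. For a scripted check, the following is a minimal Python sketch that compares a version string against the 11.3 minimum stated above. The helper names `parse_version` and `meets_minimum` are illustrative only and are not part of CasualTranscriber.

```python
import platform

# Minimum version stated in this guide: macOS 11.3 (Big Sur).
MIN_VERSION = (11, 3)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as "13.4.1" into (13, 4, 1)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(version: str, minimum: tuple[int, ...] = MIN_VERSION) -> bool:
    """Return True if `version` is at least `minimum`."""
    parts = parse_version(version)
    # Pad both tuples with zeros to the same length so "12" compares like "12.0".
    width = max(len(parts), len(minimum))
    parts += (0,) * (width - len(parts))
    padded_min = tuple(minimum) + (0,) * (width - len(minimum))
    return parts >= padded_min


if __name__ == "__main__":
    # platform.mac_ver() returns an empty release string when not on macOS.
    # Note: some older Python builds report "10.16" on Big Sur; if so,
    # check the version in About This Mac instead.
    release = platform.mac_ver()[0]
    if not release:
        print("Not running on macOS.")
    else:
        status = "meets" if meets_minimum(release) else "does not meet"
        print(f"macOS {release} {status} the 11.3 requirement.")
```

Padding the tuples keeps the comparison correct for versions written with fewer components, such as "12".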

On macOS Monterey or later, enable Dictation in the Keyboard section of System Settings (called System Preferences on Monterey). Then, under the Language setting, choose "Customize" and download the language data for each language you want to use for recognition. The language of your locale is available by default; in the example below, English (United States) is available. Languages you add are listed by name.

Once you have configured the Dictation settings, launch CasualTranscriber and select the language under Voice Recognition Language in the ADV1 section of Preferences.

The list shows every language that speech recognition supports, but recognition works only for the languages whose data you have downloaded.

You can now run recognition on the audio/video file open in the window: go to Menu -> Misc -> Recognize Sound in File.

The text is split into segments that correspond to the sections recognized by the macOS speech recognition feature, and time stamps are inserted before and after each segment. Note, however, that recognition runs independently of audio/video playback control, so the time stamps may differ slightly (probably by milliseconds) from the actual content.
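The segment-plus-time-stamp layout can be pictured with a short Python sketch. It assumes each recognized segment carries a start time and a duration in seconds (Apple's Speech framework exposes `timestamp` and `duration` on `SFTranscriptionSegment`, but whether CasualTranscriber uses them this way is an assumption). The `[HH:MM:SS.mmm]` format and the names `Segment`, `format_timestamp`, and `render` are hypothetical, and CasualTranscriber's actual time-stamp format may differ.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    start: float     # seconds from the beginning of the file
    duration: float  # length of the segment in seconds


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm (a hypothetical display format)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def render(segments: list[Segment]) -> str:
    """Place a time stamp before and after each recognized segment."""
    lines = []
    for seg in segments:
        begin = format_timestamp(seg.start)
        end = format_timestamp(seg.start + seg.duration)
        lines.append(f"[{begin}] {seg.text} [{end}]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render([
        Segment("Hello there.", 1.25, 2.5),
        Segment("How are you?", 4.0, 1.75),
    ]))
```

Because the time stamps come from the recognizer rather than from the player position, any offset between the two shows up directly in these values, which is why small discrepancies are possible.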

Version 2.7.1 (20230617) and later include a batch processing function, which processes multiple audio/video files and saves the results as RTF files in CasualTranscriber format.

To use this feature, select "Batch Transcriber" from Main Menu -> Window. Once the window is open, drag and drop the files you want to process onto the table, select the recognition language, and click "Process." You will then be prompted to choose the folder in which to save the results.

If an error occurs, information about the error is recorded in the file, and processing of that file stops. Known issue (this also affects single-file recognition): on Big Sur, an error may be reported and processing may end even though all the audio in the file was recognized. This can happen when the file ends with a section without human voices, and sometimes even when the end does contain speech. If you see this error, please check whether the last part of the file was actually recognized.