Red Hen Audio Tagger

Red Hen Audio Tagger (RHAT) by Sabyasachi Ghosal, Austin Bennett, & Mark Turner


The Red Hen Audio Tagger (RHAT) is a novel, publicly available open-source platform developed by Red Hen Lab. RHAT employs deep learning models to tag audio elements frame by frame, generating metadata tags that can be utilized in various data formats for analysis. RHAT seamlessly integrates with widely used linguistic research tools like ELAN: the researcher can use RHAT to tag audio content automatically and display those tags alongside other ELAN annotation tiers. RHAT additionally complements existing Red Hen pipelines devoted to natural language processing, speech-to-text processing, body pose analysis, optical character recognition, named entity recognition, computer vision, semantic frame recognition, and so on. These cooperating Red Hen pipelines are research tools to advance the science of multimodal communication.
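One common route for displaying automatically generated tags alongside other ELAN tiers is ELAN's tab-delimited text import (File > Import > CSV / Tab-delimited Text). The sketch below is only an illustration of that route, not RHAT's actual export code; the tuple layout and file name are assumptions.

```python
import csv

def write_elan_tab_delimited(tags, path):
    """Write (begin_sec, end_sec, label) tuples as a tab-delimited file
    that ELAN can import as an annotation tier. Hypothetical helper;
    RHAT's real output format may differ."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for begin, end, label in tags:
            # One annotation per row: begin time, end time, value (seconds).
            writer.writerow([f"{begin:.3f}", f"{end:.3f}", label])

# Example: three hypothetical RHAT tags.
tags = [(0.0, 0.96, "Speech"), (0.96, 1.92, "Music"), (1.92, 2.88, "Speech")]
write_elan_tab_delimited(tags, "rhat_tags.txt")
```

Each row becomes one annotation on an imported tier, so the tags line up on the ELAN timeline next to the researcher's manual annotations.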

RHAT tags streams of audio data via a deep learning model. Since a single stream of data can contain multiple sound events at once, audio tagging is a multi-label classification problem. RHAT is a pipeline: it automatically pre-processes each audiovisual file in the input list, runs the model on it, and generates a file of tags suitable for ingestion into an annotation application, with timestamps and confidence ratings for each tag. The pipeline can be modified by swapping out the model; which model is deemed best will depend on the nature of the research project, so RHAT treats the model as a modular plug-in component. RHAT's tags are generated frame by frame, using existing pre-trained deep learning models (such as YAMNet). The tags are stored in two kinds of files that differ not in data but in metadata format.
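The frame-by-frame, multi-label step can be sketched as follows. The sketch assumes a frames-by-classes score matrix of the kind YAMNet produces; the hop size, threshold, and function name are illustrative assumptions, not RHAT's actual parameters.

```python
def frames_to_tags(scores, class_names, hop_sec=0.48, threshold=0.3):
    """Turn a frames-x-classes score matrix into multi-label tags.
    A single frame can yield several tags, each with a timestamp and a
    confidence rating. hop_sec and threshold are assumed values."""
    tags = []
    for i, frame_scores in enumerate(scores):
        start = i * hop_sec
        for cls_idx, score in enumerate(frame_scores):
            if score >= threshold:  # keep every class above threshold
                tags.append({
                    "start": round(start, 2),
                    "end": round(start + hop_sec, 2),
                    "label": class_names[cls_idx],
                    "confidence": round(float(score), 3),
                })
    return tags

# Two frames, three classes: the first frame contains both speech and music.
scores = [[0.9, 0.6, 0.1], [0.2, 0.8, 0.05]]
names = ["Speech", "Music", "Dog"]
tags = frames_to_tags(scores, names)
```

Because tagging is multi-label, the first frame above produces two tags (Speech and Music) rather than a single winning class.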

Source Code

RHAT was originally created as a project in Red Hen Lab Google Summer of Code 2022. The current version can be found here.

Code Book

The current pipeline relies on the tags provided by YAMNet. The codebook lists the sound classes (521 in total) that the model can currently tag. The details of the codebook can be found here.
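YAMNet's class map is distributed as a CSV with an index, an AudioSet MID, and a human-readable display name per class; the excerpt and helper below are a sketch of reading it under that assumption (the real file has 521 rows).

```python
import csv
import io

# A small excerpt in the CSV layout used by YAMNet's class map
# (index, AudioSet MID, display name). Rows here are illustrative.
CLASS_MAP_EXCERPT = """index,mid,display_name
0,/m/09x0r,Speech
1,/m/05zppz,"Male speech, man speaking"
132,/m/04rlf,Music
"""

def load_class_names(csv_text):
    """Build a class-index -> display-name lookup from a class map CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["index"]): row["display_name"] for row in reader}

names = load_class_names(CLASS_MAP_EXCERPT)
```

The resulting lookup maps each model output index to the label that appears in the generated tag files.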

Published Paper

An article on RHAT was published in Linguistics Vanguard, where you can read more (https://doi.org/10.1515/lingvan-2022-0130).

Open Issues/Future Work