Left: We propose Audiocards, structured metadata that describes an audio file using attributes relevant to sound designers. We prompt an LLM with the available text metadata and audio descriptors to generate an audiocard, which can be used for text-based search and for training audio-language models.
Right: Audiocard generated by our Whisper-cards audio captioner from input audio without text metadata.
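
As an illustration of the left panel, the following is a minimal sketch of how text metadata and audio descriptors might be combined into an LLM prompt, and how the reply might be checked for the expected structure. The function names, prompt wording, and field names are hypothetical and are not taken from the paper; the LLM call itself is omitted.

```python
import json


def build_audiocard_prompt(text_metadata: dict, audio_descriptors: dict,
                           fields: list[str]) -> str:
    """Assemble a prompt that asks an LLM for a JSON audiocard.

    All argument names and the prompt wording are illustrative assumptions.
    """
    return (
        "You are describing a sound file for sound designers.\n"
        f"Text metadata: {json.dumps(text_metadata, sort_keys=True)}\n"
        f"Audio descriptors: {json.dumps(audio_descriptors, sort_keys=True)}\n"
        f"Return a JSON object with exactly these keys: {', '.join(fields)}."
    )


def parse_audiocard(llm_response: str, fields: list[str]) -> dict:
    """Parse the LLM reply and keep only the requested fields."""
    card = json.loads(llm_response)
    missing = [name for name in fields if name not in card]
    if missing:
        raise ValueError(f"audiocard is missing fields: {missing}")
    return {name: card[name] for name in fields}


# Hypothetical inputs; real metadata and descriptors would come from the dataset.
example_fields = ["name", "description", "tags"]
example_prompt = build_audiocard_prompt(
    {"filename": "door_slam_01.wav"},
    {"duration_s": 1.2, "loudness_lufs": -14.0},
    example_fields,
)
```

The reply from whichever LLM is used would then be passed to `parse_audiocard`, so that malformed or incomplete cards are rejected before they are indexed for search or used as training captions.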