GenSEC Challenge at IEEE SLT 2024

Text-based Generative Speech Error Correction with LLMs


 

What is Generative Speech Error Correction (GenSEC)?

GenSEC Task 1 Description

LLM for Post-ASR Correction

This task focuses on mapping n-best hypotheses to the ground-truth speech transcription (H2T). The training set includes n-best hypotheses and acoustic model (AM) scores from several pre-trained end-to-end ASR models. Participants may use embeddings from the first-pass acoustic or speech model to make the second-pass model multi-modal, either for hypothesis reranking or for direct mapping to the ground truth. The challenge aims to connect the speech community with second-pass rewriting based on large language models (LLMs).
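As a rough illustration of the H2T setup, the sketch below ranks an n-best list by AM score and formats it as a text prompt for a second-pass LLM. It is not the official baseline: the `Hypothesis` class, the function names, the prompt wording, the example sentences and scores, and the assumption that a higher AM score is better are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str        # one ASR hypothesis from the n-best list
    am_score: float  # AM score (assumption: higher is better, e.g. a log-probability)


def rerank_by_am_score(nbest: List[Hypothesis]) -> List[Hypothesis]:
    """Naive reranking: sort hypotheses by AM score, best first."""
    return sorted(nbest, key=lambda h: h.am_score, reverse=True)


def build_h2t_prompt(nbest: List[Hypothesis]) -> str:
    """Format an n-best list as a plain-text prompt for a second-pass LLM."""
    lines = [
        "Below are the n-best hypotheses from a speech recognizer.",
        "Write the most likely true transcription.",
        "",
    ]
    for rank, hyp in enumerate(rerank_by_am_score(nbest), start=1):
        lines.append(f"{rank}. {hyp.text} (AM score: {hyp.am_score:.2f})")
    lines.append("")
    lines.append("Transcription:")
    return "\n".join(lines)


# Made-up example data, for illustration only.
nbest = [
    Hypothesis("i scream for ice cream", -4.10),
    Hypothesis("ice cream for ice cream", -3.75),
    Hypothesis("i scream for i scream", -5.20),
]
prompt = build_h2t_prompt(nbest)
print(prompt)
```

The prompt would then be passed to whichever LLM a participant chooses. Note that picking the top-ranked hypothesis is reranking only, while asking the LLM to write a new transcription corresponds to direct ground-truth mapping.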


Dataset


GenSEC Task 2 Description

Post-ASR Speaker Tagging Correction

Task 2 Baseline, Rules, and Submission Guidelines: https://github.com/tango4j/llm_speaker_tagging


Dataset

Access the Task 2 dataset on Hugging Face:

https://huggingface.co/datasets/GenSEC-LLM/SLT-Task2-Post-ASR-Speaker-Tagging

Task 2 provides only a development set and an evaluation set.

The transcripts of several multi-speaker datasets were anonymized and altered to construct the development and evaluation sets of Task 2.


GenSEC Task 3 Description 

You can use additional datasets to train your model, but they must NOT include IEMOCAP (except for the portion we provide to you). This is because we use part of IEMOCAP as our evaluation data. If additional datasets are used, you need to clearly mention the datasets used in the paper.

System Paper Deadlines for the Official IEEE SLT Proceedings


Paper Submission (System and Method)

June 20, 2024
(CMT Link)

Paper Update (PDF revision)

June 27, 2024


Paper Notification (Potential Revision for Evaluation)

August 30, 2024

Organizing Chair and Committee 



Technical Committee  


Related References



Speech Emotion Recognition with ASR Transcripts: Investigating the Impact of Word Error Rate and Fusion Techniques, coming soon

Task 1. ASR-LM Correction

Multi-task LM for post-ASR and post-Translation correction

Task 2. Speaker Tagging Correction

Post-ASR Speaker Tagging Correction

Task 3. ASR-LLM SER

Post-ASR LLM-Based Speech Emotion Recognition

Come Join Us at SLT 2024