Formosa Speech Recognition Challenge 2023 - Hakka ASR
Call-for-Participants
Formosa Speech Recognition Challenge 2023 (FSR-2023) is the third event of the Formosa Speech in the Wild (FSW) project, which is organized by National Yang Ming Chiao Tung University (NYCU).
Taiwanese Hakka is a language spoken natively by about 1.5% of the population of Taiwan. Although the number of Hakka speakers continues to drop, especially among youth, it's not yet too late to save this language. Therefore, we are now calling for and welcoming participants from both academic and industrial sectors to FSR-2023. Students are especially welcome to participate in the competition for the Student Awards.
Key Messages
- Free Hakka Across Taiwan Vol.1 (HAT-Vol1) corpus collected in 2022.
- Proceedings of 2023: https://aclanthology.org/events/rocling-2023. All FSR-2023 papers are included in the ACL Anthology with paper IDs 46-56.
Schedule
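Pilot Test Results (BETA VERSION)
The current results are a beta version. If you find any problems, please contact us as soon as possible!
Track1: Hanzi (Chinese character) results are scored with CER (character error rate)
Baseline models:
BSL-1: ESPnet + WavLM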
BSL-2: Whisper medium
BSL-3: Whisper large-v2
Track2: Pinyin results are scored with SER (syllable error rate)
Baseline model: ESPnet + WavLM
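As a rough illustration of how CER and SER are conventionally defined (edit distance divided by the number of reference tokens), here is a minimal Python sketch. It is not the organizers' scoring program, which may apply additional normalization such as handling of synonymous characters. The function names and the hypothesis strings are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_tokens(text):
    """Track1 (assumed): one token per non-space character."""
    return [c for c in text if not c.isspace()]


def syllable_tokens(text):
    """Track2 (assumed): one token per whitespace-separated syllable."""
    return text.split()


def error_rate(refs, hyps, tokenize):
    """Total edit distance divided by total number of reference tokens."""
    errors = 0
    ref_tokens = 0
    for ref, hyp in zip(refs, hyps):
        r, h = tokenize(ref), tokenize(hyp)
        errors += edit_distance(r, h)
        ref_tokens += len(r)
    return errors / ref_tokens


# Hypotheses below are invented: one substituted character, one dropped syllable.
cer = error_rate(["今晡日係拜二"], ["今晡日係拜一"], char_tokens)
ser = error_rate(["gim24 bu24 ngid2 he55 bai55 ngi55"],
                 ["gim24 bu24 ngid2 he55 bai55"], syllable_tokens)
print(f"CER = {cer:.3f}, SER = {ser:.3f}")
```

Pooling errors over all utterances before dividing, as done here, weights long utterances more heavily than averaging per-utterance rates would; which convention the official scorer uses is not stated on this page.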
Baseline programs
ESPnet project: https://github.com/yfliao/espnet
Whisper project: https://github.com/yfliao/whisper-hakka
Scoring program
FINAL-TEST RESULTS
Scoring program (updated on 9/22)
- Contact
- Please use sarc@nycu.edu.tw for all correspondence, including submission of recognition results!
- LINE discussion group: FSR-2023 (please scan the QR code on the right)
- NEWS
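- Pilot-Test answer key: the FSR-2023-Hakka-XYH8X-Eval-Key data can now be downloaded from GitLab!
- Hakka speech recognition example program
- Scoring program
- The FSR-2023-Hakka-Lavalier-Train training data is now available for download; each team will be notified by email!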
TRACKs
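Build an automatic Hakka speech recognizer (ASR) that outputs either of the following (choose at least one track):
Taiwanese Hakka Recommended Characters by the Ministry of Education of Taiwan (Hakka Hanzi, following the Ministry's recommended characters for Taiwanese Hakka; Hanzi take priority)
Taiwan Hakka Pinyin (following the Ministry of Education's Hakka Pinyin scheme, using citation tones)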
For example:
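Track1 - 今晡日係拜二 (everything except loanwords is written in Hanzi; in addition, synonymous characters will be handled before scoring)
Track2 - gim24 bu24 ngid2 he55 bai55 ngi55 (citation tones are used)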
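To make the Track2 notation concrete, the sketch below splits each syllable into its letters and its tone digits. The pattern (lowercase ASCII letters followed by one or more digits) is an assumption inferred only from the example above, not taken from the official Hakka Pinyin scheme, and `split_syllables` is a hypothetical helper name.

```python
import re

# Assumed token shape: lowercase letters, then tone digits (e.g. "gim24").
SYLLABLE = re.compile(r"([a-z]+)(\d+)")


def split_syllables(pinyin):
    """Return (letters, tone) pairs for a space-separated pinyin string."""
    pairs = []
    for token in pinyin.split():
        match = SYLLABLE.fullmatch(token)
        if match is None:
            raise ValueError(f"unexpected syllable format: {token!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


pairs = split_syllables("gim24 bu24 ngid2 he55 bai55 ngi55")
print(pairs)
```

A check like this can catch recognizer output that is missing a tone number before it is submitted.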
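Submission format:
File name: please name the file "affiliation + team name + participant" to avoid misidentification (earlier submissions that did not do this are fine; we will check the sender's email address).
Answer format: ID answer (same as Kaldi: one column for the audio file ID, one column for the speech recognizer output)
Examples: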
Track1:
1 今晡日係拜二
2 老妹當好搞水
3 暗晡夜來吾屋下食夜
Track2:
1 gim24 bu24 ngid2 he55 bai55 ngi55
2 lo31 moi55 dong24 hau55 gau31 sui31
3 am55 bu24 ia55 loi11 nga24 vug2 ka24 siid5 ia55
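The "ID answer" layout shown above can be read with a few lines of Python. This is a minimal sketch under the assumption that the ID is the first whitespace-separated field and everything after it is the recognizer output; `parse_submission` is a hypothetical helper, not part of the official tools.

```python
def parse_submission(text):
    """Parse 'ID answer' lines into a dict mapping ID -> answer string."""
    results = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        utt_id = parts[0]
        # An ID with nothing after it is treated as empty recognizer output.
        answer = parts[1] if len(parts) > 1 else ""
        if utt_id in results:
            raise ValueError(f"line {line_no}: duplicate ID {utt_id!r}")
        results[utt_id] = answer
    return results


track1 = parse_submission("1 今晡日係拜二\n2 老妹當好搞水\n3 暗晡夜來吾屋下食夜\n")
track2 = parse_submission(
    "1 gim24 bu24 ngid2 he55 bai55 ngi55\n"
    "2 lo31 moi55 dong24 hau55 gau31 sui31\n"
)
print(track1["2"])
print(track2["1"].split())
```

Rejecting duplicate IDs early avoids silently scoring only the last answer for an utterance.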
Database
This challenge is based on the "HAT-Vol1" corpus.
"HAT-Vol1" contains recordings from about 100 speakers recruited across Taiwan, about 80 hours in total (Training + Eval + Test sets).
This data is released here for FREE under a Non-Commercial Use Only license. Please read and accept the License.
Baseline Scripts: ESPnet-based baseline recipes are provided on GitHub so that students can develop their own systems easily and quickly. --> TBA
Important Dates
2023/06/05 --- Registration Open & Training Data Release
2023/07/31 --- Registration Close
2023/08/07 --- Pilot-Test (dry-run only) Data Release
2023/08/14 --- Pilot-Test (dry-run only) Result Submission
2023/08/21 --- Pilot-Test (dry-run only) Performance Notification
2023/09/11 --- Final-Test Data Release
2023/09/22 --- Final-Test Result & Draft Paper Submission
2023/09/29 --- Final-Test Performance Notification (released)
2023/10/06 --- Paper Submission
2023/10/21 --- Award Ceremony and Workshop
PS: Pilot-Test (dry-run) is only used to make sure everything for the final test is fine, not for scoring!
Contact
Yuan-Fu Liao (廖元甫)
Full Professor, National Yang Ming Chiao Tung University