Formosa Speech Recognition Challenge 2023 - Hakka ASR

Call-for-Participants

Formosa Speech Recognition Challenge 2023 (FSR-2023) is the third event of the Formosa Speech in the Wild (FSW) project, which is organized by National Yang Ming Chiao Tung University (NYCU).

Taiwanese Hakka is a language spoken natively by about 1.5% of the population of Taiwan. Although the number of Hakka speakers continues to drop, especially among youth, it's not yet too late to save this language. Therefore, we are now calling for and welcoming participants from both academic and industrial sectors to FSR-2023. Students are especially welcome to participate in the competition for the Student Awards.

Call for Participants - Formosa Speech Recognition Challenge 2023.pdf

Key Messages

Free Hakka Across Taiwan Vol.1 (HAT-Vol1) corpus collected in 2022.
Proceedings of 2023: https://aclanthology.org/events/rocling-2023. All FSR-2023 Hakka speech recognition challenge papers are included in the ACL Anthology with paper IDs 46-56.

Schedule

Pilot Test Results (BETA VERSION)

These results are a beta version. If you find any problems, please contact us as soon as possible!
Track 1: Hanzi (Chinese character) output is scored using CER (character error rate)
- Baseline models:
    - BSL-1: espnet + wavlm
    - BSL-2: Whisper medium
    - BSL-3: Whisper large-v2
Track 2: Pinyin output is scored using SER (syllable error rate)
- Baseline model: espnet + wavlm
Baseline code
- ESPnet project: https://github.com/yfliao/espnet
- Whisper project: https://github.com/yfliao/whisper-hakka
Scoring program
- https://github.com/yfliao/FSR-2023-Hakka-ASR-Scoring
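As a rough illustration of the two metrics, the sketch below computes an error rate as the Levenshtein edit distance between reference and hypothesis, divided by the reference length. For CER the tokens are characters; for SER they are whitespace-separated syllables. This is not the official scoring program linked above. The function names and the handling of whitespace are assumptions made for illustration, and the official tool may normalize text or aggregate errors over the whole test set differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance divided by the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(ref, hyp):
    """Character error rate; spaces are dropped (an assumption)."""
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))


def ser(ref, hyp):
    """Syllable error rate over whitespace-separated syllables."""
    return error_rate(ref.split(), hyp.split())


if __name__ == "__main__":
    print(cer("今晡日係拜二", "今晡日係拜三"))
    print(ser("gim24 bu24 ngid2 he55 bai55 ngi55", "gim24 bu24 ngid2 he55 bai55"))
```

In both example calls a single error occurs against a six-token reference, so each rate is one sixth.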

FINAL-TEST RESULTS

Scoring program (updated on 9/22)
- https://github.com/yfliao/FSR-2023-Hakka-ASR-Scoring

Contact
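Please use sarc@nycu.edu.tw for all correspondence, including submission of recognition results!

Line discussion group: FSR-2023 (please scan the QR code on the right)

Facebook group: Formosa Speech in the Wild (please scan the QR code on the right)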

NEWS

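Pilot-Test answer key: the FSR-2023-Hakka-XYH8X-Eval-Key data can now be downloaded from GitLab!
The Hakka Affairs Council's Taiwan Hakka Corpus (about 720,000 characters of written text and over 400,000 characters of spoken text; https://corpus.hakka.gov.tw/#/) has been licensed and released for use by participating teams.
Hakka speech recognition example code
- ESPnet project: https://github.com/yfliao/espnet
- Whisper project: https://github.com/yfliao/whisper-hakka
Scoring program
- https://github.com/yfliao/FSR-2023-Hakka-ASR-Scoring
The FSR-2023-Hakka-Lavalier-Train training data is now available for download. Each team will be notified by email!
How to register: please sign the Intellectual Property Protection and Non-Disclosure Agreement (file link) and email the signed, scanned copy to sarc@nycu.edu.tw to receive the account and password for downloading the training corpus.
Hakka AI Application Hackathon calls for proposals to expand service capacity [Hakka News 20230522] [YouTube]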

TRACKs

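Build an automatic Hakka speech recognizer (ASR) that outputs either of the following (choose at least one track):

Taiwanese Hakka Recommended Characters by Ministry of Education of Taiwan (Hakka Hanzi, following the Ministry of Education's recommended characters for Taiwanese Hakka; Hanzi preferred)
Taiwan Hakka Pinyin (following the Ministry of Education's Hakka Pinyin scheme, using base tones)

For example:

Track1 - 今晡日係拜二 (everything except loanwords is written in Hanzi; synonymous characters are also normalized beforehand)
Track2 - gim24 bu24 ngid2 he55 bai55 ngi55 (base tones)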
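A quick way to sanity-check Track 2 output before submitting is to split each syllable into its letters and its tone digits. The sketch below assumes, based only on the example above, that every syllable is lowercase Latin letters followed by one or more tone digits. The pattern and function names are hypothetical and are not taken from the official Pinyin scheme or the scoring program.

```python
import re

# Assumed shape of a syllable: lowercase letters, then tone digits (e.g. "gim24").
SYLLABLE = re.compile(r"([a-z]+)(\d+)")


def split_syllable(syllable):
    """Return (letters, tone) for one Pinyin syllable, or raise ValueError."""
    match = SYLLABLE.fullmatch(syllable)
    if match is None:
        raise ValueError(f"unexpected syllable format: {syllable!r}")
    return match.group(1), match.group(2)


def split_transcript(transcript):
    """Split a whitespace-separated Pinyin transcript into (letters, tone) pairs."""
    return [split_syllable(s) for s in transcript.split()]


if __name__ == "__main__":
    print(split_transcript("gim24 bu24 ngid2 he55 bai55 ngi55"))
```

A syllable that lacks a tone number, or contains other characters, raises an error, which makes malformed recognizer output easy to spot.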

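Submission format:

File name: please name the file "affiliation + team name + participant" to avoid misidentification (earlier submissions that did not follow this are fine; we will check the email address).

Answer format: ID answer (same as Kaldi: one column is the audio file ID, the other is the speech recognizer output)

Examples:

Track1:

1 今晡日係拜二
2 老妹當好搞水
3 暗晡夜來吾屋下食夜

Track2:

1 gim24 bu24 ngid2 he55 bai55 ngi55
2 lo31 moi55 dong24 hau55 gau31 sui31
3 am55 bu24 ia55 loi11 nga24 vug2 ka24 siid5 ia55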
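The answer format above can be read with a few lines of code. The following sketch parses a submission into a mapping from audio ID to recognizer output, treating the first whitespace-separated field as the ID and the rest of the line as the answer. The function name and the decision to reject duplicate IDs are assumptions for illustration, not rules stated by the organizers.

```python
def parse_submission(text):
    """Parse Kaldi-style 'ID answer' lines into a dict of ID -> answer."""
    results = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between entries
        parts = line.split(maxsplit=1)
        utt_id = parts[0]
        # An ID with no recognizer output is kept as an empty answer.
        answer = parts[1] if len(parts) > 1 else ""
        if utt_id in results:
            raise ValueError(f"duplicate ID {utt_id!r} on line {line_no}")
        results[utt_id] = answer
    return results


if __name__ == "__main__":
    sample = "1 今晡日係拜二\n\n2 老妹當好搞水\n3 暗晡夜來吾屋下食夜\n"
    for utt_id, answer in parse_submission(sample).items():
        print(utt_id, answer)
```

Splitting only on the first run of whitespace keeps the spaces between Track 2 syllables intact in the answer field.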

Database

This challenge is based on the "HAT-Vol1" corpus.
"HAT-Vol1" consists of about 100 speakers recruited across Taiwan, in total about 80 hours (Training + Eval + Test sets).
This data is released here for FREE under a Non-Commercial Use Only license. Please read and accept the License.
Baseline Scripts: ESPnet-based baseline recipes are provided on GitHub so that students can develop their own systems easily and quickly (TBA).

Rules

Registration

Important Dates

2023/06/05 --- Registration Open & Training Data Release
2023/07/31 --- Registration Close
2023/08/07 --- Pilot-Test (dry-run only) Data Release
2023/08/14 --- Pilot-Test (dry-run only) Result Submission
2023/08/21 --- Pilot-Test (dry-run only) Performance Notification
2023/09/11 --- Final-Test Data Release
2023/09/22 --- Final-Test Result & Draft Paper Submission
2023/09/29 --- Final-Test Performance Notification (released)
2023/10/06 --- Paper Submission
2023/10/21 --- Award Ceremony and Workshop

PS: The Pilot-Test (dry run) is used only to make sure everything is ready for the final test; it does not count toward scoring!

Contact

Yuan-Fu Liao (廖元甫)
Full Professor, National Yang Ming Chiao Tung University
yfliao@nycu.edu.tw, https://speech.web.nycu.edu.tw

Organizers

Sponsors
