Laughterscape: Large-scale In-the-wild Japanese laughter corpus / 日本語母語話者による大規模笑い声コーパス

Download / ダウンロード

Click here. [zip 0.7 GB]

Description / 内容

This is a large-scale in-the-wild Japanese laughter corpus collected from YouTube videos. It is particularly suitable for laughter synthesis because every utterance contains laughter from a single speaker only; simultaneous laughter by multiple people is not included.

Please refer to our paper, accepted at Interspeech 2023, for details on the data collection process. Note that we continued collecting data after submitting the paper, so this corpus is larger than the one described there.

Duration: 6.04 hours
Sampling rate: 24 kHz
Speakers: 584 Japanese speakers
Utterances: 11413

本コーパスは，YouTube動画から収集した大規模の笑い声コーパスです．このコーパスは全て単独笑い（単独話者による笑いを指す．複数人による同時の笑いは含まない）から成るため，笑い声合成に適しています．

データ収集とコーパス構築の詳細は，国際会議 Interspeech 2023 に採択された論文を参照して下さい．なお，論文投稿後もデータ収集を継続したため，本コーパスのデータサイズは論文に記載されているものよりも大きくなっています．

時間長: 6.04 hours
サンプリング周波数: 24 kHz
話者: 584 名の日本語話者
発話： 11413 発話
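
The figures above imply a few averages that are useful when planning experiments. The following is a minimal sketch that derives them from the numbers listed on this page. The 16-bit mono PCM format is an assumption made only for the size estimate; the page does not state the bit depth or channel count, and the function name `corpus_summary` is illustrative.

```python
# Figures copied from the corpus description on this page.
TOTAL_HOURS = 6.04
SAMPLE_RATE_HZ = 24_000
NUM_SPEAKERS = 584
NUM_UTTERANCES = 11_413

# Assumption (not stated on this page): audio is 16-bit mono PCM.
BYTES_PER_SAMPLE = 2


def corpus_summary():
    """Return average statistics derived from the published corpus figures."""
    total_seconds = TOTAL_HOURS * 3600
    return {
        "mean_utterance_seconds": total_seconds / NUM_UTTERANCES,
        "mean_utterances_per_speaker": NUM_UTTERANCES / NUM_SPEAKERS,
        "mean_seconds_per_speaker": total_seconds / NUM_SPEAKERS,
        # Raw audio size under the 16-bit mono assumption above.
        "uncompressed_gb_if_16bit_mono": (
            total_seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE / 1e9
        ),
    }


if __name__ == "__main__":
    for name, value in corpus_summary().items():
        print(f"{name}: {value:.2f}")
```

On these figures, an utterance lasts about 1.9 seconds on average and each speaker contributes about 19.5 utterances, so per-speaker data is limited even though the corpus as a whole is large.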

License / ライセンス

For research and development purposes only. (Tentative; subject to change.)

研究開発目的のみ．（暫定．変更する場合があります．）

Contributors / 作成者

Paper / 論文

Detai Xin, Shinnosuke Takamichi, Ai Morimatsu, Hiroshi Saruwatari, "Laughter synthesis using pseudo phonetic tokens with a large-scale in-the-wild laughter corpus," INTERSPEECH, Dublin, Ireland, Aug. 2023.

Acknowledgement / 謝辞

This corpus was built with support from the following projects.

本コーパスの構築は，以下のプロジェクトを受けて実施したものです．

JST SPRING GX JPMJSP2108
JSPS KAKENHI Grant-in-Aid for Scientific Research (B) 23KJ0828
Google research fund

Link / リンク

Corpus list

Inquiry / お問い合わせ

For inquiries about the corpus, please contact Shinnosuke Takamichi.

コーパスに関するお問い合わせは高道慎之介までご連絡下さい．
