Shinnosuke Takamichi (高道慎之介)

CPJD: Crowdsourced Parallel Speech Corpus of Japanese Dialects

Download

Description

This corpus is a parallel Japanese dialect database collected using crowdsourcing.

Specification:

Speakers: 21 native dialect speakers
Sentences: 250 per speaker
Dialects: 20

Directory:

README.txt # this file
F001 - M009 # folders containing the transcripts
speaker-info.txt # speaker metadata, with the following fields:
- speaker # speaker's name
- gender # speaker's gender
- place # speaker's birthplace
- dialect # name of the speaker's dialect
- parallel-common # filename of the parallel sentences in standard Japanese (the common language)
common_set{1,2}.txt # parallel sentences in standard Japanese (the common language)
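
The field list above can be turned into a small loader. The following is a minimal sketch, not the authors' tooling: it assumes speaker-info.txt holds one speaker per line, tab-separated, with the five fields in the order listed and no header row. None of that is stated in the corpus description, so check the actual file and adjust the delimiter or field order as needed. The function names and the sample rows are hypothetical placeholders.

```python
import csv
import io
from typing import Dict, List

# ASSUMPTION: column order follows the field list in the directory
# description; the real file layout is not documented here.
FIELDS = ["speaker", "gender", "place", "dialect", "parallel-common"]


def parse_speaker_info(text: str, delimiter: str = "\t") -> List[Dict[str, str]]:
    """Parse the content of speaker-info.txt into one dict per speaker."""
    rows: List[Dict[str, str]] = []
    for record in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not record or record[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(record) != len(FIELDS):
            raise ValueError(
                f"expected {len(FIELDS)} fields, got {len(record)}: {record}"
            )
        rows.append(dict(zip(FIELDS, (value.strip() for value in record))))
    return rows


def group_by_common_set(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each common-language file name to the speakers that reference it."""
    groups: Dict[str, List[str]] = {}
    for row in rows:
        groups.setdefault(row["parallel-common"], []).append(row["speaker"])
    return groups


# Made-up placeholder rows for illustration only; these are NOT corpus data.
SAMPLE = (
    "F001\tfemale\tplace_A\tdialect_A\tcommon_set1.txt\n"
    "M009\tmale\tplace_B\tdialect_B\tcommon_set2.txt\n"
)

if __name__ == "__main__":
    speakers = parse_speaker_info(SAMPLE)
    for common_file, names in group_by_common_set(speakers).items():
        print(common_file, names)
```

Grouping by the parallel-common field is useful because it tells you which of the two common_set files to pair with each speaker's dialect transcripts.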

License

CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0/

Contributors

Shinnosuke Takamichi (高道慎之介)

Hiroshi Saruwatari

Paper

Shinnosuke Takamichi, Hiroshi Saruwatari, "CPJD Corpus: Crowdsourced Parallel Speech Corpus of Japanese Dialects," Proc. LREC, 2018.

Acknowledgement

This corpus was built with support from the following projects.

-

Link

Corpus list
