The data set comprises telephone-quality speech data in Hindi. The recordings were collected through the Mobile Vaani platform, whose users come from all across India; hence, the data includes regional and dialectal variations of Hindi. The recordings are accompanied by transcriptions produced by crowd workers recruited via the Uliza platform, along with a variety of metadata, including location, dialect, emotion, and audio quality.
We are releasing approximately 1000 hours of unlabelled data and 105 hours of labelled speech data through this challenge. The details of the data sets released for this challenge are as follows:
1) Train set - 100 hours (labelled)
https://asr.iitm.ac.in/Gramvaani/NEW/GV_Train_100h.tar.gz
2) Development set - 5 hours (labelled)
https://asr.iitm.ac.in/Gramvaani/NEW/GV_Dev_5h.tar.gz
3) 1000 hours of unlabelled data
https://asr.iitm.ac.in/Gramvaani/Zip/GV_1000_Part1/Gramvaani_1000hrData_Part1.tar.gz
https://asr.iitm.ac.in/Gramvaani/Zip/GV_1000_Part2/Gramvaani_1000hrData_Part2.tar.gz
https://asr.iitm.ac.in/Gramvaani/Zip/GV_1000_Part3/Gramvaani_1000hrData_Part3.tar.gz
https://asr.iitm.ac.in/Gramvaani/Zip/GV_1000_Part4/Gramvaani_1000hrData_Part4.tar.gz
https://asr.iitm.ac.in/Gramvaani/Zip/GV_1000_Part5/Gramvaani_1000hrData_Part5.tar.gz.zip
4) Evaluation data - 3 hours (labelled)
https://asr.iitm.ac.in/Gramvaani/NEW/GV_Eval_3h.tar.gz
5) Metadata - (for all the Gramvaani data released as part of the 1111 Hours Hindi ASR Challenge)
https://asr.iitm.ac.in/Gramvaani/NEW/Metadata.tar.gz
The data can also be downloaded from OpenSLR: https://www.openslr.org/118/
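The download step can be scripted. The following is a minimal sketch, not an official tool of the challenge: it fetches a few of the archives listed above and unpacks them. The helper names (`archive_name`, `download`, `extract`) and the directory layout are assumptions made for illustration. Only the labelled and metadata archives are listed; the unlabelled parts can be added to `ARCHIVE_URLS` in the same way. Note that the Part5 link ends in `.tar.gz.zip`, so it may need different handling; this has not been verified here.

```python
import tarfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# URLs copied from the list above (labelled sets and metadata only).
ARCHIVE_URLS = [
    "https://asr.iitm.ac.in/Gramvaani/NEW/GV_Train_100h.tar.gz",
    "https://asr.iitm.ac.in/Gramvaani/NEW/GV_Dev_5h.tar.gz",
    "https://asr.iitm.ac.in/Gramvaani/NEW/GV_Eval_3h.tar.gz",
    "https://asr.iitm.ac.in/Gramvaani/NEW/Metadata.tar.gz",
]


def archive_name(url: str) -> str:
    """Return the file name component of an archive URL."""
    return Path(urlparse(url).path).name


def download(url: str, dest_dir: Path) -> Path:
    """Download url into dest_dir, skipping files that already exist."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / archive_name(url)
    if not target.exists():
        urllib.request.urlretrieve(url, target)
    return target


def extract(archive: Path, out_dir: Path) -> None:
    """Unpack a gzip-compressed tar archive into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(out_dir)
```

A typical use would be to loop over `ARCHIVE_URLS`, calling `download(url, Path("downloads"))` and then `extract(...)` on each returned path. The training archive holds 100 hours of audio, so the transfer may take a while.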
The Gram Vaani data consists of .mp3 files with a mix of sampling rates ranging from 8 kHz to 48 kHz, for both the 100 hours of labelled data and the 1000 hours of unlabelled data. Statistics for the same are given below.
How to Participate?
Enrol by registering at this link: Register Here
Registering at the above link provides access to the training, development, and unlabelled data for the Gram Vaani challenge.