Native Chinese Reader: A Dataset Towards Native-Level Chinese Machine Reading Comprehension

Abstract

Native Chinese Reader (NCR) is a new machine reading comprehension (MRC) dataset with particularly long articles in both modern and classical Chinese. The data is collected from exam questions for the Chinese course in China’s high schools, which are designed to evaluate the language proficiency of native Chinese youth.

2021 Hai Hua AI Competition

This dataset is a part of the 2021 Hai Hua AI competition at https://www.biendata.xyz/competition/haihua_2021/

Best Competition Model

We release the competition model with the highest accuracy at https://github.com/xssstory/NCR_competition_model

Baselines

We release the baseline code at https://github.com/xssstory/NCR_baseline

BibTeX

@inproceedings{xu2021native,
  title={Native Chinese Reader: A Dataset Towards Native-Level Chinese Machine Reading Comprehension},
  author={Shusheng Xu and Yichen Liu and Xiaoyu Yi and Siyuan Zhou and Huizi Li and Yi Wu},
  booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
  year={2021},
  url={https://openreview.net/forum?id=GEcWUTN1v1v}
}