This is a data set used in the following paper:
Park, Dae Hoon, and Rikio Chiba. "A neural language model for query auto-completion." Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2017.
The data set is distributed as a zip file.
The data set includes files for four data splits.
Each of the training, validation, and test files has only two columns: a prefix and its corresponding query.
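The description does not say how the two columns are separated. Below is a minimal sketch for reading one of these files. It assumes (unverified) one pair per line, with a tab between the prefix and the query. `read_pairs` is a hypothetical helper name, and the separator is a parameter so it can be changed once the real format is confirmed. The line terminator is stripped, so the returned query does not carry the trailing newline.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_pairs(stream: TextIO, sep: str = "\t") -> Iterator[Tuple[str, str]]:
    """Yield (prefix, query) pairs from one two-column data file.

    ASSUMPTION: each line holds one pair, with the prefix and the query
    separated by `sep` (a tab by default). The separator is not documented.
    """
    for line_no, line in enumerate(stream, start=1):
        line = line.rstrip("\r\n")  # drop the line terminator (the query's "\n")
        if not line:
            continue  # tolerate blank lines
        prefix, found, query = line.partition(sep)  # split on the first separator
        if not found:
            raise ValueError(f"line {line_no}: separator {sep!r} not found")
        yield prefix, query


if __name__ == "__main__":
    # In-memory sample standing in for a real data file.
    sample = io.StringIO("my\tmy space\nmy s\tmy space\n")
    for prefix, query in read_pairs(sample):
        print(prefix, "->", query)
```

For a real file, pass an open text handle instead, e.g. the result of `open(path, encoding="utf-8")`, where `path` points at one of the extracted files.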
For example, the query "my space\n", where \n is a newline character, generates the following prefix-query pairs:
prefix      query
my          my space\n
my s        my space\n
my sp       my space\n
my spa      my space\n
my spac     my space\n
my space    my space\n
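The pairs above can be reproduced with a short sketch. The text does not state the rule used to choose prefixes. The hypothetical `prefix_query_pairs` below assumes that prefixes begin once the first word is complete and that prefixes ending in a space are skipped. This matches the "my space" example, but it may differ from how the released files were actually built.

```python
from typing import List, Tuple

END_OF_QUERY = "\n"  # end-of-query marker, as in the "my space\n" example


def prefix_query_pairs(query: str) -> List[Tuple[str, str]]:
    """Return (prefix, query) pairs for a single query.

    ASSUMPTION (not stated in the original description): prefixes start
    once the first word is complete, and prefixes ending in a space are
    skipped. This reproduces the "my space" example only.
    """
    text = query.rstrip(END_OF_QUERY)  # prefixes never include the end marker
    first_space = text.find(" ")
    # Shortest prefix is the first word; a one-word query yields only itself.
    start = len(text) if first_space == -1 else first_space
    pairs = []
    for end in range(start, len(text) + 1):
        prefix = text[:end]
        if not prefix or prefix.endswith(" "):
            continue  # skip empty prefixes and prefixes ending at a word boundary
        pairs.append((prefix, query))  # the query keeps its end-of-query marker
    return pairs


if __name__ == "__main__":
    for prefix, query in prefix_query_pairs("my space\n"):
        print(f"{prefix!r}\t{query!r}")
```

For instance, `prefix_query_pairs("my space\n")` returns six pairs whose prefixes run from "my" to "my space", each paired with the full query "my space\n".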
Please cite the following paper if you use the data set:
@inproceedings{park2017neural,
  title={A neural language model for query auto-completion},
  author={Park, Dae Hoon and Chiba, Rikio},
  booktitle={Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  pages={1189--1192},
  year={2017},
  organization={ACM}
}