This is a data set used in the following paper:
Park, Dae Hoon, Hyun Duk Kim, ChengXiang Zhai, and Lifan Guo. "Retrieval of relevant opinion sentences for new products." In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 393-402. ACM, 2015.
The CNET data set is available for download as a zip archive: [ zip ]
The paper uses two product categories: digital cameras and MP3 players.
For each product category, the data set consists of four files:
Details on the data set can be found in the paper. Note that some user reviews for MP3 players contain duplicate sentences within the same review; you may want to remove these duplicates before running experiments.
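One way to remove duplicate sentences within a review is sketched below. The file format is not described here, so this assumes each review has already been split into a list of sentence strings; the function name `dedupe_sentences` is hypothetical. It keeps the first occurrence of each sentence in its original order and treats sentences as duplicates only if they match exactly after collapsing whitespace.

```python
def dedupe_sentences(sentences):
    """Drop repeated sentences within one review (hypothetical helper).

    Keeps the first occurrence of each sentence and preserves order.
    Assumes the review is already a list of sentence strings.
    """
    seen = set()
    unique = []
    for sentence in sentences:
        # Collapse runs of whitespace so copies differing only in spacing match.
        key = " ".join(sentence.split())
        if key not in seen:
            seen.add(key)
            unique.append(sentence)
    return unique


review = [
    "The battery life is great.",
    "The screen is small.",
    "The battery life is great.",
]
print(dedupe_sentences(review))
# ['The battery life is great.', 'The screen is small.']
```

Exact matching is deliberately conservative: it removes only verbatim repeats and leaves near-duplicate sentences (for example, ones differing in punctuation or case) untouched.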
Please cite the following paper if you use the data set:
@inproceedings{park2015retrieval,
title={Retrieval of relevant opinion sentences for new products},
author={Park, Dae Hoon and Kim, Hyun Duk and Zhai, ChengXiang and Guo, Lifan},
booktitle={Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages={393--402},
year={2015},
organization={ACM}
}