Data Set for Mobile App Retrieval (SIGIR 2015)

This data set was used in the following paper:
Park, Dae Hoon, Mengwen Liu, ChengXiang Zhai, and Haohong Wang. "Leveraging User Reviews to Improve Accuracy for Mobile App Retrieval." In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 533-542. ACM, 2015.

Link for mobile app and relevance data [ zip ]
Link for user review data for mobile apps [ zip ]

The app data cover 43,041 mobile apps and are stored in JSON format. Each app record includes fields such as:
app_name, app_id, app_url, category, app_description, number_of_reviews, avg_user_rating, required_os, price, content_rating, date_published, developer, developer_email_address, developer_link, and number_of_downloads.

The app data also include relevance information for (query, app) pairs, with relevance values in the range [0, 2].
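
As an illustration of how the (query, app) relevance values might be held in memory, here is a minimal Python sketch. It assumes the pairs have already been read from the archive as (query, app_id, relevance) triples; the actual file layout is not described here, and the function name build_qrels and the sample values are hypothetical.

```python
from collections import defaultdict


def build_qrels(triples, low=0.0, high=2.0):
    """Group (query, app_id, relevance) triples into {query: {app_id: relevance}}.

    The triple layout is an assumption for illustration, not the
    documented format of the released files.
    """
    qrels = defaultdict(dict)
    for query, app_id, relevance in triples:
        value = float(relevance)
        # Reject values outside the stated range [0, 2].
        if not low <= value <= high:
            raise ValueError(f"relevance {value} outside [{low}, {high}]")
        qrels[query][app_id] = value
    return dict(qrels)


# Made-up sample values, not taken from the data set.
sample = [("photo editor", "app_1", 2), ("photo editor", "app_2", 0)]
print(build_qrels(sample))
```

The range check is only a sanity guard against parsing mistakes; it does not assume whether the values are integers or fractions.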

The user review data include reviews of 43,041 mobile apps, with up to 50 user reviews per app and 1,385,607 user reviews in total. Each line of the file represents one app in JSON format and contains the following information: app_id, num_reviews, and a list of reviews.
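
Because the review file stores one JSON object per line, it can be processed line by line without loading everything at once. The following Python sketch shows one way to do that. The key name "reviews" for the review list is an assumption (only app_id and num_reviews are named above), and the helper names and sample lines are hypothetical.

```python
import json


def iter_review_records(lines):
    """Yield one dict per non-empty line; each line is one app's JSON object."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield json.loads(line)


def count_reviews(records, reviews_key="reviews"):
    """Return (number of apps, total number of reviews).

    reviews_key is an assumed key name for the list of reviews.
    """
    n_apps = 0
    n_reviews = 0
    for record in records:
        n_apps += 1
        n_reviews += len(record.get(reviews_key, []))
    return n_apps, n_reviews


# Made-up sample lines, not taken from the data set.
sample = [
    '{"app_id": "a1", "num_reviews": 2, "reviews": ["great", "ok"]}',
    '{"app_id": "a2", "num_reviews": 1, "reviews": ["bad"]}',
]
print(count_reviews(iter_review_records(sample)))  # prints (2, 3)
```

To run this on the extracted file, pass an open file object in place of sample, since iterating over a file yields its lines. Comparing the returned totals with the figures stated above is a quick check that the download is complete.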

Please cite the following paper if you use the data:
@inproceedings{park2015leveraging,
  title={Leveraging User Reviews to Improve Accuracy for Mobile App Retrieval},
  author={Park, Dae Hoon and Liu, Mengwen and Zhai, ChengXiang and Wang, Haohong},
  booktitle={Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  pages={533--542},
  year={2015},
  organization={ACM}
}