Longitudinal Review Data

We are releasing our longitudinal dataset of over 12.5 million reviews (2 million unique) from over 10,000 businesses on Yelp to accompany our paper "Reviews in motion: a large scale, longitudinal study of review recommendations on Yelp".

Background

Our data consists of three datasets:

  1. The Eight Year Gap (EYG) dataset: We collected reviews from the same businesses originally targeted by Mukherjee et al. [1], to enable a long-term comparison of review data. This dataset is most useful when paired with the baseline dataset available here.

  2. The Chicago (CHI) dataset: To get a broader, longer-term view of reviews, we collected reviews from every business in a selection of Chicago zipcodes. This gives a better view of authors in particular, since we see multiple reviews from the same author.

  3. The US Density and Income Stratified (UDIS) dataset: To obtain a more representative sample, we collected two sub-datasets: one by stratifying US zipcodes by density, and the other by stratifying them by income. This dataset also allows for comparison between different geographic areas.
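
As a rough illustration of what stratifying zipcodes by a variable such as density can look like, here is a minimal sketch in Python. It is not the procedure used to build UDIS: the function name, the equal-size strata, the per-stratum sample size, and the records themselves are all assumptions made up for this example.

```python
import random


def stratified_sample(records, key, n_strata, per_stratum, seed=0):
    """Rank records by `key`, split the ranking into `n_strata` roughly
    equal-size strata, and draw up to `per_stratum` records from each.

    Hypothetical helper for illustration; not part of the dataset tooling.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    ranked = sorted(records, key=key)
    n = len(ranked)
    sample = []
    for i in range(n_strata):
        # Integer arithmetic keeps the strata contiguous and non-overlapping.
        stratum = ranked[i * n // n_strata:(i + 1) * n // n_strata]
        k = min(per_stratum, len(stratum))
        sample.extend(rng.sample(stratum, k))
    return sample


# Made-up records for illustration only; not real zipcodes or densities.
zips = [{"zip": f"Z{i:03d}", "density": float(i)} for i in range(100)]
by_density = stratified_sample(
    zips, key=lambda r: r["density"], n_strata=4, per_stratum=3
)
```

The same helper could be reused with a different `key` (for example, a hypothetical `income` field) to produce a second, income-stratified sample.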

Request access

To request access, email longitudinal-review-data@lists.cs.princeton.edu with the following information:

Team:
Intended use of data:

Non-pseudonymized data needed?:

Additionally, if requesting non-pseudonymized data:

Reason why pseudonymous data will not work for your use:
How you plan to protect the privacy of users' data:

For the difference between the pseudonymized and non-pseudonymized data, please see Structure.

Cite us

If you use our dataset, please cite us:

Ryan Amos, Roland Maio, Prateek Mittal. Reviews in motion: a large scale, longitudinal study of review recommendations on Yelp. The 6th Workshop on Technology and Consumer Protection (ConPro '22), 2022.


@inproceedings{amos2022reviews,
  author = {Ryan Amos and Roland Maio and Prateek Mittal},
  title = {Reviews in motion: a large scale, longitudinal study of review recommendations on Yelp},
  booktitle = {The 6th Workshop on Technology and Consumer Protection (ConPro '22)},
  year = {2022},
  url = {https://www.ieee-security.org/TC/SPW2022/ConPro/papers/amos-conpro22.pdf}
}

Contact

You can reach the team for questions at longitudinal-review-data@lists.cs.princeton.edu, or individually:

  • Ryan Amos <rbamos@cs.princeton.edu>

  • Roland Maio <rjm2212@columbia.edu>

  • Prateek Mittal <pmittal@princeton.edu>

[1] Mukherjee, Arjun, et al. "What Yelp fake review filter might be doing?" Seventh International AAAI Conference on Weblogs and Social Media. 2013.