The pseudonymized data follows the same format, but many of the above fields have been removed or replaced with a pseudorandom number.
{
  "<business_id>": [ //pseudonym for business_id
    { //One review
      "content": "", //pseudonym for text of review
      "rating": "", //star rating
      "date": "", //post date
      "author_id": "", //pseudonym for data_hovercard_id or user_page_url
      "elite": true/false, //Whether the user has "elite" status
      "keywords": {
        "mask": true/false, //Whether any lemma in the review matches the lemma of "mask"
        "vaccine": true/false //Whether any lemma in the review matches the lemma of "vaccine"
      }
    }, ...
  ], ...
}
The review's recommendation status can be inferred from the folder in which it is contained.
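As a rough illustration of reading this format, the sketch below counts, per business, the reviews whose keyword flag is set. It assumes a reviews file has already been parsed into a Python dict (for example with `json.load`). The function name `count_keyword_reviews` and the sample values are hypothetical and not part of the dataset.

```python
from collections import Counter

def count_keyword_reviews(reviews_by_business, keyword):
    """Count, per business pseudonym, the reviews flagged for `keyword`."""
    counts = Counter()
    for business_id, reviews in reviews_by_business.items():
        for review in reviews:
            # "keywords" maps "mask" / "vaccine" to a boolean flag.
            if review.get("keywords", {}).get(keyword, False):
                counts[business_id] += 1
    return counts

# Made-up sample in the pseudonymized reviews format (values are placeholders).
sample = {
    "1001": [
        {"content": "17", "rating": "5", "date": "", "author_id": "42",
         "elite": False, "keywords": {"mask": True, "vaccine": False}},
        {"content": "18", "rating": "2", "date": "", "author_id": "43",
         "elite": True, "keywords": {"mask": False, "vaccine": False}},
    ],
}
print(count_keyword_reviews(sample, "mask"))
```

Businesses with no flagged reviews simply do not appear in the returned counter.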
{
  "<business_id>": { //pseudonym for business_id
    "amenities": [ //amenities list
      { //One amenity
        "displayText": "", //Human readable name
        "alias": "", //Amenity name
        "isActive": true/false, //Some amenities are listed as not present
        "iconName": "" //The image displayed
      }, ...
    ]
  }, ...
}
All amenities except "proof_of_vaccination_required", "staff_fully_vaccinated", "customers_must_wear_masks", and "employees_wear_masks" are removed.
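A minimal sketch of how the amenities file might be consumed: it maps each business to the retained aliases that are marked active. It assumes the file is already parsed into a dict. The names `KEPT_ALIASES` and `active_amenities`, and the sample display texts, are hypothetical.

```python
# The four aliases that the pseudonymized data retains.
KEPT_ALIASES = {
    "proof_of_vaccination_required",
    "staff_fully_vaccinated",
    "customers_must_wear_masks",
    "employees_wear_masks",
}

def active_amenities(amenities_by_business):
    """Map each business pseudonym to the sorted aliases it lists as active."""
    result = {}
    for business_id, entry in amenities_by_business.items():
        result[business_id] = sorted(
            amenity["alias"]
            for amenity in entry.get("amenities", [])
            # isActive is false when an amenity is listed as not present.
            if amenity.get("isActive") and amenity["alias"] in KEPT_ALIASES
        )
    return result

# Made-up sample in the pseudonymized amenities format.
sample = {
    "1001": {"amenities": [
        {"displayText": "Masks required", "alias": "customers_must_wear_masks",
         "isActive": True, "iconName": ""},
        {"displayText": "Staff wears masks", "alias": "employees_wear_masks",
         "isActive": False, "iconName": ""},
    ]},
    "1002": {"amenities": []},
}
print(active_amenities(sample))
```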
[
  { //One business
    "id": "", //pseudonym for business_id
    "price": "$"/"$$"/"$$$"/"$$$$", //Fewer "$"s means cheaper
    "rating": "" //Rating at the time of collection
  }, ...
]
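Because the price is stored as a string of "$" signs, it may be convenient to convert it to an integer level before analysis. The sketch below is one way to do that. The helper name `price_level` is hypothetical, and treating an empty or missing price as `None` is an assumption, since the format above does not say how missing prices are represented.

```python
def price_level(price):
    """Return the number of "$" signs in a price string, or None if it is not one."""
    # Assumption: anything other than a non-empty run of "$" counts as missing.
    if not price or set(price) != {"$"}:
        return None
    return len(price)

# Made-up sample in the business list format.
businesses = [
    {"id": "1001", "price": "$$", "rating": "4.5"},
    {"id": "1002", "price": "$$$$", "rating": "3.0"},
    {"id": "1003", "price": "", "rating": "4.0"},
]
levels = {b["id"]: price_level(b.get("price")) for b in businesses}
print(levels)
```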
data/
  crawl_<crawl name>/
    (recommended|not_recommended|removed)_reviews/
      <zipcode>.json //Reviews file
    business_data/
      <zipcode>.json //Amenities file
  business_data/
    <experiment name>/
      <zipcode>.json //Fusion API output
  eyg_data/
    (recommended|not_recommended)_reviews.json //Reviews file
    businessid_to_data.json //Fusion API output
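Since the recommendation status is encoded in the folder name (or, for eyg_data, in the file name), a small helper can recover it from a path. The sketch below is illustrative only. The function name `recommendation_status` and the example crawl name are hypothetical, and it assumes the status-bearing name is either the immediate parent folder or the file stem.

```python
import re
from pathlib import PurePosixPath

# Matches names such as "recommended_reviews" or "not_recommended_reviews".
_STATUS_NAME = re.compile(r"^(recommended|not_recommended|removed)_reviews$")

def recommendation_status(path):
    """Return "recommended", "not_recommended", "removed", or None for a path."""
    p = PurePosixPath(path)
    # Check the containing folder first, then the file name without extension.
    for name in (p.parent.name, p.stem):
        match = _STATUS_NAME.match(name)
        if match:
            return match.group(1)
    return None

print(recommendation_status("data/crawl_example/not_recommended_reviews/12345.json"))
print(recommendation_status("data/eyg_data/recommended_reviews.json"))
```

Paths that hold amenities or Fusion API output carry no status, so the helper returns `None` for them.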
{
  "<business_id>": [ //review list
    { //One review
      "content": "", //text of review
      "rating": "", //star rating
      "date": "", //post date
      "user_image_url": "", //User's profile image
      "user_page_url": "", //User's profile page. Only present for recommended reviews
      "data_hovercard_id": "", //Unique identifier for the user. Only present for not recommended/removed reviews
      "user_name": "", //User first name, last initial
      "user_friends": "", //Number of friends
      "user_review_count": "", //Number of reviews
      "user_photos": "", //Number of photos uploaded
      "elite": true/false //Whether the user has "elite" status
    }, ...
  ], ...
}
The review's recommendation status can be inferred from the folder in which it is contained.
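Because `user_page_url` is only present for recommended reviews and `data_hovercard_id` only for not recommended or removed reviews, code that needs a user identifier has to check both fields. The sketch below shows one way to do so. The function name `user_identifier` and the sample values are hypothetical, and this is not necessarily how the pseudonymized `author_id` was derived.

```python
def user_identifier(review):
    """Return (field_name, value) for whichever user identifier a raw review carries."""
    for key in ("user_page_url", "data_hovercard_id"):
        value = review.get(key)
        if value:
            return key, value
    # Neither field is present or both are empty.
    return None

# Made-up raw reviews; the identifier values are placeholders.
recommended = {"content": "Great food", "user_page_url": "example-user-page"}
not_recommended = {"content": "Too slow", "data_hovercard_id": "example-hovercard-id"}

print(user_identifier(recommended))
print(user_identifier(not_recommended))
```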
{
  "<business_id>": {
    "ammenities": [ //amenities list. Note the misspelling -- two ms
      { //One amenity
        "displayText": "", //Human readable name
        "alias": "", //Amenity name
        "isActive": true/false, //Some amenities are listed as not present
        "iconName": "" //The image displayed
      }, ...
    ]
  }, ...
}
See the Yelp Fusion API. The url field from the API contained a tracker, which we removed.