We further re-scraped more than 15M records with detailed review text. However, the re-scraping took several weeks, and some records changed or became inaccessible during that time. As a result, some conflicts exist if you match the file goodreads_reviews_dedup.json.gz against the earlier interaction file goodreads_interactions_dedup.json.gz. We recommend using the review file only when you need the complete review texts; otherwise, the interaction file should be self-consistent.
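The conflicts mentioned above can be checked on a sample of the data. The following is a minimal sketch, not the authors' procedure. It assumes that both files are gzip-compressed with one JSON object per line, and that interaction records carry `user_id`, `book_id`, and `rating` fields like the review records do; neither assumption is confirmed here. The helper names `read_json_lines` and `find_rating_conflicts` are hypothetical.

```python
import gzip
import json
from typing import Iterable, Iterator, List, Tuple


def read_json_lines(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line of a gzip-compressed file.

    Assumption: the file stores one JSON object per line.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def find_rating_conflicts(
    reviews: Iterable[dict], interactions: Iterable[dict]
) -> List[Tuple[str, str, object, object]]:
    """Return (user_id, book_id, review_rating, interaction_rating) tuples
    for every (user, book) pair whose rating differs between the two sources.

    The reviews are indexed in memory and the interactions are streamed,
    so this is only practical on a subset of the review file.
    """
    review_ratings = {
        (r["user_id"], r["book_id"]): r.get("rating") for r in reviews
    }
    conflicts = []
    for interaction in interactions:
        key = (interaction["user_id"], interaction["book_id"])
        if key in review_ratings and review_ratings[key] != interaction.get("rating"):
            conflicts.append(
                (key[0], key[1], review_ratings[key], interaction.get("rating"))
            )
    return conflicts
```

A call such as `find_rating_conflicts(read_json_lines(review_path), read_json_lines(interaction_path))` would then list the disagreeing pairs, where both paths are placeholders for local copies of the files.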
If you are using our datasets, please cite the following papers:
{'user_id': '8842281e1d1347389f2ab93d60773d4d',
'book_id': '4986701',
'review_id': 'bb7de32f9fadc36627e61aaef7a93142',
'rating': 4,
'review_text': 'Found the Goodreads down image in this, and many other useful images too!',
'date_added': 'Thu Aug 04 10:02:02 -0700 2011',
'date_updated': 'Thu Aug 04 10:02:02 -0700 2011',
'read_at': '',
'started_at': '',
'n_votes': 6,
'n_comments': 4}
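The date fields in the record above are plain strings, and `read_at` and `started_at` may be empty. A minimal sketch for turning them into timezone-aware `datetime` objects follows. The format string is inferred from the sample values shown here, and the helper name `parse_review_date` is hypothetical. Parsing the day and month names relies on an English (C) locale.

```python
from datetime import datetime
from typing import Optional

# Inferred from sample values such as 'Thu Aug 04 10:02:02 -0700 2011'.
GOODREADS_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_review_date(value: str) -> Optional[datetime]:
    """Parse a date string from a review record; return None for empty fields."""
    if not value:
        return None
    return datetime.strptime(value, GOODREADS_DATE_FORMAT)


added = parse_review_date("Thu Aug 04 10:02:02 -0700 2011")
print(added.isoformat())  # 2011-08-04T10:02:02-07:00
print(parse_review_date(""))  # None
```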
Spoiler content in the raw review text is surrounded by '(view spoiler)[' and '(hide spoiler)]'.
{'user_id': '01ec1a320ffded6b2dd47833f2c8e4fb',
'timestamp': '2013-12-28',
# a list of sentences, where the first element indicates if the sentence contains spoilers (1) or not (0)
'review_sentences': [[0, 'First, be aware that this book is not for the faint of heart.'],
[0, 'Human trafficking, drugs, kidnapping, abuse in all forms - this story contains all of this and more.'],
...,
[0, '(ARC provided by the author in return for an honest review.)']],
'rating': 5,
'has_spoiler': False,
'book_id': '18398089',
'review_id': '4b3ffeaf14310ac6854f140188e191cd'}
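Since each entry of `review_sentences` pairs a spoiler flag with a sentence, the sentences can be separated by flag with a short loop. The sketch below assumes a record already loaded as a Python dict with the layout shown above. The function name `split_by_spoiler_flag` and the sample record are made up for illustration.

```python
from typing import List, Tuple


def split_by_spoiler_flag(record: dict) -> Tuple[List[str], List[str]]:
    """Return (spoiler_sentences, clean_sentences) for one review record."""
    spoiler_sentences: List[str] = []
    clean_sentences: List[str] = []
    for flag, sentence in record["review_sentences"]:
        if flag == 1:
            spoiler_sentences.append(sentence)
        else:
            clean_sentences.append(sentence)
    return spoiler_sentences, clean_sentences


# Hypothetical record, not taken from the dataset.
example = {
    "has_spoiler": True,
    "review_sentences": [
        [0, "A slow start."],
        [1, "The narrator turns out to be the thief."],
        [0, "Worth reading."],
    ],
}
spoilers, clean = split_by_spoiler_flag(example)
print(spoilers)  # ['The narrator turns out to be the thief.']
print(clean)     # ['A slow start.', 'Worth reading.']
```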
{'user_id': '8842281e1d1347389f2ab93d60773d4d',
'book_id': '13453029',
'review_id': '46a6e1a14e8afc82d221fec0a2bd3dd0',
'rating': 4,
# raw review text, where spoiler contents are surrounded by '(view spoiler)[' and '(hide spoiler)]'
'review_text': "A fun fast paced book that sucks you in right away and doesn't let go.
... (view spoiler)[His role is to eliminate any doubt ... immediately. (hide spoiler)]
... ",
'date_added': 'Tue Dec 04 11:12:22 -0800 2012',
'date_updated': 'Sat Jul 26 11:43:28 -0700 2014',
'read_at': 'Tue Jul 08 00:00:00 -0700 2014',
'started_at': 'Wed Jul 02 00:00:00 -0700 2014',
'n_votes': 5,
'n_comments': 1}
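Because spoiler passages in `review_text` are wrapped in the literal markers '(view spoiler)[' and '(hide spoiler)]', a regular expression can pull them out or remove them. This is a minimal sketch that assumes the markers are not nested and always appear in matching pairs, which has not been verified across the dataset. The names `extract_spoilers` and `strip_spoilers` and the sample text are hypothetical.

```python
import re
from typing import List

# Non-greedy match between the two literal markers; DOTALL lets a
# spoiler span several lines of the review text.
SPOILER_PATTERN = re.compile(
    r"\(view spoiler\)\[(.*?)\(hide spoiler\)\]", re.DOTALL
)


def extract_spoilers(review_text: str) -> List[str]:
    """Return the text of each spoiler span, with surrounding whitespace removed."""
    return [span.strip() for span in SPOILER_PATTERN.findall(review_text)]


def strip_spoilers(review_text: str) -> str:
    """Return the review text with spoiler spans and their markers removed."""
    return SPOILER_PATTERN.sub("", review_text)


# Made-up review text for illustration.
text = "Good book. (view spoiler)[The butler did it. (hide spoiler)] Recommended."
print(extract_spoilers(text))  # ['The butler did it.']
print(strip_spoilers(text))    # Good book.  Recommended.
```

Removing a span leaves the whitespace on both sides of it in place, so the stripped text may contain doubled spaces that need normalizing afterwards.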