We further re-scraped more than 15M records with detailed review text. However, the re-scraping took several weeks, and some records changed or became inaccessible during that time. As a result, some conflicts exist if you match the file goodreads_reviews_dedup.json.gz against the earlier interaction file goodreads_interactions_dedup.json.gz. We recommend using the review file only when you need the complete review texts; otherwise, use the interaction file, which should be self-consistent.
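To see how much the two files disagree, you can match records and compare their ratings. The following is a minimal sketch, not part of the dataset tooling. It assumes each file stores one JSON object per line and that interaction records carry `user_id`, `book_id`, and `rating` fields like the review records do. The helper names `read_json_lines` and `find_rating_conflicts` are hypothetical.

```python
import gzip
import json
from typing import Dict, Iterable, Iterator, List, Tuple


def read_json_lines(path: str) -> Iterator[dict]:
    """Stream records from a gzip file, assuming one JSON object per line."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def find_rating_conflicts(
    reviews: Iterable[dict], interactions: Iterable[dict]
) -> List[Tuple[str, str, int, int]]:
    """Return (user_id, book_id, review_rating, interaction_rating) where ratings differ.

    Assumption: a (user_id, book_id) pair identifies the same record in both files.
    """
    # Index the review side, which is the smaller of the two collections.
    review_ratings: Dict[Tuple[str, str], int] = {
        (r["user_id"], r["book_id"]): r["rating"] for r in reviews
    }
    conflicts = []
    # Stream the interaction side so it never has to fit in memory at once.
    for record in interactions:
        key = (record["user_id"], record["book_id"])
        if key in review_ratings and review_ratings[key] != record["rating"]:
            conflicts.append((key[0], key[1], review_ratings[key], record["rating"]))
    return conflicts
```

Both arguments accept any iterable, so you can pass `read_json_lines(...)` generators for the real files or short lists while experimenting. Indexing all 15M reviews still needs a fair amount of memory, so try a sample first.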
If you are using our datasets, please cite the following papers:
{'user_id': '8842281e1d1347389f2ab93d60773d4d',
 'book_id': '4986701',
 'review_id': 'bb7de32f9fadc36627e61aaef7a93142',
 'rating': 4,
 'review_text': 'Found the Goodreads down image in this, and many other useful images too!',
 'date_added': 'Thu Aug 04 10:02:02 -0700 2011',
 'date_updated': 'Thu Aug 04 10:02:02 -0700 2011',
 'read_at': '',
 'started_at': '',
 'n_votes': 6,
 'n_comments': 4}

{'user_id': '01ec1a320ffded6b2dd47833f2c8e4fb',
 'timestamp': '2013-12-28',
 # a list of sentences, where the first element indicates if the sentence contains spoilers (1) or not (0)
 'review_sentences': [[0, 'First, be aware that this book is not for the faint of heart.'],
                      [0, 'Human trafficking, drugs, kidnapping, abuse in all forms - this story contains all of this and more.'],
                      ...,
                      [0, '(ARC provided by the author in return for an honest review.)']],
 'rating': 5,
 'has_spoiler': False,
 'book_id': '18398089',
 'review_id': '4b3ffeaf14310ac6854f140188e191cd'}

{'user_id': '8842281e1d1347389f2ab93d60773d4d',
 'book_id': '13453029',
 'review_id': '46a6e1a14e8afc82d221fec0a2bd3dd0',
 'rating': 4,
 # raw review text, where spoiler contents are surrounded by '(view spoiler)[' and '(hide spoiler)]'
 'review_text': "A fun fast paced book that sucks you in right away and doesn't let go. ... (view spoiler)[His role is to eliminate any doubt ... immediately. (hide spoiler)] ... ",
 'date_added': 'Tue Dec 04 11:12:22 -0800 2012',
 'date_updated': 'Sat Jul 26 11:43:28 -0700 2014',
 'read_at': 'Tue Jul 08 00:00:00 -0700 2014',
 'started_at': 'Wed Jul 02 00:00:00 -0700 2014',
 'n_votes': 5,
 'n_comments': 1}
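The sample records above suggest two common preprocessing steps: separating spoiler spans from raw `review_text`, and parsing the date strings. The sketch below shows one possible way to do both. The function names `split_spoilers` and `parse_review_date` are hypothetical, and the date format string is inferred from the sample values shown here, so treat it as an assumption. Note that `%a` and `%b` depend on the locale, so this expects English day and month names.

```python
import re
from datetime import datetime
from typing import List, Optional, Tuple

# Spoiler contents sit between '(view spoiler)[' and '(hide spoiler)]'.
SPOILER_PATTERN = re.compile(r"\(view spoiler\)\[(.*?)\(hide spoiler\)\]", re.DOTALL)


def split_spoilers(review_text: str) -> List[Tuple[bool, str]]:
    """Split raw review text into (is_spoiler, text) segments, in order."""
    segments: List[Tuple[bool, str]] = []
    cursor = 0
    for match in SPOILER_PATTERN.finditer(review_text):
        visible = review_text[cursor:match.start()].strip()
        if visible:
            segments.append((False, visible))
        hidden = match.group(1).strip()
        if hidden:
            segments.append((True, hidden))
        cursor = match.end()
    # Whatever follows the last spoiler block is ordinary text.
    tail = review_text[cursor:].strip()
    if tail:
        segments.append((False, tail))
    return segments


def parse_review_date(value: str) -> Optional[datetime]:
    """Parse values such as 'Thu Aug 04 10:02:02 -0700 2011'; empty strings give None."""
    if not value:
        return None
    # Assumed layout: weekday, month, day, time, UTC offset, year.
    return datetime.strptime(value, "%a %b %d %H:%M:%S %z %Y")
```

Fields such as `read_at` and `started_at` can be empty strings, which is why `parse_review_date` returns `None` for them instead of raising an error.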