The data we're using for analysis is a .csv file scraped from the subreddit r/confession, consisting of fifty thousand rows with the following columns (a minimal loading sketch follows the list):
id
score (upvotes)
title
selftext (the text of the post)
author
subreddit
created_utc
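To make the table structure concrete, here is a minimal loading sketch using pandas. The filename confessions.csv is an assumption, not the project's actual path; substitute the real location of the scraped file.

```python
import pandas as pd

# Read the scraped r/confession data into a DataFrame.
# NOTE: "confessions.csv" is an assumed filename for illustration.
df = pd.read_csv("confessions.csv")

# Expected columns: id, score, title, selftext, author, subreddit, created_utc
print(df.shape)              # roughly (50000, 7)
print(df.columns.tolist())

# created_utc is a Unix timestamp; convert it to a datetime for readability.
df["created_utc"] = pd.to_datetime(df["created_utc"], unit="s")
```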
For information on the libraries and modules used, see the Tools and Libraries section.
Using our results from the quantitative section, primarily the word embedding analysis and the models built there (in particular TF-IDF similarity), we will sample a collection of eight posts for close reading.
Click here for more information on our sampling methodology.
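As a rough illustration of what a TF-IDF similarity model looks like in code, the sketch below computes pairwise cosine similarities between post bodies with scikit-learn, assuming the df DataFrame from the loading example. It is not the project's exact sampling pipeline; the linked methodology describes how the eight posts were actually chosen.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Vectorize the post bodies; fill missing selftext with empty strings.
texts = df["selftext"].fillna("")
vectorizer = TfidfVectorizer(stop_words="english", max_features=20000)
tfidf = vectorizer.fit_transform(texts)

# Cosine similarity between one query post and every other post.
query_index = 0  # arbitrary example post, for illustration only
similarities = cosine_similarity(tfidf[query_index], tfidf).ravel()

# The eight most similar posts (excluding the query itself) could then
# serve as candidates for close reading.
top = similarities.argsort()[::-1][1:9]
print(df.iloc[top][["title", "score"]])
```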