The data we're using for analysis is a .csv file scraped from the subreddit r/confession, consisting of fifty thousand rows with the following columns (a minimal loading sketch follows the list):
id
score (upvotes)
title
selftext (the text of the post)
author
subreddit
created_utc
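To make the table structure concrete, here is a minimal loading sketch using pandas. The filename confessions.csv is an assumption, not the project's actual path; substitute the real location of the scraped file.

```python
import pandas as pd

# Read the scraped r/confession data into a DataFrame.
# NOTE: "confessions.csv" is an assumed filename for illustration.
df = pd.read_csv("confessions.csv")

# Expected columns: id, score, title, selftext, author, subreddit, created_utc
print(df.shape)              # roughly (50000, 7)
print(df.columns.tolist())

# created_utc is a Unix timestamp; convert it to a datetime for readability.
df["created_utc"] = pd.to_datetime(df["created_utc"], unit="s")
```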
For information on the libraries and modules used, see the Tools and Libraries section.
Using our results from the quantitative section, primarily the word embedding analysis and the models built there (in particular TF-IDF similarity), we will sample a collection of eight posts for close reading.
Click here for more information on our sampling methodology.
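As a rough illustration of what a TF-IDF similarity model looks like in code, the sketch below computes pairwise cosine similarities between post bodies with scikit-learn, assuming the df DataFrame from the loading example. It is not the project's exact sampling pipeline; the linked methodology describes how the eight posts were actually chosen.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Vectorize the post bodies; fill missing selftext with empty strings.
texts = df["selftext"].fillna("")
vectorizer = TfidfVectorizer(stop_words="english", max_features=20000)
tfidf = vectorizer.fit_transform(texts)

# Cosine similarity between one query post and every other post.
query_index = 0  # arbitrary example post, for illustration only
similarities = cosine_similarity(tfidf[query_index], tfidf).ravel()

# The eight most similar posts (excluding the query itself) could then
# serve as candidates for close reading.
top = similarities.argsort()[::-1][1:9]
print(df.iloc[top][["title", "score"]])
```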