Ethical Consumption Before Capitalism

Introduction

Welcome to our website! We are Team 22: Ethical Consumption Before Capitalism. This website summarizes our journey through the 2021 Data+ summer program. We were given the prompt of researching what ethical consumption meant for England in the Early Modern period. Below is the final poster that we presented at Data+'s finale, a research poster symposium. To explore what we did in greater depth, please check out the other tabs on this website, which are dedicated to our technical components, and our GitHub, which is linked below. Last year's group's research into consumption-related words is also linked below; we built on their work and often used their code as a reference.

Poster

Final Poster

Summary of Our Talk

What does ethical consumption entail? Today, it often means making purchases that favor environmentally friendly materials and laborers working under fair conditions. But how was the concept of ethical consumption different during the Early Modern period, in pre-capitalist Europe? The objective of this project is to analyze the relationship between consumption and ethics. We used philosophical and religious understandings of ethics to analyze specific consumption items: sentiment analysis to capture the sentiments surrounding each item, and cosine similarity to see how each item relates to each understanding of ethics. Our initial dataset of 60k texts came from Early English Books Online (EEBO); after filtering for relevant texts within our date range (1580-1630), we had a dataset of 10.1k texts.


To clean our dataset, we removed all punctuation, since the Bag-of-Words model we use ignores word order and punctuation anyway. We also removed unwanted characters such as ampersands. To normalize Early Modern spelling, we used a program called VARD.
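
As a rough illustration of the cleaning step, here is a minimal sketch in Python; the function name and the exact set of characters removed are illustrative choices, not our exact code (our full scripts are on GitHub).

import re
import string

def clean_text(raw_text):
    # Drop ampersands and similar unwanted characters before tokenizing
    text = raw_text.replace("&", " ")
    # Strip all punctuation, since Bag-of-Words ignores word order anyway
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse the extra whitespace left behind by the removals
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()

# clean_text("Gold, Silver, & Wool!") -> "gold silver wool"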


The next step in our coding flow, after cleaning and normalizing the text, is topic modeling. Topic modeling automatically identifies the topics present in a corpus of text and derives hidden patterns in that corpus. It is an unsupervised clustering method; we chose unsupervised learning because it finds structure in unlabeled data, which is exactly what our dataset is. We first create a document-term matrix, a mathematical matrix that describes the frequency of terms occurring in a collection of documents. This matrix is then fed as input to our Latent Dirichlet Allocation (LDA) topic model, a popular topic model that matches the text in a document to particular topics; the number of topics can be set by hand.
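
A minimal sketch of this pipeline, assuming scikit-learn's CountVectorizer for the document-term matrix and its LDA implementation; the documents and parameter values shown here are placeholders, not our actual settings.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Cleaned, VARD-normalized documents (placeholders here)
documents = ["first cleaned eebo text ...", "second cleaned eebo text ..."]

# Build the document-term matrix: rows are documents, columns are term counts
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(documents)

# Fit an LDA topic model; the number of topics is chosen by the user
lda = LatentDirichletAllocation(n_components=10, random_state=0)
lda.fit(dtm)

# Inspect the top words in each topic
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_words = [terms[j] for j in topic.argsort()[-10:]]
    print(f"Topic {i}: {', '.join(top_words)}")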

Topic modeling is important to this project because we used it to filter for the texts that are relevant to us. We created a new topic model for each individual cleaned text, compared the words in each topic model's output to two lexicons we created (one of religious terms and one of philosophical terms) to determine whether that text was relevant to our analysis, and uploaded only the relevant texts to a new folder in Box. A simplified sketch of this relevance check is shown below.
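
This sketch assumes the relevance check is an overlap count between a text's topic words and the two lexicons; the lexicon entries and the threshold are illustrative placeholders, not our actual lists.

# Illustrative entries only; our real lexicons are larger and are on GitHub
religious_lexicon = {"god", "sin", "grace", "scripture", "soul"}
philosophical_lexicon = {"virtue", "reason", "temperance", "justice"}

def is_relevant(topic_words, min_matches=3):
    # topic_words: the top words the topic model produced for one text
    words = set(topic_words)
    matches = len(words & religious_lexicon) + len(words & philosophical_lexicon)
    # Keep the text if enough of its topic words appear in either lexicon
    return matches >= min_matches

# Texts that pass this check are the ones copied into the shared Box folder.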


We started sentiment analysis by creating "context windows": the five words on each side of every consumption word we analyzed. We then conducted dictionary-based sentiment analysis on these windows using VADER. In order for VADER to work to its fullest, we updated the VADER lexicon with customized sentiments that better fit our time period.
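
In rough outline, the context-window and VADER steps look like the sketch below; the window size matches what we describe above, but the custom lexicon entries and their scores are placeholders (our period-specific lexicon is on GitHub).

from nltk.sentiment.vader import SentimentIntensityAnalyzer  # requires nltk.download("vader_lexicon")

def context_windows(tokens, target, size=5):
    # Collect the five words on each side of every occurrence of the target word
    windows = []
    for i, word in enumerate(tokens):
        if word == target:
            windows.append(tokens[max(0, i - size): i + size + 1])
    return windows

analyzer = SentimentIntensityAnalyzer()
# Override VADER's modern defaults with sentiments better suited to 1580-1630 usage
analyzer.lexicon.update({"intemperance": -2.0, "godly": 2.5})  # placeholder words and scores

tokens = "the merchant sold fine wool and strong beer to the town".split()
for window in context_windows(tokens, "wool"):
    print(analyzer.polarity_scores(" ".join(window))["compound"])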


Gold and silver both have top-heavy sentiment distributions. Beer and tobacco have largely positive sentiments, possibly because consuming these intoxicants was often seen as sophisticated, something the upper echelon of society would do. Wool is one of the only items skewed more negative, which we attributed to metaphors (such as "wolf in sheep's clothing") and to wool production: during this period, many landowners evicted tenant farmers in order to raise sheep and boost wool production.


Along with sentiment analysis, we used Word2Vec to transform words in our dataset into vectors in a mathematical vector space, a process known as word embedding. Using linear algebra, we calculated the cosine of the angle between various word vectors. This is referred to as cosine similarity, and it lets us measure how related two words are. Using cosine similarity, we generated a heatmap to choose distinct religious and philosophical words. With these words, we compared the religious and philosophical similarity scores for certain consumption items over time. Based on these line graphs, we found that all of the items are typically more similar to the philosophical terms than to the religious ones. Most of the trends in religious and philosophical similarity over time also match each other. Finally, we found that both wool and beer have a negative cosine similarity to religious terms; the negative value is hard to interpret and requires more research on our part.
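
As a sketch of the embedding and similarity step, assuming gensim's Word2Vec; the toy corpus, training parameters, and word pairs below are illustrative only.

import numpy as np
from gensim.models import Word2Vec

# Each sentence is a list of cleaned, normalized tokens (placeholder corpus here)
sentences = [["beer", "was", "sold", "in", "the", "tavern"],
             ["the", "sermon", "warned", "against", "sin", "and", "beer"]]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)

# Cosine similarity between a consumption item and a religious/philosophical term
print(model.wv.similarity("beer", "sin"))

# The same value computed by hand from the word vectors
v1, v2 = model.wv["beer"], model.wv["sin"]
cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(cosine)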


We believe that the next steps for the project should include exploring negative cosine similarity relationships, developing word frequency graphs to discover other consumption items that may be worth analyzing, researching ways to distinguish metaphorical language from literal language, and restructuring the text cleaning lexicons.



We would like to give a special thank you to our project leads, our project manager, team 17, and the Data+ coordinating team. To check out our references and details of the project, please visit our website and GitHub!