Digital Ricoeur - LDA Topic Modeling

Introduction

Over his long intellectual career, Paul Ricoeur explored topics from phenomenology and hermeneutics to memory, history, narrative, justice, and more. To better understand how Ricoeur’s topics of interest changed over time, we performed Latent Dirichlet Allocation (LDA) topic modeling on Ricoeur’s texts and created the accompanying visualizations. (For a nice non-technical introduction to LDA, read this short vignette by Matthew Jockers.) The outcome of this process is a set of word clusters, one for each topic. Each topic represents words that Ricoeur used together frequently. The clusters are derived solely from the texts themselves; the model is given no information about their content. The intuition is that authors use similar words to express ideas about a specific topic. For example, when Ricoeur wrote about metaphors, words such as 'metaphor,' 'discourse,' 'meaning,' 'language,' and 'sense' co-occur with great frequency in some documents.

Another relevant aspect is that the model does not name the topics around which clusters occur. Naming is the interpretative task of the readers of the model, who analyze the set of words that have been grouped into an affinity cluster and recognize a theme in Ricoeur's work. In some cases, the probabilistic grouping fails, and one cannot identify any meaningful theme from the set of words in a given topic.
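The co-occurrence intuition described above can be illustrated with a rough sketch. It is not LDA itself and not the procedure used for this model: it only counts how often pairs of words appear in the same document, which is the kind of signal a topic model draws on. The toy documents and the function name cooccurrence_counts are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy documents, invented for illustration (not drawn from the corpus).
docs = [
    "metaphor discourse meaning language sense",
    "metaphor meaning language sense",
    "memory history narrative",
    "memory history forgetting",
]


def cooccurrence_counts(documents):
    """Count, for each unordered word pair, how many documents contain both words."""
    counts = Counter()
    for text in documents:
        # Use the set of distinct words so each pair counts once per document.
        vocabulary = sorted(set(text.split()))
        for pair in combinations(vocabulary, 2):
            counts[pair] += 1
    return counts


pair_counts = cooccurrence_counts(docs)

# Pairs are stored in alphabetical order, e.g. ("meaning", "metaphor").
for pair, n in pair_counts.most_common(5):
    print(pair, n)
```

In this toy data, words about metaphor co-occur with each other and never with words about memory. A topic model generalizes this idea probabilistically rather than by raw counting.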

One of the main goals of this tool is to inspire researchers to engage with a quantitative model of Ricoeur's immense work to gain possible insights into how Ricoeur's patterns of inquiry changed over time. Therefore, each decade of Ricoeur's production was topic modeled separately. The initial choice for splitting the corpus into calendar decades was arbitrary, and we will consider other time-slicing strategies based on interpretive criteria.
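As a minimal sketch of the time slicing described above, the following groups documents into calendar decades before any modeling takes place. It assumes that a decade runs from a year ending in 0 to the year ending in 9 (for example, 1950-1959); the text only says "calendar decades," so that boundary is an assumption. The function names and the two placeholder titles are hypothetical.

```python
from collections import defaultdict


def decade_of(year):
    """Return the first year of the calendar decade containing `year` (1954 -> 1950)."""
    return (year // 10) * 10


def group_by_decade(documents):
    """Group (title, year) pairs into a dict keyed by the decade's first year."""
    groups = defaultdict(list)
    for title, year in documents:
        groups[decade_of(year)].append(title)
    return dict(groups)


# The first entry uses the date given on this page; the others are placeholders.
sample = [
    ("Being, Essence and Substance in Plato and Aristotle", 1954),
    ("Placeholder essay A", 1959),
    ("Placeholder essay B", 1960),
]

by_decade = group_by_decade(sample)
```

Each value in by_decade would then be modeled on its own, so a different slicing strategy only requires replacing decade_of.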

How the corpus is organized

The corpus used for this model began with Ricoeur’s English texts available through the Digital Ricoeur project. However, we made several changes to this initial corpus. The fundamental principle we tried to follow is to group the texts by their original date of publication, so that the groupings reflect the subjects with which Ricoeur was engaging at a particular stage of his intellectual career. This led us to make several specific decisions regarding what to include and which dates to use. (See below and Appendix B for further details.) For all texts, the publication date reflects the date of the original publication regardless of language. If a text has a publication date later than the date of its first delivery (for example, as a lecture or course), it means Ricoeur went back and edited the text. The later text reflects Ricoeur’s final thoughts about the topic, so we used its publication date to place the text within the corpus. However, if the later publication date exists only because the text was included in a volume edited by someone other than Ricoeur, we used the date of the original lecture, course, etc. For example, Being, Essence and Substance in Plato and Aristotle was published posthumously in French in 2011 (in English in 2013), but it is the text of a course Ricoeur taught at the University of Strasbourg in 1953-1954, so the date used was 1954. This ensures that the date reflects when Ricoeur was thinking about these topics. Similarly, if the publication date spans multiple years (for example, 1954-1955), we used the later year, as it reflects the latest date at which Ricoeur devoted attention to these topics.
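The dating rules above can be summarized in a short sketch. It is one possible reading of the prose, not the project's actual tooling; the TextRecord fields and the corpus_year function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextRecord:
    title: str
    original_year_start: int                      # first year of the original lecture, course, or publication
    original_year_end: Optional[int] = None       # last year, if the original date spans several years
    later_publication_year: Optional[int] = None  # year of a later published version, if any
    revised_by_ricoeur: bool = False              # True if Ricoeur himself edited the later version


def corpus_year(record):
    """Return the year used to place a text in the corpus."""
    # A date that spans multiple years (e.g. 1953-1954) resolves to the later year.
    original = record.original_year_end or record.original_year_start
    # A later version counts only when Ricoeur himself went back and edited the text.
    if record.later_publication_year is not None and record.revised_by_ricoeur:
        return record.later_publication_year
    # Otherwise (e.g. a volume edited by someone else) keep the original date.
    return original


plato_course = TextRecord(
    title="Being, Essence and Substance in Plato and Aristotle",
    original_year_start=1953,
    original_year_end=1954,
    later_publication_year=2011,
    revised_by_ricoeur=False,
)
print(corpus_year(plato_course))  # 1954, matching the example in the text
```

Keeping the revision status as an explicit field makes the one judgment call in the rule visible: whether the later publication reflects Ricoeur's own editing.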

To make the length of texts more uniform and the theme of each document more cohesive, we split up books by chapter. If a book section that includes multiple chapters has its own introduction, this introduction was included with the first chapter of that section. For texts with multiple translations or duplicates, we included the version that appears in a collection. If none of the versions was included in a collection, or if multiple translations were used in different collections, we used the most recent text but kept the document in the cluster for its original publication date, following the underlying corpus organization heuristic. Further, we excluded the following types of texts from the corpus, as they may not reflect Ricoeur’s core scholarly focus at the time: interviews, debates, discussions, dialogues, and roundtables; acceptance and commencement speeches; forewords, introductions, and prefaces to texts not written by Ricoeur; commemorative texts; autobiographical work; sermons; reviews, critical discussions, replies, responses, and comments; and miscellaneous texts. However, if any of these types of texts were included in a collection of primary texts, we included them in the corpus. All texts included are documented in Appendix A, organized by decade. The heuristic for document selection is included in Appendix B.
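The exclusion rule and its exception can be sketched as a simple filter. This is an illustration under stated assumptions: the type labels in EXCLUDED_TYPES and the function include_in_corpus are hypothetical, and the authoritative heuristic is the one in Appendix B.

```python
# Hypothetical type labels standing in for the excluded categories listed above.
EXCLUDED_TYPES = {
    "interview", "debate", "discussion", "dialogue", "roundtable",
    "acceptance speech", "commencement speech",
    "foreword to another author's text",
    "introduction to another author's text",
    "preface to another author's text",
    "commemorative text", "autobiographical work", "sermon",
    "review", "critical discussion", "reply", "response", "comment",
    "miscellaneous",
}


def include_in_corpus(text_type, in_primary_collection=False):
    """Decide whether a text enters the corpus.

    Texts of an excluded type are kept only when they appear in a
    collection of primary texts; every other type is kept.
    """
    if in_primary_collection:
        return True
    return text_type not in EXCLUDED_TYPES


print(include_in_corpus("interview"))                              # False
print(include_in_corpus("interview", in_primary_collection=True))  # True
print(include_in_corpus("book chapter"))                           # True
```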

To clean the texts, we removed endnotes; title, author, and publisher information; and any other material outside the body of the text that could be easily removed. This left only the body of the text, except for footnotes that could not be easily removed.

Topic Models by Decades

Acknowledgements

Anya Workman undertook the original corpus preparation and website descriptions in the summer of 2023, funded by Bowdoin College's Gibbons Summer Research Program.
