Word2Vec

Post date: May 26, 2021 2:57:38 PM

I have been accepted to, and am presently participating in, a five-day online seminar titled Word Vectors for the Thoughtful Humanist, run by the Women Writers Project, part of the Digital Scholarship Group at Northeastern University. The course teaches participants how to use Word2Vec for pedagogical purposes. The instructors are Sarah Connell, Julia Flanders, Syd Bauman and Laura Johnson.

In 2013, Tomas Mikolov and colleagues at Google introduced "Word2Vec", a tool for learning continuous word embeddings from raw text. Word2Vec assigns a vector to each word in a corpus, so that words can be treated as points in space. The distance between two points describes the relation or similarity between the corresponding words. Given a large unprepared corpus, Word2Vec can detect relationships between words and predict which words go together with a given word. I wrote about Word2Vec a long time ago here.

The course organizers took my 100 variant English translations of the Haggadah, which were in .TEXT format, did some initial automated text cleaning (mainly changing capital letters to lower case and deleting all punctuation) and loaded my corpus into their toolbox to train a model (window size 6, dimensions 100, iterations 10, negative samples 15). The model is now open for me to query. I have been playing with it for a few hours now and the results I get are pretty cool.
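To give an idea of what that cleaning step involves, here is a minimal sketch in Python. It is not the organizers' actual script (I have not seen it); the function name clean_text and the sample sentence are my own, and I am assuming that "punctuation" means the standard ASCII punctuation characters.

```python
import string

def clean_text(raw):
    """Lowercase the text, delete ASCII punctuation, and split on whitespace."""
    lowered = raw.lower()
    # A translation table with a third argument deletes those characters outright.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return no_punct.split()

# Hypothetical sample sentence, not a quote from any of the 100 translations.
sample = "Rabbi Eliezer, Rabbi Akiva and others were reclining in Bnei Brak."
tokens = clean_text(sample)
print(tokens)
```

Note that deleting punctuation outright joins hyphenated words into a single token, which may be one reason some odd spellings show up in the query results.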

Word Queries

The toolbox includes a “Query term” box. Typing in a word returns the words that are closest to it in vector space, that is, words that appear in similar contexts in the corpus. I decided to type in "rabbi". The results are variant spellings of the names of rabbis in the Haggadah, e.g. eliezer, eliazar, eliazar, akiba, akiva, akeeba, akibah. Other rabbis (also with multiple spellings) that appear are Elazar, Azarya, Jose (the) Galilean, Tarphon and Yehoshua. As the Haggadah tells us, some of these rabbis sat together in the town of Bnei Brak to celebrate Pesach. Some appear much later in the text, in a rabbinical discussion on how many plagues there were in Egypt and at the sea. The Haggadah also tells us that one rabbi constructed a mnemonic device to remember the order of the plagues.
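For readers who want to see what "closest in vector space" means in practice, here is a small sketch. I am assuming the toolbox ranks words by cosine similarity, which is the usual measure for word embeddings. The three-dimensional vectors are made up for illustration; the real model has 100 dimensions and the numbers are not taken from my corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(term, vectors, n=3):
    """Return the n words most similar to term, best match first."""
    target = vectors[term]
    scored = [(word, cosine(target, vec))
              for word, vec in vectors.items() if word != term]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Invented toy vectors, for illustration only.
toy_vectors = {
    "rabbi":   [0.90, 0.10, 0.00],
    "akiva":   [0.80, 0.20, 0.10],
    "eliezer": [0.85, 0.15, 0.05],
    "matzah":  [0.10, 0.90, 0.20],
    "water":   [0.00, 0.20, 0.90],
}

for word, score in nearest("rabbi", toy_vectors, n=2):
    print(word, round(score, 3))
```

With these toy numbers the two rabbi names come out on top, because their vectors point in almost the same direction as "rabbi".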

Operations

We can use the operation tool in the toolbox to find out which of the rabbis mentioned in the text did NOT celebrate Pesach in Bnei Brak. The tool supports addition and subtraction. Addition combines the contexts associated with two terms, while subtraction removes the contexts associated with one word from those of another. So when we use the operation "Rabbi" - "Bnei", we should get a list of those rabbis who were not part of the Bnei Brak celebration. The results:

In the rabbinical discussion on the number of plagues, three rabbis are mentioned: Rabbi Jose the Galilean, Rabbi Eliezer and Rabbi Akiva. Two of these also celebrated in Bnei Brak, but Rabbi Jose did not. Hence he appears in this Rabbi-Bnei query. The rabbi who created a mnemonic device (basically an acrostic of the initials of the plagues) to remember the order of the plagues was Rabbi Yehudah. His suggested abbreviation is "dtzach - adash - beachab". Rabbi Yehudah did not present this idea at the celebration in Bnei Brak, so his name and references to his mnemonic suggestion appear with the Rabbi-Bnei operation.
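The subtraction operation can be sketched in the same spirit. I am assuming it works the usual way for word vectors: subtract one vector from the other element by element, then look for the words nearest to the result by cosine similarity. The two-dimensional vectors below are invented so that the first number stands for "rabbi-like" contexts and the second for "Bnei Brak" contexts; they are not values from the trained model.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def subtract(u, v):
    """Element-wise u - v."""
    return [a - b for a, b in zip(u, v)]

def nearest_to_vector(query, vectors, exclude, n=2):
    """Rank all words except the query terms by similarity to the query vector."""
    scored = [(word, cosine(query, vec))
              for word, vec in vectors.items() if word not in exclude]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:n]]

# Invented toy vectors: [rabbi-like context, Bnei Brak context].
toy_vectors = {
    "rabbi":   [1.0, 0.5],
    "bnei":    [0.0, 1.0],
    "akiva":   [0.9, 0.8],
    "eliezer": [0.9, 0.7],
    "jose":    [0.9, 0.1],
    "yehudah": [0.8, 0.0],
}

query = subtract(toy_vectors["rabbi"], toy_vectors["bnei"])
result = nearest_to_vector(query, toy_vectors, exclude={"rabbi", "bnei"})
print(result)
```

In this toy setup the names with little "Bnei Brak" context rise to the top, which mirrors what I saw for Rabbi Jose and Rabbi Yehudah.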

Clusters

The toolbox also automatically creates clusters of words that it judges to belong together. For example, it created a cluster of: "fire, cat, water, dog, bit, stick, beat, ox, burned, drank", which are all elements from the final song in the Haggadah on the calamities that befell the goat that father bought. Interestingly enough, the last three elements (the butcher, the Angel of Death and God) are not clustered with these, but that is probably because the settings allow only 10 words to be shown per cluster. When allowing for 30 words to appear, the slaughterer/butcher does pop up, but the Angel of Death and God are still absent.

Another cluster summarizes what happened to the Jews in Egypt: "serve, egyptians, laid, hard, labor, embittered, hard, lives, ill, treated". The clusters can also tell us about certain customs, e.g. the last bite to be taken after the meal is from a piece of Matzah called the Afikoman. This can be seen in the cluster: "after, meal, afikoman, last, custom, meals, deseet, except, required, participant".
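I do not know which clustering algorithm the toolbox uses. A common choice for word vectors is k-means, so here is a bare-bones sketch of that technique under this assumption. The two-dimensional vectors and the fixed starting centroids are invented to keep the example small and repeatable.

```python
import math

def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def mean(points):
    """Coordinate-wise average of a list of vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def kmeans(vectors, centroids, iterations=10):
    """Plain k-means: assign each word to its nearest centroid, then recompute."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for word, vec in vectors.items():
            best = min(range(len(centroids)),
                       key=lambda i: distance(vec, centroids[i]))
            clusters[best].append(word)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            mean([vectors[w] for w in members]) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return clusters

# Invented toy vectors, for illustration only.
toy_vectors = {
    "fire":     [0.90, 0.10],
    "cat":      [0.80, 0.20],
    "dog":      [0.85, 0.15],
    "stick":    [0.95, 0.05],
    "afikoman": [0.10, 0.90],
    "meal":     [0.20, 0.80],
    "custom":   [0.15, 0.85],
}

# Fixed starting centroids instead of random ones, so the run is repeatable.
start = [toy_vectors["fire"], toy_vectors["afikoman"]]
clusters = kmeans(toy_vectors, start)
print(clusters)
```

A real run would start from random centroids and use many more clusters, so the groupings can shift a little from one run to the next.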

I bet I'll find loads more interesting results, so I will probably upload a follow-up post later.