Visualizing Networks

Again, following Melanie Walsh's excellent textbook, I explored how to visualize the social networks from two books in the Hebrew Bible, 1 Samuel and 2 Samuel. The first describes how Israel first 'elected' a king, Saul, and his struggles to gain and maintain control of the kingdom. One of his most significant challenges was from the Judahite David, who eventually usurped the throne. (The Bible's description of this, written by pro-David court historians, sanitizes the story considerably!)

To create a network, one needs to create a list of "nodes" (the points) and "edges" (the lines connecting them). Walsh illustrates this using a set of data from Game of Thrones, with pairs of character names (as "source" and "target") and the number of interactions between them (the "weight" of their relationship).
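To make the shape of that data concrete, here is a minimal sketch of an edge list using only Python's standard library. The rows and weights are invented for illustration (they are not counts from the biblical text), and the helper names are my own, not Walsh's.

```python
import csv
import io

# Hypothetical rows, invented for illustration only; not real counts.
edges = [
    {"Source": "Saul", "Target": "David", "Weight": 3},
    {"Source": "David", "Target": "Jonathan", "Weight": 2},
    {"Source": "David", "Target": "Abigail", "Weight": 1},
]


def nodes_from_edges(rows):
    """Collect every name that appears as a source or a target."""
    return sorted({r["Source"] for r in rows} | {r["Target"] for r in rows})


def edges_to_csv(rows):
    """Serialize the edge list as CSV text with a Source,Target,Weight header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Source", "Target", "Weight"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The nodes never need to be listed separately: they fall out of the edge list, which is why the rest of this post concentrates on building the edges.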

Getting the Names

To do this with Samuel, I needed to generate a list of names in the books and then see how often they appeared within 15 words of each other in the text. (There's nothing magical about the 15 -- it just happens to be the value used by the people who put together the GOT dataset, so I copied it.)

NER

Getting the names was surprisingly difficult. My first thought was to use another tool that Walsh discussed. SpaCy (a natural language tool set) includes a "Named Entity Recognition" tool that is supposed to be able to parse sentences and identify how different words function. I hoped to use it to find names in the text. Unfortunately, it could find names that continue to be used ("David" or "Tamar") but there are a LOT of biblical names that are not in common use. (See my description of using NER on different biblical texts.)

POS (parts of speech) tagging

Next, I tried spaCy's "Part of Speech" tagging tool. This worked better: spaCy was able to identify many words, correctly, as "proper nouns." Unfortunately, this included a lot of geographic location names (and some errors, too. "So" is used a lot in the ASV, but it's not a proper noun!).

I looked at POS taggers from NLTK and Stanford, too. The former generated a much higher rate of false positives and the latter was MUCH slower.

One way to shorten the list of proper nouns was to compare it to a list of names from the Bible. I found lists of names for men and women at Biblegateway.com. I saved them to my drive and used them to generate a list of 1,900+ names in the Bible. I used this to filter the proper noun list, which got rid of most of the geographic terms. But there are lots of places in the Bible named after people, like "Israel" or "Judah," so some cleaning still needed to be done.

Filtering with Bible name list

Finally, I decided to skip the whole POS process. I got better results by just taking the text of the biblical book and filtering it with the list of names.

  • Technical: I did this by turning the text into a list of words and then turning both the list of words and the list of names into sets. The intersection of the two sets is the set of terms found on both lists -- in this case, the names from the biblical name list that appear in the biblical text. This is a VERY easy and fast process in Python.
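The set-intersection step can be sketched in a few lines. This is an illustration under my own assumptions, not the exact code I ran: the function name, the simple letters-only tokenizer, the sample sentence, and the four-name list are all stand-ins for the full text and the 1,900+ name list.

```python
import re


def names_in_text(text, known_names):
    """Return the known names that occur as whole words in the text."""
    # Assumption: a word is any run of ASCII letters. A hyphenated
    # spelling would be split into two words by this tokenizer.
    words = set(re.findall(r"[A-Za-z]+", text))
    return words & set(known_names)


# Sample sentence and a tiny stand-in name list, for illustration only.
sample_text = "And Saul said unto David, Go, and Jehovah shall be with thee."
sample_names = ["Saul", "David", "Jonathan", "Abigail"]
found = names_in_text(sample_text, sample_names)
```

Because the comparison is case-sensitive and works on whole words, a name only counts when it is spelled exactly as it appears in the name list.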

Creating the "interaction" table

Once I had a list of all the names in a biblical text, I created a list of all the possible combinations of those terms. Again, Python has a library that makes this easy. Given a list of, say, 72 names, I can create a list of all the possible pairs like so:

from itertools import combinations
name_pairs = list(combinations(name_list, 2))

For a list of 72 items, this creates 2,556 combinations. This is a relatively small number of possibilities for a computer to sort through.
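The count is easy to check. The sketch below uses a four-name stand-in list (my own example) to show what the pairs look like, and math.comb to confirm the figure for 72 names.

```python
from itertools import combinations
from math import comb

# A tiny stand-in list; the real list had about 72 names.
sample_names = ["Saul", "David", "Jonathan", "Abigail"]

# Unordered pairs: ("Saul", "David") appears, ("David", "Saul") does not.
sample_pairs = list(combinations(sample_names, 2))

# n names give n * (n - 1) / 2 pairs, so 72 names give 72 * 71 / 2.
pair_count_for_72 = comb(72, 2)
```

Since each pair appears only once, the search for "interactions" has to check both orders (A before B and B before A) for every pair.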

I found the best way to look for words in close proximity was with a regular expression, whose form Mr. Google helped me to find. The following expression will return all the cases where A is within 15 words of B or B is within 15 words of A in TEXT.

re.findall(rf'\b{a}\W+(?:\w+\W+){{1,15}}?{b}\b|\b{b}\W+(?:\w+\W+){{1,15}}?{a}\b', text)

This allows me to create a list that has "SOURCE" (A) and "TARGET" (B) and the number of "interactions" between them.
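Putting the pairs and the regular expression together might look like the sketch below. The function name, the window parameter, the use of re.escape, and the sample sentence are my own assumptions for illustration; only the shape of the pattern comes from the expression above.

```python
import re
from itertools import combinations


def count_interactions(text, names, window=15):
    """Return (source, target, weight) rows for every pair that co-occurs."""
    rows = []
    for a, b in combinations(names, 2):
        # re.escape guards against names containing regex metacharacters.
        a_pat, b_pat = re.escape(a), re.escape(b)
        pattern = (
            rf"\b{a_pat}\W+(?:\w+\W+){{1,{window}}}?{b_pat}\b"
            rf"|\b{b_pat}\W+(?:\w+\W+){{1,{window}}}?{a_pat}\b"
        )
        # findall returns non-overlapping matches, so this is a
        # conservative count of how often the two names co-occur.
        weight = len(re.findall(pattern, text))
        if weight:
            rows.append((a, b, weight))
    return rows


# Invented sample text, for illustration only.
sample_text = "Saul spoke to David in the field. Then David and Jonathan made a covenant."
sample_rows = count_interactions(sample_text, ["Saul", "David", "Jonathan"])
```

Two properties of the pattern are worth knowing. The {1,15} quantifier requires at least one word between the two names, so names that sit directly next to each other are not counted. And because matches cannot overlap, a dense passage may yield fewer "interactions" than a reader would count by hand.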

Visualization (finally)

Once I had this, I just plugged the values into the code that Walsh created for her textbook. Doing so allowed me to create network maps for 1 and 2 Samuel. To see them, you may need to drag them into the windows below:

Comments / Analysis

These maps illustrate the themes readers have found in the text forever. 1 Samuel shows there are two major players, Saul and David, who interact with many of the same players. But some, such as Abigail and Nabal, interact only with David. The software is smart enough to group the characters into different factions and the node colors reflect this.

In 2 Samuel, there is one major player, David. But there are a number of significant other characters, including Absalom, Joab, and Saul's clan. (I left "Saul" in the list of characters because many are identified as "XXX the son of Saul." Since Saul is dead in 2 Samuel, his name could be removed and a different network would be drawn.)

Some parts of the image puzzle me: for 2 Sam, why is Bathsheba not connected to Uriah, her husband (before David impregnated Bathsheba and ordered Joab to have Uriah killed)?

I've seen graphs that show the strength of the connection between nodes by making the line thicker. I'd like to do this for these maps -- a future project!
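As a first step toward that, here is a rough sketch of how the interaction weights could be turned into line widths. The function name and the default minimum and maximum widths are arbitrary choices of mine, and I have not yet wired this into the plotting code.

```python
def scale_widths(weights, min_width=0.5, max_width=6.0):
    """Linearly map interaction weights onto a range of line widths."""
    if not weights:
        return []
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # Every edge has the same weight, so use one middling width.
        return [(min_width + max_width) / 2] * len(weights)
    span = hi - lo
    return [
        min_width + (w - lo) / span * (max_width - min_width)
        for w in weights
    ]
```

The resulting list lines up one-to-one with the edge list, so each edge's width could be handed to whatever drawing function accepts a per-edge line width. With a few very heavy edges (David and Saul, presumably), a logarithmic scale might read better than a linear one.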