Textbooks / Handbooks for DH and/or learning Python

If you are interested in learning (or teaching a course about) Digital Humanities, finding a good textbook / introduction is a must.

Here are comments on some of the textbooks I've examined.

Melanie Walsh, Introduction to Cultural Analytics and Python

I think Walsh's Introduction to Cultural Analytics and Python is excellent. It provides a good introduction to basic Python (and Jupyter), including discussion of data types, common methods, loops, lists/dictionaries, etc.

Students who want to learn how to program (or to program in Python) will find this clear and straightforward. It is focused on what students will need to know to do DH / analytics with Python, so it is selective. For example, it omits discussion of tuples, sets, while loops, etc. Students who want to progress further into the Pythonic world will learn about these and other issues later.
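
For readers curious about what is left out, here is a minimal sketch of the three omitted features named above. It is my own illustration, not an example from Walsh's book, and all variable names are made up.

```python
# A tuple is an immutable sequence: handy for small fixed records.
book = ("Walsh", "Introduction to Cultural Analytics and Python")
author, title = book  # tuple unpacking

# A set keeps only unique items and does not preserve order.
words = ["the", "cat", "and", "the", "hat"]
unique_words = set(words)

# A while loop repeats until its condition becomes false.
countdown = 3
steps = []
while countdown > 0:
    steps.append(countdown)
    countdown -= 1

print(author)             # Walsh
print(len(unique_words))  # 4
print(steps)              # [3, 2, 1]
```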

Anyone who has worked with Python knows that its ecosystem is vast. Walsh does a good job of explaining enough about Pandas, Web-Scraping, etc., to allow students to do basic analysis. I expect those who wish to learn more will find plenty of resources on the web (there are roughly a zillion websites about Python and data science).

Much of the book illustrates how one can gather data and analyze it. I am certain that some students will treat some of the programming examples as black-boxes -- they will copy and paste and execute them without understanding what, exactly, they are doing. But isn't this the way most of us approach, say, Pandas (or Excel)? Rarely do we try to figure out how software does its magic. But most chapters have a "Your Turn!" section at the end, which asks students to use the techniques Walsh has demonstrated on a new set of data, so even if they don't understand what each line of code is doing, they are expected to be able to use it to do something.

Walsh covers a wide variety of topics:

  • analysis with Pandas

  • Web-Scraping (from websites, Twitter and Reddit)

  • GitHub

  • NLP topics (Sentiment analysis, Topic modeling, Named Entity Recognition, POS-tagging)

  • Network analysis and visualization

  • Mapping and Geocoding

For each of these topics, she provides useful data sets that students can work with.

Finally, she offers interesting comments and critiques about how data collection has been conducted, who benefits from it, power dynamics, and so forth. She encourages students to reflect on how analytics can be done ethically. She also provides her syllabus, which includes links to articles that discuss these issues.

As someone who has been working with Python for years, I skimmed through her programming discussions. I've messed around with webscraping, but am far from expert; similarly, I have used NLTK (some) but not spaCy; I've admired network visualizations, but not worked with them much. So I learned a lot from this book.

This was true even for topics that I have explored (like Pandas and Geocoding). Python offers so many ways to skin a cat that I am always happy to learn about another!

Of the books I've reviewed, this is currently my top pick for use in an undergrad classroom.

Folgert Karsdorp, et al., Humanities Data Analysis: Case Studies with Python ($28 @ Amazon)

This is a beautifully written and produced book, with nice illustrations and examples of programming. Code examples and the necessary data can be downloaded from the web, so readers can work through the examples and discussion as they follow the text. This, too, is an excellent resource, designed especially for scholars in humanities fields. It is, however, aimed at an audience with more advanced programming skills.

The first chapter illustrates why humanities scholars might be interested in data analysis by exploring a database of American cookbooks. By examining ingredient lists from around 1800 through 1935, the authors are able to show how inventions (like baking powder) and different immigrant groups changed American cooking. It is easy to cut and paste their code examples to follow along with their discussion, but they often don't explain the details of what the code is doing: "These lines of code are quite involved. But don't worry about it: it is not necessary to understand all the details at this point. . ." (p. 18). Although they reassure the reader that it will be easier to understand after their introduction to Pandas (in chapter 4), I didn't find this to be the case. I have been working with Pandas for some time and I still needed to stop and carefully examine the variables / data structures the code created to try to understand what it was doing.

The topics discussed include:

  • Parsing data (plain text, CSV, PDF, JSON, XML/TEI, HTML)

    • Example: extracting data from TEI encoded plays by Shakespeare and creating a network map to show how speakers relate to each other

  • Vector Space Models

    • Tools: NLTK, tokenization, regular expressions, creating word matrices, different ways of calculating distances between vectors; introduction to NumPy.

    • Example: examining a corpus of French plays to see how different genres (comedies, tragedies, tragi-comedies) produce different vectors

  • Processing tabular data

    • Tools: Pandas, matplotlib

    • Example: how children's names have changed from 1880 to 2020

  • Statistics Essentials

    • Tools: NumPy, Pandas, matplotlib

    • Mathy: equations & discussion of ideas like mean, median, dispersion, variance, standard deviation, etc.

    • Example: Who reads novels? Examine geography, income, education, age to see how these variables relate to each other

  • Introduction to Probability

    • Tools: NumPy, Pandas

    • Mathy: equations & discussion of Bayes' rule, probability, degree of belief, distributions, etc.

    • Example: Federalist papers & patterns of word use by Madison and Hamilton

  • Narrating with Maps

    • Tools: Cartopy, Pandas

    • Example: civil war battles; how can information be conveyed via a map?

  • Stylometry (Principal Components Analysis)

    • Tools: scikit-learn, Pandas

    • Example: Authorship of Hildegard of Bingen's letters (written by her or by her secretary?)

  • Topic Modelling

    • Tools: scikit-learn, Pandas

    • Example: Using database of Supreme Court decisions to create different topic maps
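
To give a flavor of the "Vector Space Models" topic listed above, here is a minimal sketch of one way to calculate a distance between two texts: cosine distance over raw word counts. It is my own illustration, not code from the book. It assumes naive whitespace tokenization rather than NLTK, and the function name is hypothetical.

```python
import math
from collections import Counter


def cosine_distance(text_a, text_b):
    """Return 1 minus the cosine similarity of two word-count vectors."""
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())

    # Dot product over the words in text_a; a Counter returns 0 for
    # words it has not seen, so words missing from text_b add nothing.
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)

    norm_a = math.sqrt(sum(n * n for n in counts_a.values()))
    norm_b = math.sqrt(sum(n * n for n in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("both texts must contain at least one word")

    return 1 - dot / (norm_a * norm_b)


same = cosine_distance("the king is dead", "the king is dead")
different = cosine_distance("long live the king", "a comedy of errors")
print(round(same, 6), different)
```

Texts that share no words are at distance 1; texts with identical word counts are at distance 0 (up to floating-point rounding).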

This book provides a rigorous introduction to a wider variety of topics than Walsh's does. The authors presume a fairly advanced level of Python skill. Because of this, I would not envision using this book for anything less than a fairly advanced class in DH.

I learned a lot from HDA, but the typical undergrad humanities major -- who hasn't taken much math or programming -- would find it a very challenging text.


Nick Montfort, Exploratory Programming for the Arts and Humanities

Montfort's text is quite different from the two above. It assumes zero (zilch, nada) knowledge about programming and computer operation.

It is very chatty, with asides and comments about programming and the topic being discussed. For example, when discussing how one could write an algorithm to convert from Fahrenheit to Celsius (pp. 100-103), M provides a history of each of the temperature scales and discusses the pros and cons of each. (Spoiler: there are few pros to the Fahrenheit system still used by the USA.) While interesting to trivia buffs, I found these digressions more annoying than enlightening, as I had to read through a lot of extraneous material before I got to the main topic: how to write an algorithm.
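
For comparison, the algorithm itself is tiny. The following is my own minimal sketch, not M's code; the function names are made up.

```python
def fahrenheit_to_celsius(degrees_f):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (degrees_f - 32) * 5 / 9


def celsius_to_fahrenheit(degrees_c):
    """Convert a temperature from Celsius to Fahrenheit."""
    return degrees_c * 9 / 5 + 32


print(fahrenheit_to_celsius(212))  # 100.0 (boiling point of water)
print(fahrenheit_to_celsius(32))   # 0.0 (freezing point of water)
print(celsius_to_fahrenheit(100))  # 212.0
```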

Appropriately for a book entitled Exploratory Programming, M explains how to install Anaconda and use Jupyter notebooks (pp. 1-64) and offers a good introduction to Python and structured programming (pp. 65-166), concluding with a discussion of regular expressions.

He explores how folks in the arts and humanities could use these techniques to analyze images, text, and sound. For example, he asks, is it possible for a computer to take a text and determine whether it is poetry or prose? He walks the reader through the process of creating a classifier to do this.
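
To show the general shape of such a classifier, here is a deliberately crude sketch of my own. It is not M's classifier: it simply assumes that poetry tends to have short lines, and the 50-character threshold and function names are arbitrary choices for illustration.

```python
def average_line_length(text):
    """Return the mean length of the non-blank lines in text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    return sum(len(line) for line in lines) / len(lines)


def looks_like_poetry(text, max_average=50):
    """Guess 'poetry' when the average line is short (a rough heuristic)."""
    average = average_line_length(text)
    return 0 < average <= max_average


poem = (
    "Shall I compare thee to a summer's day?\n"
    "Thou art more lovely and more temperate:\n"
)
prose = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife.\n"
)

print(looks_like_poetry(poem))   # True
print(looks_like_poetry(prose))  # False
```

A heuristic this simple is easy to fool (prose hard-wrapped at 40 characters would count as poetry), which is part of what makes the question interesting.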

M also provides an introduction to statistics and probability (pp. 213-239). This is very helpful, especially since (a) it's possible that his audience is unfamiliar with these topics and (b) a lot of the power of using computers to analyze data derives from statistics.

Overall, it is hard for me to evaluate this book. I come to it not as a complete novice; even if I had never programmed in Python before, I knew the basics. For me, the long (long, chatty) discussions were annoying. I'd prefer a more focused presentation, similar to those of other "intro to python" books I've read.

Would a humanities major who knew nothing about programming find M's style comforting and attractive? Or would they, too, desire a shorter, "just-the-facts" approach? I don't know.

The humanities student would, probably, appreciate M's examples and programming tasks, which do focus on areas of concern to humanities fields. He doesn't, for example, spend a lot of time discussing how to optimize sorting algorithms or to find all the prime numbers between 2 and 10,000.

In short, this book might be exactly what is needed for some courses or readers, but while I might draw on it for specific examples or assignments, I don't plan to use it in my classroom.

Two non-DH focused books that I recommend

The books above are all aimed specifically at digital humanities. If you are looking for a general introduction to Python and programming, I recommend the following two books.

Sweigart, Automate the Boring Stuff with Python (No Starch Press, 2nd ed, 2019) $24

I found Sweigart's text first on the web, where he publishes it for free. But I found it so helpful that I wanted to get him some royalties, so I bought the book, too. The decision was made easier by the book's reasonable price!

Sweigart provides an excellent general introduction to programming, as well as more specialized discussions about how to use Python to read / process Excel, Word, PDF, and other documents. He also explains how to access Google Sheets / Docs and how to scrape the web, and provides examples of how you could use Python to, indeed, "automate the boring stuff."

Matthes, Python Crash Course (No Starch Press, 2nd ed, 2019; 3rd ed, 2023) ($21.49)

Some Redditors complained that Sweigart is less orderly in his presentation of Python and recommended Matthes' book. This may be a valid criticism: Sweigart doesn't, for example, discuss classes at all. To be fair, none of the DH-focused books do either: classes are probably more useful for folks who are writing the applications that DH-users will use!
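
For readers who have not met classes yet, here is a minimal sketch of what one looks like. It is my own illustration, not an example from either book, and the class and attribute names are made up.

```python
class Book:
    """A small record that bundles data with behavior."""

    def __init__(self, author, title, price):
        self.author = author
        self.title = title
        self.price = price

    def citation(self):
        """Return a short 'Author, Title' string."""
        return f"{self.author}, {self.title}"


book = Book("Matthes", "Python Crash Course", 21.49)
print(book.citation())  # Matthes, Python Crash Course
```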

Matthes provides a concise and thorough introduction to Python, followed by several projects. These include a Space Invaders-style game, data visualization with Matplotlib, and a web application.