The schedule for the F21 semester is broken down week by week below. Registered students will receive an access link via email. Anyone interested in the course but not yet enrolled in Albert can contact the instructor directly.
Materials:
The course learning materials consist of online articles and tutorials, interdisciplinary writing from the blogosphere, videos, and digital projects, in addition to traditional academic readings. There will be no books for purchase. Students will have access to ebook chapters available through NYU Libraries.
For this course you will need to make some accounts and download some software. To make the accounts, you can use your NYU account or create a "burner" account for the class. We will use AntConc, AntCorGen, Atom (or another text editor), Zotero 5.0, and the Zotero Connector, as well as a number of web-based tools. You will also be given access to RStudio Cloud within a few weeks. If you are familiar with RStudio and already have it installed, you can use it. For beginners, I recommend the cloud-based version. All of these resources are free.
Week 1 (31 Aug, 2 Sept) - synchronous Zoom
Introduction, Reviewing course components & syllabus, creating your own Google Site
Materials: Speed Reading (Tonight Show) (6 mins) | How to Speed Read (Ferriss) (9 mins) | "How Many Books Will You Read Before You Die?" | "The beginning of silent reading"
Questions: How do you read? How do people read in a country where you have lived? What might it mean to "read like a computer"? What is the tension in reading between speed and care?
Please register the address of your site here.
Week 2 (7 Sept, 9 Sept) - synchronous Zoom
Can Computers Read?
Materials: Can Computers Read (Literature)? (Kestemont / Herman) | The Mechanic Muse: What is Distant Reading? (NYT) | What is Fan Fiction? | Harry Potter fandom | The Pitfalls of Using Google Ngram | A Fan Studies Zotero Bibliography | Fanfiction.net
Explore: Google n-gram vs. BookWorm (extra: tutorial for HT+Bookworm and "My Secret Editing Weapon")
Download: Laurence Anthony’s AntConc
Search: Look for a tutorial about AntConc on YouTube, learn how to do something with it, and show us on Thursday.
In-Class: Harry Potter fan fiction with AntConc (source: AOOO) (Thursday)
Response 1 (on your site) - see rubric and guidelines (due 10-21 September) Using AntConc, explore the Harry Potter fan fiction texts we worked with. What are you able to observe by looking at a list of words? How are they different from text to text? How can you use the concordance function, as well as the clusters / n-grams and collocation to gain insight into the different texts? Are there certain expressions which begin in Rowling and are popularized by fan fiction? What things seem to come about only in fan fiction? If you would like to try a corpus different from Harry Potter, try the corpora of African-American or colonial South Asian literature assembled by Amardeep Singh. You will have to figure out how to retrieve them from GitHub. Be sure to include illustrative visuals (screenshots) in your blog which allow your reader to follow your investigative process.
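To make the ideas behind Response 1 concrete, here is a minimal sketch of what a word list, a concordance (keyword in context), and n-gram clusters are doing under the hood. It is written in Python purely for illustration; it is not how AntConc is implemented, the function names are hypothetical, and the sample sentence is invented rather than taken from the class corpus.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def word_list(tokens, top=5):
    """Return the `top` most frequent words with their counts."""
    return Counter(tokens).most_common(top)

def concordance(tokens, keyword, width=2):
    """Return every hit for `keyword` with `width` words of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

def ngrams(tokens, n=2):
    """Count every run of n consecutive words (a simple form of clusters)."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

# Invented sample text, for illustration only.
sample = "The wand chose the wizard. The wizard raised the wand, and the wand glowed."
tokens = tokenize(sample)

print(word_list(tokens, top=3))
for line in concordance(tokens, "wizard"):
    print(line)
print(ngrams(tokens, 2).most_common(1))
```

Comparing word lists like these across several texts is the same move you make in AntConc when you ask how one fan fiction text differs from another.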
Week 3 (14 Sept, 16 Sept) - synchronous Zoom
Reading All of Anything | Corpora | Text Mining
Skim: Introduction to Text Mining | About open access journals | About Plos One Open Access Journals | About a paywall | Building the Invisible College (ch4, Crymble) | AntConc tutorial | AntCorGen tutorial
Download: Laurence Anthony's AntConc (3.5.9); AntCorGen (1.1.2)
Watch: How can we make our own corpus? (10 mins) | AntCorGen - Getting Started (6 mins) | Finding collocations (8 mins) | Finding clusters (6 mins) | Overview of AntConc 3.5.9 functionality (16 mins)
In-class: Reading a corpus of Harry Potter Fan Fiction with AntConc (Tues) | Reading a corpus of specialized academic language with AntCorGen (Thurs)
Discuss: Thinking about Crymble's notion of the "invisible college," what would you like to learn how to do that the university is not currently teaching you? (Tues)
Week 4 (21 Sept, 23 Sept) in-person (A2 004)
Presentations | Voyant
Prepare: Mini-presentation - 21 Sept (5 mins maximum, in pairs, sign up here) - Choose a specialized field of knowledge and prepare a small analysis of it using AntCorGen. Use a maximum of 3 slides in Google Drive so that you can present whatever form of delivery we are using. (Tues)
Watch: Introduction to Text Mining with Voyant Tools (23 mins) | Reading all of Jane Austen with Voyant Tools (11 mins) | About Project Gutenberg (6 mins)
Skim: About Jane Austen | What is Project Gutenberg? PG Mission Statement | Project Gutenberg Blocks Access in Germany
In-class: embedding a visual into your site | Reading a book from Project Gutenberg with Voyant (Thurs)
Response 2 (due 24 Sept - 4 October) Follow the steps to download AntCorGen described in the video above. Create a folder and choose 100+ journal articles in a specialized field of interest to you following Anthony's instructions. Use AntConc to analyze the small corpus that you have created. What are the most common words? specialized words? clusters? collocations? Why might you want to read hundreds of journal articles at the same time? How is discipline-specific corpus creation different from, say, popular fiction or fan fiction?
Week 5 (28 Sept | 30 Sept) (in person, A6 009).
RStudio Cloud | Reading Corpora from Project Gutenberg with R
Sign up: get your free account in RStudio.Cloud. Once you have your account, please submit the email address and name that you used for it so that I can add you to our class space. Use the form here. Sign up for RStudio Cloud even if you would like to use an instance of R on your own machine.
Skim: About Qur'an | The Watsons | Pride and Prejudice
Watch: Part I and Part II (15 mins) | Intro to RStudio Cloud (6 mins) | Introduction to R and Tidyverse (12 mins) - once you have an RStudio Cloud account you can actually do all of what this video teaches you to do by creating a "project" | Two short videos I made about using RStudio Cloud: Part 1 (11 mins) and Part 2 (12 mins)
Notebooks: Reading Jane Austen with R | Reading Qur'an with R | Detecting Authorship (Tues)
Skim: Who were Leigh Brackett? E.E. Smith? Lee Hawkins Garby? | Gender and Cultural Analytics: Finding or Making Stereotypes?
Notebooks: RGutenberg with Science fiction | RGutenberg with Prolific Writers (Thurs)
Week 6 (5 Oct | 7 Oct) (in person, A6 009).
Science fiction notebook demo (Tues)
Sentiment analysis with RGutenberg and lexicons (Thurs)
Sentiment analysis, sometimes called opinion mining, attempts to extract affective or subjective information from texts. The kind we will look at here is a fairly simple one: the automated extraction, classification, and interpretation of sentiment from texts using some techniques in R. It is one of the ways we might say that we can “read like a computer.” Sentiment can also be derived from images or even biometric data.
Read: We will look at sentiment using a hand-curated list of words that are considered negative or positive, called a lexicon. The tidytext package that we have used comes pre-packaged with three different lexicons, described briefly here.
Sentiment analysis in ecommerce
Want to know if the languages you know have a sentiment lexicon? Check out this dataset at Kaggle: Sentiment Analysis in 81 Languages
Notebook: Detecting Sentiment (Austen and Gutenberg)
Watch: Data Lit (sentiment analysis) (starting 1:00) (4 mins) | How to See Sentiment on Twitter (5 mins)
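As a rough illustration of the lexicon approach described above, the sketch below counts positive and negative words in a sentence. It is written in Python rather than R, and the six-word lexicon is a hypothetical toy made up for this example; it is not one of the lexicons that ship with tidytext, and the class notebook remains the reference for the actual workflow.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, invented for illustration only.
TOY_LEXICON = {
    "happy": "positive",
    "love": "positive",
    "agreeable": "positive",
    "miserable": "negative",
    "hate": "negative",
    "vexed": "negative",
}

def sentiment_counts(text, lexicon=TOY_LEXICON):
    """Count how many words in the text are labeled positive or negative."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(lexicon[w] for w in words if w in lexicon)

def net_sentiment(text, lexicon=TOY_LEXICON):
    """Positive count minus negative count; above zero leans positive."""
    counts = sentiment_counts(text, lexicon)
    return counts["positive"] - counts["negative"]

# Invented sample sentence.
sentence = "She was happy and agreeable, though her sister was vexed."
print(sentiment_counts(sentence))
print(net_sentiment(sentence))
```

Note that a word-by-word count like this has no sense of context: "not happy" would still be counted as positive, which is one reason to treat lexicon scores as a starting point for reading rather than a verdict.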
Assignment 1 (in pairs): see guidelines and rubric (due 8 - 20 October)
Gender and Reading Like a Computer. A notebook also allows you to pick science fiction authors of your own from Project Gutenberg and compare them one by one, looking for the "most distinctive words" and visualizing them. You need to choose four works of English-language science fiction (NB: not four authors). Browse a little and choose your four-text corpus based on the authors who seem most interesting to you, their themes, and their life trajectories. The Internet Speculative Fiction Database link in column B of the table below can help you identify themes of interest (e.g. Gallun's themes include Martians, plant men, crystal folk, Pluto colony, silicon plague, etc.). As you will recall, the sci-fi/speculative fiction world is full of themes. Do themes and most frequent words map onto the gender identification of the authors?
NB: You can also choose another angle besides gender for this assignment if you like.
Take a look at this article and put your assignment in dialogue with it: Gender and Cultural Analytics: Finding or Making Stereotypes?
Week 7 (12 Oct | 14 Oct) Zoom (Tues), in person (Thurs)
Meeting Dorian Paul Rogers, producer of Rooftop Rhythms (Tues) - at the regular Zoom link
Wrap up of First Half of Term (Thurs)
Break!
Week 8 (26 Oct | 28 Oct) in person (A6 009)
Continuing Recap | Building a Corpus from Handwritten Archives with AI
Recap discussion (Tues)
Skim/watch: Transkribus makes breakthrough in understanding medieval texts (Euronews) | Transkribus in 10 Steps | A (brief!) introduction to OCR in Digital Humanities
Read: Automatic Transcription of BnF ms fr 24428 with Transkribus | The eScriptorium VRE for Manuscript Cultures
Discussion with Suphan Kirmizialtin (NYUAD) on Handwritten Text Recognition (HTR) (Thurs)
Trial of OCR in Google Drive
Week 9 (2 Nov | 4 Nov) in person (A6 009)
Building a corpus from pdfs with Tesseract | Internet Archive | library databases
What is the Programming Historian (think: invisible college)? What is the command line?
Skim: Working with Batches of PDF files (Mähr) | Intro to PowerShell (Windows) and Bash Command (macOS) | documentation for ocrmypdf
Explore: NewsEye | Viral Texts Project (two projects which use quite messy OCR'd data)
Download: Tesseract (instructions in Mähr for macOS or Windows) -- if you are unsure about the command line or encounter issues, we can work on this slowly in class.
Assignment 2 (alone or in pairs): see rubric (due 30 Nov) Using PDFs, build a corpus of 5 articles or texts dealing with a similar subject. Using the tutorial, perform the OCR and extract the textual layer from the 5 articles. With Acrobat Pro, run OCR on the very same texts. Use any of the methods from the course to provide an analysis of the contents. Do you notice a difference in OCR quality between the two tools, Tesseract and Acrobat Pro? If so, what are the differences? Does the OCR quality make a difference for the kind of analysis you carry out?
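One possible way to compare two OCR outputs of the same page is sketched below with Python's standard difflib module. This is only an illustration and not a required method for the assignment. The two strings are invented stand-ins for text you would paste in from each tool; they are not real Tesseract or Acrobat Pro results, and the function names are hypothetical.

```python
import difflib

def similarity(text_a, text_b):
    """Return a ratio between 0 and 1; 1.0 means the two strings are identical."""
    return difflib.SequenceMatcher(None, text_a, text_b).ratio()

def word_differences(text_a, text_b):
    """List the stretches of words that differ, as (version_a, version_b) pairs."""
    a, b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs

# Invented stand-ins for the same sentence as read by two OCR tools.
ocr_output_a = "The modem library is a place of leaming."
ocr_output_b = "The modern library is a place of learning."

print(round(similarity(ocr_output_a, ocr_output_b), 3))
for pair in word_differences(ocr_output_a, ocr_output_b):
    print(pair)
```

A list of differing words only tells you where the two versions disagree, not which one is right, so you will still need to check the disagreements against the original PDF.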
Week 10 (9 Nov | 11 Nov) in person (A6 009)
Building a corpus from videos with Stream | Watching some of Fall 2020-Spring 2021 Rooftop Rhythms
Skim: What are speech-to-text algorithms? | Accent bias is an unchecked sign of racism in the workplace | If we all end up sounding like Americans, you can probably blame voice assistants (Olyeinka)
Watch: Voice Recognition Elevator in Scotland | Watch one of the sessions of Rooftop Rhythms (full list here)
Skim/listen: The Late Wire EP 5 (interview with Raffy Akinwande) – Nigerian social podcast | Iraq Matters#30: (interview with Moussa AlNasari) Remembering Mutanabbi Street 10 Years Later | SG Explained (Willy, Elliot, Rovik) Talking about Racism – Singaporean “regular guys” podcast | Scotland Outdoors, Mark and Euan Visit the Mysterious Goblin Ha’ – BBC Radio Scottish nature podcast | Chini Ya Maji podcast (interview with Don Okoth) – Kenyan podcast on startup culture | Cornish Soccer Talking Football (interview with Andy Watkins) – football podcast from the SW United Kingdom | AWR Colloquial English Sudan – a Christian podcast from Sudan
Discussion: What does accent bias have to do with AI/STT? Which of the podcasts above would STT do best with? What potential issues will we have with STT and Rooftop Rhythms?
Demo: Topic Models of RR (two explanations here and here)
Response 3: (due 25 November) Choose the podcast above that you think STT will do best with. Try some snippets and see how it does, or choose your own sound sample. What are the issues of equity and racism bound up in STT technologies?
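If you want a number to go with your impressions of how STT "does" on a snippet, one common measure is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the machine transcript into your own careful transcript, divided by the number of words in your transcript. WER is not part of the course materials; the Python sketch below is an optional illustration, the function name is hypothetical, and both sample transcripts are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the transcripts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# Invented example: your own transcript versus a machine transcript.
human = "we are going to the rooftop tonight"
machine = "we are going to the roof top tonight"
print(round(word_error_rate(human, machine), 3))
```

A lower score means fewer errors, but the score says nothing about which speakers or accents the errors cluster around; that is the question of equity the response asks you to think about.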
Week 11 (16 Nov | 18 Nov) in person (A6 009)
Choosing our corpus, Generating Transcriptions
Skim: "Siri Disciplines" in Your Computer Is on Fire (Lawrence)
Watch: Trimming audio with QuickTime Player (5 mins) | How to upload an mp3 to NYU Stream and request subtitles/captions (correct, and download) (2 min) | My explanation of how you can find the text of your subtitles/captions for viewing or reuse (4 min)
Requesting Transcriptions in Stream | Correcting Transcriptions | Trimming Audio or Trimming Text?
Analysis of some samples - what are overarching themes of Rooftop Rhythms?
Data: Rooftop Rhythms videos and transcriptions
Week 12 (23 Nov | 25 Nov) LAB : Reading Rooftop Rhythms in person (A6 009)
Final project work
Share a paragraph of what you are working on for Dorian (by 1 December)
Week 13 (7 Dec | 9 Dec) LAB : Reading Rooftop Rhythms in person (A6 009)
Final project work
Week 14 (14 Dec) (Zoom)
-workshopping with Dorian Rogers / wrap up
-Response 4: A letter home to your older relative.