Discussion of scifi notebook.
Demo of style detection using class submitted texts.
Discussion of distinctive words vs. function words.
For today's asynchronous class on working with the Stylo package for R, there are two notebooks located in RStudio Cloud.
You may want to review the short video on Making a Corpus here (11 mins).
Notebook #1 (Stylo)
The first notebook will replicate what we did in class and what I spoke about in the Nov 3 thread in Google Chat just after class. I have cleaned up the corpus that you submitted before last class and removed all characters that are not valid UTF-8. This is an important step that we missed in class, and skipping it kept the stylo package from operating correctly.
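To make the cleanup step concrete, here is a minimal Python sketch of one way to strip bytes that are not valid UTF-8 from a folder of plain-text files. It is an illustration only, not the procedure I used on your files and not part of the R notebook; the function names `clean_to_utf8` and `clean_corpus` and the folder arguments are hypothetical.

```python
from pathlib import Path


def clean_to_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, silently dropping any invalid byte sequences."""
    text = raw.decode("utf-8", errors="ignore")
    # Some editors prepend a byte-order mark; remove it if present.
    return text.lstrip("\ufeff")


def clean_corpus(src_dir: str, dst_dir: str) -> list:
    """Write a cleaned UTF-8 copy of every .txt file in src_dir to dst_dir.

    Both folder names are supplied by the caller (hypothetical here).
    Returns the list of file names that were processed.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    cleaned = []
    for path in sorted(src.glob("*.txt")):
        text = clean_to_utf8(path.read_bytes())
        (dst / path.name).write_text(text, encoding="utf-8")
        cleaned.append(path.name)
    return cleaned
```

Note that dropping invalid bytes loses information: an accented letter saved in another encoding (Latin-1, for example) disappears rather than being converted. If you know the original encoding of a file, re-decoding it from that encoding is the gentler fix.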
For a quick discussion of naming conventions and getting a "messy" corpus into shape, check out last semester's video here (2 mins). Be on the lookout for the pandemic self-done lockdown haircut!
I have attached the clean, well-formatted files you provided me to the notebook so that we can use them directly in RStudio Cloud.
Open up the notebook "Stylo" and read more there.
Notebook #2 (Stylo with GutenbergR)
The second notebook will allow you to replicate what we did in the scifi notebook, that is, to choose a corpus of texts from Project Gutenberg that are of interest to you and for which studying most-frequent-word (MFW) usage would be appropriate. The notebook begins with an example of the Bronte sisters and Jane Austen.
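If the idea of MFW usage feels abstract, the following Python sketch shows roughly what a most-frequent-word table looks like: find the n most common words across the whole corpus, then record each text's relative frequency for those words. It is a simplified illustration under my own assumptions (a basic lowercase tokenizer, hypothetical function names `tokenize` and `mfw_table`), not the computation the stylo package performs internally.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def mfw_table(texts: dict, n: int = 100):
    """Build a most-frequent-word table.

    texts maps a text name to its full string.
    Returns (mfw, table): mfw is the list of the n most common words in
    the whole corpus; table maps each text name to that text's relative
    frequencies for those words, in the same order.
    """
    tokens = {name: tokenize(t) for name, t in texts.items()}
    total = Counter()
    for toks in tokens.values():
        total.update(toks)
    mfw = [word for word, _ in total.most_common(n)]
    table = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        size = len(toks) or 1  # avoid dividing by zero for an empty text
        table[name] = [counts[word] / size for word in mfw]
    return mfw, table


# Tiny toy corpus, invented for illustration only.
toy = {
    "a": "the cat and the dog and the bird",
    "b": "the the the end",
}
words, freqs = mfw_table(toy, n=2)
print(words)  # ['the', 'and']
print(freqs)  # {'a': [0.375, 0.25], 'b': [0.75, 0.0]}
```

The top of such a list is dominated by function words like "the" and "and", which is why MFW profiles capture habits of style rather than subject matter.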
Watch last semester's video that walks you through the Stylo with GutenbergR notebook here before you run the notebook (11 mins). NB: the notebook is slightly different since we are working in RStudio Cloud, but the workflow is basically the same.
You may also want to watch the short video I made about structuring the metadata for this exercise here (4 mins).
On the difference between cluster analysis and consensus trees using the Bronte/Austen corpus, see here.
Try notebook #2.
Now imagine your own corpus from Project Gutenberg, build a metadata table, and try out the notebook. To keep the CPU usage reasonable on RStudio Cloud, keep your corpus to a maximum of 10 books from Project Gutenberg.
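As a planning aid, here is a small Python sketch of what a metadata table for your corpus might look like, with a check for the 10-book limit. The column names, the `build_metadata` and `to_csv` helpers, and the Author_Title file-name pattern are my assumptions for illustration; follow the structure shown in the notebook and the metadata video for the actual exercise, and verify any Project Gutenberg IDs on the site before using them.

```python
import csv
import io

MAX_BOOKS = 10  # corpus size limit for this exercise


def build_metadata(rows: list) -> list:
    """Turn (gutenberg_id, author, title) tuples into metadata records.

    Raises ValueError if the corpus exceeds MAX_BOOKS.
    """
    if len(rows) > MAX_BOOKS:
        raise ValueError(
            f"corpus has {len(rows)} books; keep it to {MAX_BOOKS} or fewer"
        )
    table = []
    for gid, author, title in rows:
        table.append({
            "gutenberg_id": gid,
            "author": author,
            "title": title,
            # Assumed naming pattern: author label, underscore, title.
            "filename": f"{author}_{title}.txt".replace(" ", ""),
        })
    return table


def to_csv(table: list) -> str:
    """Render the metadata records as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["gutenberg_id", "author", "title", "filename"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(table)
    return buf.getvalue()


# Illustrative entries; check each ID on Project Gutenberg.
books = [
    (1342, "Austen", "Pride and Prejudice"),
    (1260, "CBronte", "Jane Eyre"),
    (768, "EBronte", "Wuthering Heights"),
]
metadata = build_metadata(books)
print(to_csv(metadata))
```

Sketching the table first makes it easy to confirm that every book has an author label and a consistent file name before you spend CPU time in RStudio Cloud.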