stylo R package


The suite of stylometric tools, previously distributed as separate scripts, has recently been ported to a regular R package. Once installed, it provides a number of functions that can be invoked from the R console.

If you find the package "stylo" useful and plan to publish your results, please consider citing the following paper:
Eder, M., Rybicki, J. and Kestemont, M. (2016). Stylometry with R: a package for computational text analysis. The R Journal, 8(1): 107-121. URL: https://journal.r-project.org/archive/2016/RJ-2016-007/index.html



The most important functions are:
  •     stylo()
  •     classify()
  •     oppose()
  •     rolling.delta()
  •     rolling.classify()
To see what they are for, type help(stylo), help(classify), etc. Before you start, however, you first have to install the package, and then load the 'stylo' library every time you start a fresh R session.


1. Installation

There are two ways of installing "stylo": (1) from the CRAN repository, or (2) from a locally downloaded file.

1.1. Installing from CRAN:
  • launch R; make sure you are connected to the internet;
  • type: install.packages("stylo")
  • choose your favorite CRAN mirror, or the nearest one (a window will pop up); click OK.
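The steps above can also be scripted. The following minimal sketch uses only base R functions: it installs "stylo" only if it cannot already be loaded, and then reports the installed version. The explicit repos URL (the CRAN cloud mirror) is an assumption; it is needed because no mirror window pops up in a non-interactive session, and any other CRAN mirror would do.

```r
# Install "stylo" from CRAN only when it is not available yet;
# the mirror URL is an assumption (any CRAN mirror works)
if (!requireNamespace("stylo", quietly = TRUE)) {
  install.packages("stylo", repos = "https://cloud.r-project.org")
}

# Report the version that is now installed
packageVersion("stylo")
```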
1.2. NOTE (Mac OS users): the package "stylo" requires X11 support to be installed. To quote "R for Mac OS X FAQ" (http://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html): "Each binary distribution of R available through CRAN is build to use the X11 implementation of Tcl/Tk. Of course a X windows server has to be started first: this should happen automatically on OS X, provided it has been installed (it needs a separate install on Mountain Lion or later). The first time things are done in the X server there can be a long delay whilst a font cache is constructed; starting the server can take several seconds." Newer versions of R (>3.1.0), however, seem to support Tcl/Tk out of the box.
ANOTHER NOTE: When you install a recent version of R on OS X (e.g. Mavericks), you might run into encoding errors when starting R (e.g. "WARNING: You're using a non-UTF8 locale"). In that case, close R, open a new window in Applications > Terminal and execute the following line:
    defaults write org.R-project.R force.LANG en_US.UTF-8
Next, close the Terminal and start up R again.

1.3. Installing from a local file:
  • download the package from here;
  • save the file anywhere on your computer where you will be able to find it;
  • launch R;
  • set the working directory to the folder containing the downloaded file: setwd("i/hope/i/can/remember/where/it/was/")
  • install the package (adjust the file name to the version you have downloaded): install.packages("stylo_0.6.5.tar.gz", repos = NULL, type = "source")
1.4. NOTE: the "stylo" package requires a few standard R packages to be installed. When installing from CRAN, the dependencies are downloaded automatically; otherwise, you have to install them manually. Type (or copy-paste) the following lines:
    install.packages("tcltk2")
    install.packages("ape")
    install.packages("class")
    install.packages("e1071")
    install.packages("pamr")
    install.packages("tsne")
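Instead of typing these lines one by one, the following sketch installs only those dependencies that cannot be loaded yet. The vector simply repeats the list above; the repos URL is an assumption (any CRAN mirror will do).

```r
# Dependencies of "stylo", as listed above
deps <- c("tcltk2", "ape", "class", "e1071", "pamr", "tsne")

# Keep only the packages that cannot be loaded at the moment
missing.deps <- deps[!vapply(deps, requireNamespace, logical(1), quietly = TRUE)]

# Install the missing ones in a single call
if (length(missing.deps) > 0) {
  install.packages(missing.deps, repos = "https://cloud.r-project.org")
}
```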

2. Usage

2.1. launch R;

2.2. activate the package (whenever you start a fresh R session):
  library(stylo)
  this activates all functions of the package, including those for the major methods: stylo(), classify(), rolling.delta(), rolling.classify(), oppose().

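2.3. assuming that the working directory contains some corpora (stylo() looks for plain-text files in a subfolder named "corpus", while classify() and oppose() expect the subfolders "primary_set" and "secondary_set"), try: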
  •   stylo()
  •   classify()
  •   oppose()
  •   rolling.classify()
  •   rolling.delta()
2.4. the package contains many more functions (as listed below), but in most cases they are not intended to be invoked directly by users. The functions are:
  •       assign.plot.colors
  •       classify
  •       define.plot.area
  •       delete.markup
  •       delete.stop.words
  •       dist.argamon
  •       dist.cosine
  •       dist.delta
  •       dist.eder
  •       dist.simple
  •       gui.classify
  •       gui.oppose
  •       gui.stylo
  •       load.corpus.and.parse
  •       load.corpus
  •       make.frequency.list
  •       make.ngrams
  •       make.samples
  •       make.table.of.frequencies
  •       oppose
  •       parse.pos.tags
  •       perform.culling
  •       perform.delta
  •       perform.knn
  •       perform.naivebayes
  •       perform.nsc
  •       perform.svm
  •       rolling.classify
  •       rolling.delta
  •       stylo.default.settings
  •       stylo.pronouns
  •       stylo
  •       txt.to.features
  •       txt.to.words.ext
  •       txt.to.words
  •       zeta.craig
  •       zeta.chisquare
  •       zeta.eder
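Although these helpers are mostly used internally, a few of them are handy on their own, e.g. for checking how a text will be tokenized. Below is a minimal sketch of the first steps of a typical pipeline using txt.to.words(), make.ngrams() and make.frequency.list(). It assumes the default settings (English splitting rules, all words lowercased), and the sample sentence is made up for illustration.

```r
library(stylo)

sample.text <- "The cat sat on the mat. The dog sat on the cat."

# Split the raw string into lowercase words (punctuation is discarded)
words <- txt.to.words(sample.text)

# Combine consecutive words into word bigrams, e.g. "the cat"
bigrams <- make.ngrams(words, ngram.size = 2)

# Rank the distinct words by frequency, most frequent first
ranked <- make.frequency.list(words)

print(words)
print(head(bigrams, 3))
print(head(ranked, 3))
```

Typing help(txt.to.words), help(make.ngrams), etc. lists the remaining arguments of each helper.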
Additionally, versions of the package newer than 0.6.1 come with three datasets:
  •       data(novels)
  •       data(galbraith)
  •       data(lee)
Type help(galbraith) etc. to see what they contain.
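A quick way to inspect the bundled datasets, and to try one of the distance functions listed above, is sketched below. It assumes that galbraith is a table of word frequencies with texts in rows and words in columns (see help(galbraith) to confirm). Restricting the table to its first 100 columns is an arbitrary choice for illustration.

```r
library(stylo)

# Load two of the bundled datasets
data(novels)     # a collection of texts
data(galbraith)  # a table of word frequencies (texts in rows)

# Inspect what has been loaded
length(novels)
dim(galbraith)

# Burrows's Delta between all texts, using the first 100 word columns
delta <- dist.delta(galbraith[, 1:100])

# Show the distances between the first three texts
round(as.matrix(delta)[1:3, 1:3], 3)
```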

3. Documentation and help

3.1. A manual for stylometry with "stylo" is available here.

3.2. each function comes with its own help page; to display it, type:
  help(load.corpus)
  help(stylo)

  etc.

3.3. a pdf version of the (automatically generated) manual is available here.

3.4. help pages usually contain examples: refer to them if you want to understand what a particular function does. The examples can be copy-pasted directly into an active R console.

4. Advanced

4.1. Some users might be interested in using the "stylo" library in batch mode, or in passing options directly to stylo() and/or classify() from the command line. For example, if you want to prepare a number of plots quickly, you do not need to use the GUI. Instead, just type:
   stylo(gui = FALSE, mfw.min = 1000, mfw.max = 5000)
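The same non-GUI call can be repeated in a loop to produce a series of analyses. The following is a minimal sketch. It assumes that the bundled galbraith frequency table is passed through the frequencies argument, so that no corpus folder is needed; the three MFW values are arbitrary.

```r
library(stylo)
data(galbraith)

# Run stylo() without the GUI, once per MFW setting (values chosen arbitrarily);
# setting mfw.min equal to mfw.max gives a single analysis per run
mfw.settings <- c(100, 500, 1000)
results <- lapply(mfw.settings, function(mfw) {
  stylo(gui = FALSE, frequencies = galbraith, mfw.min = mfw, mfw.max = mfw)
})
```

Each run draws its plot on the current graphics device. Depending on the settings, it may also write files (e.g. the configuration used) into the working directory.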

4.2. The main advantage of this approach is that stylo() can be embedded in a custom script, written either in R and launched from inside R, or in any other scripting language. For instance, if one has to perform a million computations and the server resources are only available on weekends and holidays, such a script can be scheduled by cron on a Unix server.
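A minimal sketch of such a script, written in R and launched with Rscript, is shown below. The file name run_stylo.R, the helper parse.settings() and the order of the command-line arguments are all hypothetical. The script assumes that the given folder contains a "corpus" subfolder with plain-text files, which is where stylo() looks by default. A crontab entry such as `0 3 * * 6 Rscript /path/to/run_stylo.R /path/to/project 100 1000` would then run it every Saturday at 3 a.m.

```r
#!/usr/bin/env Rscript
# run_stylo.R (hypothetical name): a batch script for stylo()
# Usage: Rscript run_stylo.R <project folder> <mfw.min> <mfw.max>

# Hypothetical helper: read settings from the command line, with defaults
parse.settings <- function(args) {
  list(
    working.dir = if (length(args) >= 1) args[1] else ".",
    mfw.min     = if (length(args) >= 2) as.integer(args[2]) else 100L,
    mfw.max     = if (length(args) >= 3) as.integer(args[3]) else 1000L
  )
}

cli.args <- commandArgs(trailingOnly = TRUE)
settings <- parse.settings(cli.args)

# Start the analysis only when arguments were actually supplied
if (length(cli.args) > 0) {
  library(stylo)
  setwd(settings$working.dir)  # folder expected to contain "corpus"
  stylo(gui = FALSE, mfw.min = settings$mfw.min, mfw.max = settings$mfw.max)
}
```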

4.3. More examples: see the manual here, section 9. 

Maciej Eder,
Jun 19, 2017, 11:44 AM