ARLO Documentation for Humanists

Ready to Get Started?

ARLO was developed for classifying bird calls and for using visualizations to help scholars classify pollen grains. ARLO can extract basic prosodic features such as pitch, rhythm, and timbre for discovery (clustering) and automated classification (prediction or supervised learning), and it provides visualizations. The current implementation of ARLO for modeling runs in parallel on systems at the National Center for Supercomputing Applications (NCSA). The ARLO source code is open source and will be made available for research purposes for this and subsequent projects on SourceForge at

ARLO Software and HiPSTAS Project History:
ARLO is a powerful machine learning system for many kinds of media and multimedia projects, including images. ARLO began as the result of seed funding from UIUC for what Tcheng then called Nester, a tool built for David Enstrom, an avian ecologist, to use in analyzing bird calls. Subsequent funding came from the Canadian National Railroad to study the effect railroad development has had on bird migrations in Canada. More recently, ARLO received internal funding from the Cline Center for Democracy to analyze millions of pages of scanned historical documents, automatically performing OCR on, segmenting, and extracting metadata from the page images. In addition, “Automated Analysis of the Reproductive Response of Two Panamanian Forests to ENSO Climate Variation - A Machine Learning Pilot Study” (on which Tcheng is a Co-PI) was recommended for another round of NSF funding to support using ARLO to analyze images produced by the project’s super-resolution microscope.

HiPSTAS was funded by an NEH Institutes for Advanced Topics in the Digital Humanities grant for $235,000 from September 1, 2012 to September 30, 2014, and again by an NEH Preservation and Access Research and Development grant for $250,000 from July 2014 to December 2015. In the first nine months of funding for HiPSTAS, we created an online presence at, developed and implemented an application process for participants, and subsequently vetted and selected 20 participants, including junior and senior humanities faculty and advanced graduate students as well as librarians and archivists from across the U.S. who were interested in developing and using new technologies to access and analyze spoken-word recordings within audio collections.[1]

The original implementation of ARLO for modeling ran in parallel on systems at the National Center for Supercomputing Applications (NCSA). The I3 team installed the ARLO software on TACC’s Stampede system, one of the largest computing systems in the world for open science research. As part of HiPSTAS, the I3 team has developed ARLO’s interface (and documentation), performing minor interface development to allow humanities users to test the machine learning system and carry out exploratory discovery (clustering), automated classification (prediction or supervised learning), and visualization, so that we can identify user needs. This development work for HiPSTAS has included limited interface development with ARLO for non-birding humanities users, such as the ability to analyze longer files and the addition of shortcut keys for play, stop, fast-forward, etc., as well as infrastructure development that allows multiple users to work with multiple collections, create separate tags or annotation sets, and share them.

ARLO’s front end uses the Django framework, which is written in Python. The ARLO backend, written in Java, is used for anything that requires more powerful processing, such as spectral matching, clustering, and classification; it has been installed on TACC’s Stampede system. TACC has also provided a virtual machine that hosts the front-end, web-facing side of ARLO, which runs on an Apache server. With this installation, we have ingested 27,000 files from PennSound and 150 hours of folklore recordings from BCAH, and we have set up user accounts for the HiPSTAS participants. On demand, the backend accomplishes computational tasks that require more processing power, such as finding patterns across the 27,000 PennSound audio files. Reading this amount of data without precomputing or indexing would take days on a regular system.
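The division of labor described above can be sketched in a few lines of Python. This is purely illustrative and is not ARLO's actual code: all names here are hypothetical, and ARLO's real front end is a Django web application while its backend is a Java system running on Stampede. The sketch only shows the pattern: a lightweight front end that records a user's request, and a separate backend worker that performs the heavy computation on demand.

```python
# Illustrative sketch of a front-end/backend split (hypothetical names,
# not ARLO's actual code): the web-facing side queues work, and a
# backend worker performs the expensive computation on demand.
from queue import Queue

job_queue = Queue()

def submit_analysis(collection, task):
    """Front-end role: accept a user's request and queue it for the backend."""
    job = {"collection": collection, "task": task, "status": "queued"}
    job_queue.put(job)
    return job

def backend_worker():
    """Backend role: pull queued jobs and run the heavy computation.

    In ARLO's case this work (spectral matching, clustering,
    classification) runs in parallel on a large system rather than
    in-process like this toy loop.
    """
    finished = []
    while not job_queue.empty():
        job = job_queue.get()
        job["status"] = "done"  # stand-in for the real computation
        finished.append(job)
    return finished

job = submit_analysis("PennSound", "clustering")
results = backend_worker()
```

The point of the separation is the one the paragraph above makes: interactive browsing and annotation stay responsive on the web server, while tasks that would take days on a regular machine are handed off to a system built for that scale.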

[1] Participants are listed here: