ALICE: Aggregate Line Inspector & Collaborative Editor

Research & development

Research for ALICE began in 2017, when the Zooniverse team was in the early stages of the Transforming Libraries & Archives through Crowdsourcing project. As we created the new collaborative transcription tools and updated our in-house aggregation code, we realized that, even with these new tools, the process of working with data output from these projects was still incredibly difficult.

The main pain points for project builders were data aggregation and working with Project Builder data exports (.json embedded in a .csv). ALICE facilitates the process of working with crowdsourced transcription output by 1) automating the data aggregation process, and 2) expanding the data export types to include user-friendly formats such as plain text.
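To illustrate why the .json-embedded-in-.csv format is awkward to work with, here is a minimal Python sketch of the extra decoding step it requires. The sample rows, the column names (classification_id, annotations), and the shape of the JSON are assumptions made for illustration; a real Project Builder export has more columns and a richer structure.

```python
import csv
import io
import json

# Hypothetical two-row export. Column names and JSON shape are assumptions
# for illustration only; real exports contain more columns.
SAMPLE_EXPORT = '''classification_id,annotations
101,"[{""task"": ""T0"", ""value"": ""Dear Sir""}]"
102,"[{""task"": ""T0"", ""value"": ""Dear Sir,""}]"
'''


def read_embedded_json(csv_text, json_column="annotations"):
    """Parse each CSV row, then decode the JSON string stored in one column."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The cell is a JSON document serialized as text, so it needs a
        # second parsing pass after the CSV reader has unquoted it.
        row[json_column] = json.loads(row[json_column])
        rows.append(row)
    return rows


rows = read_embedded_json(SAMPLE_EXPORT)
for row in rows:
    print(row["classification_id"], row["annotations"][0]["value"])
```

Every cell in the JSON column has to be decoded separately before any transcription text can be read, which is the kind of step ALICE's plain text export is meant to spare project builders.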

Development on ALICE began in May 2019 and ended in September 2020. As of this writing, we are focused on maintenance, collecting feedback, and responding to bug reports, as well as continuing to update documentation and create additional resources for potential users via workshops, demos, and tutorial videos. Since development ended, over a dozen teams have used ALICE, with another ten projects currently in development.

ALICE: a walkthrough

NB: this walkthrough assumes that project builders have already created and launched a Zooniverse project with the required settings for using ALICE.

Project builders log into https://alice.zooniverse.org using their Zooniverse credentials.

Once logged in, they choose from a list of available projects. The list shown on this page depends on which ALICE-compatible projects their account is associated with on the Zooniverse platform.

For a given project, project builders can then choose from a list of ALICE-compatible workflows.

Within each workflow, they can select from groups of subjects. Only completed (retired) subjects will appear in ALICE. These indexing parameters are based on metadata uploaded to the associated Zooniverse project.

The group page shows a list of subjects, including information on when each was last edited, and by whom. This page also offers information on the review status of a subject, and its number of pages and transcribed lines.

The page-level view shows the original image, the line-by-line annotations (left), and the approved transcription for each annotation (right). The default view for each line is the automatically generated aggregation based on transcriptions submitted by volunteers.

Clicking into a line opens a modal that shows the individual transcriptions submitted for that line. Editors can replace the auto-generated aggregate transcription with that of an individual, or write in a new line. They can also delete the line, if needed, or flag it for review by a collaborator. A filmstrip viewer at the bottom of the page allows editors to view additional page data, if present. Once the subject has been reviewed, editors can mark the subject as 'Approved' to indicate no further review is needed.

Approved subjects can then be downloaded from the platform. For each subject, the download will include a .zip file containing line-by-line transcription and metadata (.csv); a plain text file containing the approved text for each line (.txt); and raw, unparsed transcription data (.json).
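As a rough sketch of how a downloaded archive might be inspected, the Python snippet below groups the files in a .zip by extension. The file names are hypothetical placeholders, and the archive is built in memory so the example runs on its own; the actual names and layout of an ALICE download may differ.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_extension(zip_file):
    """Return a dict mapping each file extension to the archive members that use it."""
    groups = defaultdict(list)
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            groups[PurePosixPath(name).suffix.lower()].append(name)
    return dict(groups)


# Build a stand-in archive in memory. These file names are hypothetical
# and are NOT the names ALICE actually uses.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("transcription_line_metadata.csv", "line,text\n1,Dear Sir\n")
    archive.writestr("transcription_text.txt", "Dear Sir\n")
    archive.writestr("raw_data.json", "{}")
buffer.seek(0)

groups = group_by_extension(buffer)
for extension, names in sorted(groups.items()):
    print(extension, names)
```

With a real download, a path to the .zip file on disk can be passed to group_by_extension in place of the in-memory buffer.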

Learn more

Read more about how to use ALICE (including advanced features like editing aggregation parameters) here.

Watch tutorial videos on how to use ALICE here.

