Mark Matienzo and Kris Kasianovitz
The Handa Center for Human Rights and International Justice and the Stanford Libraries are working together on three digital archives projects to make the records of international criminal proceedings publicly accessible, with funding from the William and Flora Hewlett Foundation. The goal is to develop a platform for research on war crimes trials. The challenge lies in converting a trove of heterogeneous print and manuscript materials to digital form so that they can be effectively discovered, read, and analyzed.
Materials from the first project, the Virtual Tribunals Initiative, have already been digitized, ingested into the Stanford Digital Repository, and made available as an online exhibit that supports full-text search both across the exhibit and within specific items (see https://exhibits.stanford.edu/virtual-tribunals).
The WWII Trial Records Collection is another project to preserve trial records and make them available for research, in this case from proceedings that took place in China, the Netherlands, Italy, Great Britain, France, Australia, the United States, and the Philippines. According to the Handa Center site, "Most of the trial records in our holdings have never before been reproduced, and for this reason have scarcely been accessible to researchers and practitioners. The collections include copies of some case files that remain under seal in the countries where the trials took place, but which we have obtained through the cooperation of other archival sources." These documents are extremely relevant for war crimes tribunals that are taking place around the world today.
These materials are relevant to research across a number of fields, including law, history, medical ethics, sociology, and anthropology. While having the trial records digitized and preserved in the Stanford Digital Repository is already a tremendous benefit for researchers, the documents remain very difficult to navigate and analyze at scale. The WWII Trial Records span 260 reels of microfilm holding about 320,000 pages of material, but those pages are not grouped as documents or document sets. As is often the case with this kind of archival material, there are descriptions in aggregate, but not at the document level.
In addition to distinguishing documents, it would be very helpful to track actual cases. Cases evolve in these trials: a trial may be split apart and later merged back together. Tracking this evolution would be much easier if we could identify case numbers, names, places, and other recognizable entities.
WWII Trial Records: 260 reels of microfilm. About 320,000 page images. The pages are in the process of being OCR'd. The originals are often typed but also include handwritten pages and translations of handwritten testimony.
Recognize discrete documents (currently they are separate pages).
Citation recognition: A UN regulation may be cited three different ways, but it's the same regulation.
Identify people in running text: the defendant, and counsel for the prosecution and for the defense.
Identify dates and events: This may include references to other trials or other case law.
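As a rough illustration of two of the tasks listed above, citation recognition and date identification, here is a minimal Python sketch using only regular expressions. Everything specific in it is an assumption made for the example: the three citation spellings, the regulation number, the canonical key format "UN-REG-n", and the "12 March 1946" date style are hypothetical and not taken from the trial records. Real OCR text would almost certainly need broader patterns and some tolerance for recognition errors.

```python
import re
from datetime import date

# Hypothetical citation variants, e.g. "UN Regulation No. 15",
# "Reg. 15", "U.N. Regulation 15". The actual forms may differ.
CITATION_RE = re.compile(
    r"\b(?:U\.?N\.?\s+)?Reg(?:ulation|\.)?\s+(?:No\.?\s*)?(\d+)\b",
    re.IGNORECASE,
)

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Assumed date style: day, month name, four-digit year.
DATE_RE = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b"
)


def normalize_citations(text):
    """Map every citation variant found in text to one canonical key."""
    return [f"UN-REG-{int(m.group(1))}" for m in CITATION_RE.finditer(text)]


def find_dates(text):
    """Return the calendar dates mentioned in text, skipping invalid ones."""
    found = []
    for day, month, year in DATE_RE.findall(text):
        try:
            found.append(date(int(year), MONTHS[month], int(day)))
        except ValueError:
            # Impossible dates (for example "31 February") are likely OCR noise.
            continue
    return found


if __name__ == "__main__":
    sample = (
        "The court cited UN Regulation No. 15 on 12 March 1946; counsel "
        "later referred to Reg. 15 and U.N. Regulation 15."
    )
    print(normalize_citations(sample))
    print(find_dates(sample))
```

Collapsing variants to one key is what would let a researcher count or search for a regulation regardless of how a given clerk or translator wrote it. Identifying people and document boundaries is harder and would likely call for named-entity recognition or layout analysis rather than hand-written patterns.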