Human Centered Computing Blog - History Makers Project

History Makers: Handwritten Letters Project

Phase 1: Learning about History Makers
Handwritten Letters

During our meeting with the History Makers group, I gained a deeper understanding of their work, which provided insight into both the challenges and possibilities associated with working on handwritten documents.

What the Team Does

The group meets every morning from 8:05 to 9:00, dedicating one day each week specifically to their projects. They currently have two main projects:

The Williams Letters (1880s) — led by Alex and Kate

The Bessie Irvine Letters (1904–1917+)

The transcription team (Norfie, TK, Emily, and Steven) is responsible for rewriting the contents of these handwritten letters into digital form. They usually have the letter open on one side of the screen and a Google Doc on the other, carefully typing out every word. Their audience includes historians, history majors, and anyone interested in understanding how people wrote and communicated during that time period.

Challenges in Transcription

One major insight I gained was how difficult it is to read and transcribe older handwriting. The Bessie Irvine letters are especially challenging due to inconsistent handwriting, faded ink, and time-specific slang. Even the Williams letters, which are more consistent, require attention to details like misspellings and outdated grammar. The team often has to look up unfamiliar terms or historical references.

Another challenge is scale: there are many letters to transcribe, and the process is both time-consuming and prone to human error.

Where Technology Fits In

One team member has tried using Google Lens to transcribe a letter before; however, that wasn't very effective.

Research in computer vision shows that handwriting recognition systems rely on machine learning models trained on labeled images of text. These models can identify letters and words visually, but they often struggle when the handwriting is inconsistent or uses outdated vocabulary. Combining OCR/HTR with NLP techniques might help interpret the meaning of the text once the handwriting is digitized.
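As a rough illustration of pairing recognition output with a simple language-level check, the sketch below flags words in a draft transcription that are not in a known vocabulary and suggests close matches. It is only a sketch under stated assumptions: the word list, the sample sentence, and the function name are all made up for illustration, and a real vocabulary would have to come from letters the team has already transcribed.

```python
import difflib

# Hypothetical vocabulary for illustration only; a real list would be built
# from letters that have already been transcribed and checked by the team.
KNOWN_WORDS = ["dear", "sister", "i", "received", "your", "letter",
               "yesterday", "weather", "fine"]

def flag_uncertain_words(htr_text, vocabulary=KNOWN_WORDS, cutoff=0.75):
    """Return (word, suggestions) pairs for words not found in the vocabulary."""
    flagged = []
    for raw in htr_text.split():
        # Drop surrounding punctuation and ignore case before the lookup.
        word = raw.strip(".,;:!?\"'").lower()
        if not word or word in vocabulary:
            continue
        suggestions = difflib.get_close_matches(word, vocabulary, n=3, cutoff=cutoff)
        flagged.append((word, suggestions))
    return flagged

# Invented sample draft with two recognition mistakes.
draft = "Dear sistcr, I recieved your letter yesterday."
for word, suggestions in flag_uncertain_words(draft):
    print(word, "->", suggestions or "no suggestion; needs a human look")
```

A check like this cannot tell a recognition error from a genuine historical misspelling, so it is best treated as a list of places for a human transcriber to look, not as an automatic correction step.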

My “Shadowing” Experience

To better understand their process, I tried transcribing a short handwritten letter myself. Initially, I underestimated the difficulty of it. It took several minutes just to decipher a few sentences, especially since I don't know cursive very well. I found myself cross-referencing letters and comparing patterns to guess certain words. This small exercise made me realize why AI still struggles; humans rely on context, tone, and educated guesses that algorithms don’t easily replicate.

Reflections and Takeaways

From this experience, I learned that while technology has great potential to assist with transcription through OCR and computer vision, human insight remains irreplaceable for interpreting nuance and historical context. The best approach might be to combine both: using AI tools to handle basic recognition and formatting, while leaving interpretation and verification to human experts.

Phase 2: Background Research
Researching OCR/HTR used by LLMs

For further context on the subject, my group and I researched articles, videos, and additional readings. I was responsible for the second half of the academic research article "Transkribus as a case study" and the video "How to make an AI read your handwriting (LAB)."

Reflections

Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study

The second half of the article emphasizes how tools like Transkribus are creating faster and more accessible ways to work with handwritten texts. Transkribus lets users train custom models, which has driven rapid growth and activity in the Transkribus community. Thousands of users and dozens of institutions contribute over a hundred new HTR models each month. This demonstrates that automated transcription is already making a significant impact by helping researchers connect more quickly with historical materials across various languages and time periods. Overall, this section made me reflect on how HTR technology is not just improving efficiency but actually changing the way scholars approach and interact with archival sources.
The ability to create custom models intrigued me; therefore, I decided to test out Transkribus myself. I began by using one of the provided models for simplicity; however, I was disappointed fairly quickly. The software was not very user-friendly and had too many features, which made the actual transcription process too demanding. This experience made me realize that in this case, simple is better.

How to make an AI read your handwriting (LAB): Crash Course AI #5

The video provided a crash course on how exactly AI is able to read handwritten data. It was interesting to see how details like normalizing pixel values, centering letters, and matching color schemes turned out to be just as important as the neural network itself. I was surprised to see that even after training on over 145,000 examples, the AI still struggled with quirks like blank pages or ambiguous letters, revealing the limits of machine learning when faced with messy, real-world handwriting. Overall, the demo made the process feel both powerful and approachable, highlighting how AI can meaningfully assist with digitizing handwritten text while still requiring human guidance and interpretation.
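The three preprocessing steps mentioned above can be sketched in plain Python. This is not the code from the video; it is a minimal illustration that assumes a grayscale image stored as a list of rows with 0-255 pixel values, and assumes the model expects light ink on a dark background. All function names here are hypothetical.

```python
def normalize(image):
    """Scale 0-255 pixel values into the 0.0-1.0 range."""
    return [[pixel / 255.0 for pixel in row] for row in image]

def match_color_scheme(image):
    """Invert light-background images so ink is bright on a dark background.

    Assumption: the model was trained on light ink over a dark background.
    """
    pixels = [p for row in image for p in row]
    if sum(pixels) / len(pixels) > 0.5:
        return [[1.0 - p for p in row] for row in image]
    return image

def center(image):
    """Shift the ink so its center of mass sits in the middle of the frame."""
    height, width = len(image), len(image[0])
    total = sum(p for row in image for p in row)
    if total == 0:
        return image  # blank page: there is nothing to center
    row_center = sum(r * p for r, row in enumerate(image) for p in row) / total
    col_center = sum(c * p for row in image for c, p in enumerate(row)) / total
    row_shift = round((height - 1) / 2 - row_center)
    col_shift = round((width - 1) / 2 - col_center)
    shifted = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            new_r, new_c = r + row_shift, c + col_shift
            if 0 <= new_r < height and 0 <= new_c < width:
                shifted[new_r][new_c] = image[r][c]
    return shifted

def preprocess(image):
    """Apply the three steps in order: normalize, match colors, center."""
    return center(match_color_scheme(normalize(image)))

# Tiny made-up example: white paper (255) with one dark ink pixel in a corner.
page = [[255] * 5 for _ in range(5)]
page[0][0] = 0
for row in preprocess(page):
    print(row)
```

The blank-page branch in `center` is a reminder of one of the quirks the video ran into: with no ink on the page, there is nothing meaningful for the model to classify.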

Language models: past, present, and future

The article “Language Models: Past, Present, and Future” by Hang Li begins by providing readers with key insights into the evolution of language models. Li highlights the impact of pre-trained language models, which have significantly improved performance in areas such as translation, summarization, and question answering. Despite these improvements, the article also emphasizes the challenges that remain, such as the cost and energy consumption of large models like BERT and GPT-3. Towards the end, Li argues that the future of language modeling lies not only in improving these models but also in making them more human-like.

Overall

This research helped solidify what we are looking for in a technology. Given our clients' needs and goals, it is best to stick with a simple, straightforward automated software that allows for easy human intervention and guidance to ensure accuracy.

Phase 2: Competitive Analysis
Handwritten Letters

What Exists and Inspirations

There are many technologies that perform HTR, but a trend I noticed across all technologies was that, in one way or another, they all still struggle when the handwriting is inconsistent or uses outdated vocabulary.

Keeping our audience in mind, the main factors we used to prioritize technologies were the requirement of no prior programming knowledge, being free to use, facilitating collaboration, and having a minimal learning curve.

What’s Lacking From These Technologies

FromThePage - ensures accuracy, but it is not automated. Each document is transcribed by people, so feedback is not immediate

LLMs like ChatGPT and Gemini - regular non-pro accounts have limited daily uploads (3 and 10, respectively)

Google Cloud Vision - complex initial setup

MoonDream - not general purpose; it is a small model that requires falling back to larger, more robust models like Gemini 2.5 Flash

Takeaway

Unfortunately, many technologies are behind a paywall, others have overly complex interfaces and services that might be overwhelming for high schoolers, and few platforms allow for real-time collaboration for team annotation.

Phase 2: Technical Research
Researching Google Gemini

As a team, we decided to conduct research on Google Gemini to gain a better understanding of what it entails. My research responsibilities involved learning about the exact technologies used by Google Gemini to perform OCR. Below are my findings:

Technologies or Libraries Gemini uses to perform OCR:

OCR capabilities are powered by advanced machine learning models and are derived from the same technology behind Google's enterprise solutions, which include:

Google's Deep Learning Models: The core of the technology is built on neural networks and deep learning algorithms trained on massive datasets. This allows for superior accuracy and contextual understanding compared to older, rule-based OCR systems.
Google Cloud Vision API: This powerful API is often used for general-purpose text extraction from various images, including photographs, street signs, and more.
Google Document AI (DocAI): For complex documents like invoices, forms, and dense text documents, Google uses the DocAI suite, which includes an advanced Enterprise Document OCR that leverages decades of Google research. This system is designed not just to extract text, but to understand the document structure (tables, paragraphs, key-value pairs).
Tesseract OCR: While Google is a major contributor to and former maintainer of the open-source Tesseract OCR engine, the cutting-edge AI models used in products like Document AI and the Cloud Vision API are generally more advanced for complex, modern tasks.

Handwritten and Cursive Documents

Capabilities for Handwritten Text Recognition (HTR) - a specialized form of OCR:

Handwritten (Block/Print): Gemini's underlying models are quite good at recognizing standard, non-cursive handwritten text (like neatly printed notes). Google's enterprise OCR is trained to recognize handwriting in over 50 languages.
Cursive: Recognizing cursive is significantly more challenging due to the connected, highly variable nature of the letters. While the technology can often handle clean and legible cursive, the accuracy may be lower than for printed or neat block handwriting. The latest AI models are continually improving in this regard by leveraging a deeper understanding of language and context to infer the meaning of connected strokes.

Takeaway

Google Gemini uses state-of-the-art AI developed by Google, and can handle both printed and handwritten text, though the complexity of the handwriting (like very messy cursive) can affect the accuracy.

Final Recommendation
AI automation with human interpretation

After evaluating multiple tools, our group recommends using Google Gemini, combined with a human final check, as the best fit for the History Makers project.

Why Gemini?

Technical Capabilities
Gemini's state-of-the-art AI capabilities draw on the same technology behind Google's OCR/HTR solutions, such as Google Cloud Vision and Document AI, together with advanced deep learning models trained on massive datasets.

LLMs like Gemini are continuously improved and maintained, which makes them more reliable than smaller tools. In fact, Mark Humphries, Professor of History at Wilfrid Laurier University, tested the transcription capabilities of Google Gemini's newest model (3.0) and compared its performance to that of expert human transcribers. He reports results 50-70% better than those achieved by Gemini-2.5-Pro. This new model is set to release later this year.

Easy To Use
Google Gemini offers a very user-friendly UI, allows direct uploads from Google Drive (which is where the letters are currently stored), and offers easy and simple collaboration between the user and the AI model.

Privacy Concerns
I investigated the options available to limit the data that Google saves. In Gemini's advanced settings, users can control how often conversations are automatically deleted and whether their conversations may be used to train Gemini models. However, the best option I found was to use Gemini's Temporary Chats. These chats behave like any other, except they are not saved in the model's memory and are automatically deleted after 72 hours. This feature is included in every account, regardless of membership level.

Limitations Are Not Detrimental
Currently, Gemini's free account allows for 10 daily uploads; however, the high schoolers only work on 1-2 letters per session, each being about 4-5 pages. Therefore, this limitation isn't detrimental, especially if all the letter pages are included in one PDF.
Still, to combat this, a team member of mine created a quick web app that uses Google Gemini's developer API. Using this Gemini wrapper, the limitation is now based on tokens rather than daily uploads.
Web App Link: https://madisonarchivr-434336472210.us-east1.run.app/
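To give a sense of what a wrapper around the Gemini developer API might involve, here is a minimal sketch that sends one page image to the public REST endpoint using only the Python standard library. This is not the code behind the web app above, which I have not reproduced here; the prompt text, the function names, and the choice of the gemini-2.5-flash model are assumptions made for illustration.

```python
import base64
import json
import urllib.request

# Public REST endpoint of the Gemini developer API.
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"

# Hypothetical prompt; the wording used by the actual web app is not known here.
PROMPT = "Transcribe this handwritten letter exactly as written, keeping the original spelling."

def build_request_body(image_bytes, mime_type="image/jpeg", prompt=PROMPT):
    """Build the JSON payload: one text part plus one inline, base64-encoded image."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }]
    }

def transcribe(image_path, api_key, model="gemini-2.5-flash"):
    """Send one page image to the API and return the text of the first candidate."""
    with open(image_path, "rb") as image_file:
        body = build_request_body(image_file.read())
    request = urllib.request.Request(
        API_URL.format(model=model),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["candidates"][0]["content"]["parts"][0]["text"]
```

Calling `transcribe` requires an API key and a network connection. Because usage through the API is metered in tokens, the number of pages that can be processed depends on the size of the images and responses and not on a fixed count of daily uploads.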

Google Gemini offers free pro memberships to all college students; however, since our clients don't meet those criteria, my team and I decided to compare the responses of a regular account and one with a pro membership. Below are the two responses side-by-side. There are a few differences between the two responses; however, given the scale of the project and how minor the errors are, they will not significantly affect the project's efficiency.
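A comparison like this can also be done programmatically. The sketch below lists the word-level spans where two transcriptions disagree, using Python's built-in difflib module. The two sample strings are invented for illustration and are not the actual responses from either account.

```python
import difflib

def word_differences(response_a, response_b):
    """List the word-level spans where two transcriptions disagree."""
    words_a, words_b = response_a.split(), response_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    differences = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            differences.append((" ".join(words_a[a_start:a_end]),
                                " ".join(words_b[b_start:b_end])))
    return differences

# Invented sample text, not the real account responses.
free_account = "I recieved your kind letter on Tuesday last"
pro_account = "I received your kind letter on Tuesday last"

for left, right in word_differences(free_account, pro_account):
    print(repr(left), "vs", repr(right))
```

A short list of disagreements like this gives the human reviewer specific words to check against the original letter.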

Conclusion

For the transcription of letters, highly task-focused software is not necessary, especially for high school students with no prior programming knowledge who perform this task once a week for a few hours after school. A general-purpose solution is sufficient.

I found that while AI is powerful in recognizing patterns visually, it still cannot fully replace human insight for interpreting context, emotion, and historical nuance. Additionally, it is essential to acknowledge that its accuracy decreases with messy cursive and historical spelling variations, necessitating manual review. I recommend the initial draft be done by Gemini, followed by human review and correction, if needed, to ensure an accurate transcription.
Regardless of the approach taken, proofreading is required because messy handwriting is difficult to decipher even for humans; therefore, the accuracy isn't necessarily a fault solely stemming from the technology. A blended approach leverages the efficiency that cutting-edge technology enables without compromising accuracy.
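One way to put a number on how much correction a draft needed is the character error rate (CER): the edit distance between the AI draft and the human-verified transcription, divided by the length of the verified text. The sketch below is a minimal illustration. The function names are my own, and this is not a measurement we ran during the project.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: fewest single-character insertions, deletions, or substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

def character_error_rate(reference, hypothesis):
    """CER = edit distance divided by the length of the verified reference text."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)

# Made-up example: the verified text versus a draft with one wrong character.
verified = "letter"
draft = "lettor"
print(f"CER: {character_error_rate(verified, draft):.3f}")
```

Tracking this value across a few letters would show whether the drafts are getting close enough that review stays quick, or whether certain handwriting consistently needs heavier correction.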

References

Crash Course AI. How to make an AI read your handwriting (LAB): Crash Course AI #5.

Google. (2024, April 22). Gemini (2.0) [Large language model]. https://g.co/gemini/

Hang Li. 2022. Language models: past, present, and future. Communications of the ACM 65, 7 (July 2022), 56–63. https://doi.org/10.1145/3490443 (also available at https://cacm.acm.org/research/language-models/)

Humphries, Mark. "Has Google Quietly Solved Two of AI’s Oldest Problems?" (2025, October 17). Generative History. https://generativehistory.substack.com/p/has-google-quietly-solved-two-of

Muehlberger, Guenter, Louise Seaward, Melissa Terras, Sofia Ares Oliveira, Vicente Bosch, Maximilian Bryan, Sebastian Colutto et al. "Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study." Journal of documentation 75, no. 5 (2019): 954-976. https://www.emerald.com/jd/article-pdf/75/5/954/1398560/jd-07-2018-0114.pdf

OpenAI. (2023). ChatGPT [Large language model]. https://chat.openai.com/

Transkribus Handwritten Text Recognition Overview. https://www.reddit.com/r/computervision/comments/1ctf0jh/2024_review_of_ocr_tools_extracting_text_from/

Phase 3: Presentation and Demo

Handwritten-Letter

Materials Presented on Demo Day

Google Gemini: gemini.google.com/app

Web App Link (MadisonArchivr): https://madisonarchivr-434336472210.us-east1.run.app/

How to use MadisonArchivr:

Upload the file you want transcribed, then click "Transcribe Handwriting"
After the loading screen, you are redirected to a page showing the transcribed text in a text box and a preview of the image that was transcribed
There are two buttons below the text box with the transcribed text: "Copy" and "Edit"
  - Copy - copies everything in the text box to your clipboard so you can paste it wherever you need it
  - Edit - makes the text box editable, allowing you to make corrections directly in it
Under the main component, there is a back arrow labeled "Transcribe more images with MadisonArchivr" that takes you back to the first page to transcribe a new document

Note: if the loading screen takes too long or if it is unable to transcribe the document, refresh the page and try again

Closing Reflection

Through this project, I gained a deeper understanding of myself and the challenges of working in the CS field, particularly when collaborating with a real client. I realized how important it is to prioritize the client’s needs over the excitement of new or advanced technology. Throughout the process, I often came across impressive tools, but I had to remind myself that the goal wasn’t to find the “best” technology; it was to find the technology that best suited the client’s specific situation. Shifting my perspective to think from the client’s point of view helped guide my decisions and made the entire project more purposeful.

This experience also opened my eyes to a side of computer science I hadn’t fully considered before. I didn’t expect to enjoy the problem-solving involved in researching solutions and building a demo tailored to someone else’s needs, but it turned out to be both valuable and rewarding. Personally and academically, it made me more open to exploring this type of client-focused work in the future.
