During our meeting with the History Makers group, I gained a deeper understanding of their work, which provided insight into both the challenges and possibilities associated with working on handwritten documents.
What the Team Does
The group meets every morning from 8:05 to 9:00, dedicating one day each week specifically to their projects. They currently have two main projects:
The Williams Letters (1880s) — led by Alex and Kate
The Bessie Irvine Letters (1904–1917+)
The transcription team (Norfie, TK, Emily, and Steven) is responsible for rewriting the contents of these handwritten letters into digital form. They usually have the letter open on one side of the screen and a Google Doc on the other, carefully typing out every word. Their audience includes historians, history majors, and anyone interested in understanding how people wrote and communicated during that time period.
Challenges in Transcription
One major insight I gained was how difficult it is to read and transcribe older handwriting. The Bessie Irvine letters are especially challenging due to inconsistent handwriting, faded ink, and time-specific slang. Even the Williams letters, which are more consistent, require attention to details like misspellings and outdated grammar. The team often has to look up unfamiliar terms or historical references.
Another challenge is scale: there are many letters to transcribe, and the process is very time-consuming and prone to human error.
Where Technology Fits In
One team member previously tried using Google Lens to transcribe a letter, but it was not very effective.
Research in computer vision shows that handwriting recognition systems rely on machine learning models trained on labeled images of text. These models can identify letters and words visually, but they often struggle when the handwriting is inconsistent or uses outdated vocabulary. Combining OCR/HTR with NLP techniques might help interpret the meaning of the text once the handwriting is digitized.
My “Shadowing” Experience
To better understand their process, I tried transcribing a short handwritten letter myself. Initially, I underestimated how difficult it would be. It took several minutes just to decipher a few sentences, especially since I don't know cursive very well. I found myself cross-referencing letters and comparing patterns to guess certain words. This small exercise made me realize why AI still struggles: humans rely on context, tone, and educated guesses that algorithms don’t easily replicate.
Reflections and Takeaways
From this experience, I learned that while technology has great potential to assist with transcription through OCR and computer vision, human insight remains irreplaceable for interpreting nuance and historical context. The best approach might be to combine both: using AI tools to handle basic recognition and formatting, while leaving interpretation and verification to human experts.
For further context on the subject, my group and I researched articles, videos, and additional readings. I was responsible for the second half of the academic research article "Transkribus as a case study" and the video "How to make an AI read your handwriting (LAB)."
Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study
The second half of the article emphasizes how tools like Transkribus are creating faster and more accessible ways to work with handwritten texts. Transkribus allows users to create custom models, which has led to substantial growth and activity in the Transkribus community. Thousands of users and dozens of institutions contribute over a hundred new HTR models each month. This demonstrates that automated transcription is already making a significant impact by enabling researchers to connect more quickly with historical materials across various languages and time periods. Overall, this section made me reflect on how HTR technology is not just improving efficiency but actually changing the way scholars approach and interact with archival sources.
The ability to create custom models intrigued me; therefore, I decided to test out Transkribus myself. I began by using one of the provided models for simplicity; however, I was disappointed fairly quickly. The software was not very user-friendly and had too many features, which made the actual transcription process too demanding. This experience made me realize that in this case, simple is better.
How to make an AI read your handwriting (LAB): Crash Course AI #5
The video provided a crash course on how exactly AI is able to read handwritten data. It was interesting to see how details like normalizing pixel values, centering letters, and matching color schemes turned out to be just as important as the neural network itself. I was surprised to see that even after training on over 145,000 examples, the AI still struggled with quirks like blank pages or ambiguous letters, revealing the limits of machine learning when faced with messy, real-world handwriting. Overall, the demo made the process feel both powerful and approachable, highlighting how AI can meaningfully assist with digitizing handwritten text while still requiring human guidance and interpretation.
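The preprocessing steps mentioned above can be sketched in a few lines of Python. This is only a minimal illustration, not the code from the video: the function names, the tiny 5x5 image, and the use of plain lists instead of an image library are all assumptions made for the example. It scales pixel values from 0-255 into 0-1 and shifts a glyph so that its center of mass lands in the middle of the frame.

```python
def normalize(image):
    """Scale 0-255 grayscale pixel values into the range 0.0-1.0."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def center(image):
    """Shift the glyph so its center of mass sits at the middle of the frame."""
    height, width = len(image), len(image[0])
    total = sum(sum(row) for row in image)
    if total == 0:
        return [row[:] for row in image]  # blank page: nothing to center
    # Ink-weighted average row and column (the center of mass).
    row_mass = sum(r * sum(row) for r, row in enumerate(image)) / total
    col_mass = sum(c * v for row in image for c, v in enumerate(row)) / total
    d_row = round((height - 1) / 2 - row_mass)
    d_col = round((width - 1) / 2 - col_mass)
    shifted = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            new_r, new_c = r + d_row, c + d_col
            if 0 <= new_r < height and 0 <= new_c < width:
                shifted[new_r][new_c] = image[r][c]
    return shifted


# A small plus-shaped "letter" drawn in the top-left corner of a 5x5 frame.
raw = [
    [0, 255, 0, 0, 0],
    [255, 255, 255, 0, 0],
    [0, 255, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
prepared = center(normalize(raw))  # the plus now sits in the middle, values 0-1
```

A real pipeline would do the same thing on full-size scans with an image library, but the idea is the same: every image reaches the neural network in the same scale and position.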
Language models: past, present, and future
The article “Language Models: Past, Present, and Future” by Hang Li begins by providing readers with key insights into the evolution of language models. Li highlights the impact of pre-trained language models, which have significantly improved performance in areas such as translation, summarization, and question answering. Despite these improvements, the article also emphasizes the challenges that remain, such as the cost and energy consumption of large models like BERT and GPT-3. Towards the end, Li argues that the future of language modeling lies not only in improving these models but also in making them more human-like.
This research helped solidify what we are looking for in a technology. Given our clients' needs and goals, it is best to stick with a simple, straightforward automated software that allows for easy human intervention and guidance to ensure accuracy.
What Exists and Inspirations
There are many technologies that perform HTR, but a trend I noticed across all technologies was that, in one way or another, they all still struggle when the handwriting is inconsistent or uses outdated vocabulary.
Keeping our audience in mind, the main factors we used to prioritize technologies were requiring no prior programming knowledge, being free to use, facilitating collaboration, and having a minimal learning curve.
What’s Lacking From These Technologies
FromThePage - ensures accuracy; however, it is not automated. Each document is transcribed by people, so feedback is not immediate
LLMs like ChatGPT and Gemini - regular non-pro accounts have limited daily uploads (3 and 10, respectively)
Google Cloud Vision - complex initial setup
MoonDream - not general purpose, small model, requires fallback to larger, more robust models like Gemini 2.5 Flash
Unfortunately, many technologies are behind a paywall, others have overly complex interfaces and services that might be overwhelming for high schoolers, and few platforms allow for real-time collaboration for team annotation.
As a team, we decided to conduct research on Google Gemini to gain a better understanding of what it entails. My research responsibilities involved learning about the exact technologies used by Google Gemini to perform OCR. Below are my findings:
Gemini's OCR capabilities are powered by advanced machine learning models and are derived from the same technology behind Google's enterprise solutions, which include:
Google's Deep Learning Models: The core of the technology is built on neural networks and deep learning algorithms trained on massive datasets. This allows for superior accuracy and contextual understanding compared to older, rule-based OCR systems.
Google Cloud Vision API: This powerful API is often used for general-purpose text extraction from various images, including photographs, street signs, and more.
Google Document AI (DocAI): For complex documents like invoices, forms, and dense text documents, Google uses the DocAI suite, which includes an advanced Enterprise Document OCR that leverages decades of Google research. This system is designed not just to extract text, but to understand the document structure (tables, paragraphs, key-value pairs).
Tesseract OCR: While Google is a major contributor to and former maintainer of the open-source Tesseract OCR engine, the cutting-edge AI models used in products like Document AI and the Cloud Vision API are generally more advanced for complex, modern tasks.
Handwritten and Cursive Documents
Capabilities for Handwritten Text Recognition (HTR) - a specialized form of OCR:
Handwritten (Block/Print): Gemini's underlying models are quite good at recognizing standard, non-cursive handwritten text (like neatly printed notes). Google's enterprise OCR is trained to recognize handwriting in over 50 languages.
Cursive: Recognizing cursive is significantly more challenging due to the connected, highly variable nature of the letters. While the technology can often handle clean and legible cursive, the accuracy may be lower than for printed or neat block handwriting. The latest AI models are continually improving in this regard by leveraging a deeper understanding of language and context to infer the meaning of connected strokes.
Google Gemini uses state-of-the-art AI developed by Google, and can handle both printed and handwritten text, though the complexity of the handwriting (like very messy cursive) can affect the accuracy.
After evaluating multiple tools, our group recommends using Google Gemini, combined with a human final check, as the best fit for the History Makers project.
Technical Capabilities
Gemini's OCR/HTR capabilities draw on the same technology behind Google's enterprise solutions, such as Google Cloud Vision and Document AI, together with advanced deep learning models trained on massive datasets.
LLMs like Gemini are continuously improved and maintained, making them more reliable than smaller tools. In fact, Mark Humphries, Professor of History at Wilfrid Laurier University, tested the transcription capabilities of Google Gemini's newest model (3.0) and compared its performance with that of expert human transcribers. Its results improved by 50-70% over those achieved by Gemini-2.5-Pro. This new model is set to be released later this year.
Easy To Use
Google Gemini offers a very user-friendly UI, allows direct uploads from Google Drive (which is where the letters are currently stored), and offers easy and simple collaboration between the user and the AI model.
Privacy Concerns
I investigated the options available to limit the data that Google saves. In Gemini's advanced settings, users can control how often conversations are automatically deleted and whether their conversations may be used to train Gemini models. However, the best option I found was to use Gemini's Temporary Chats. These chats behave like any other, except that they are not saved in the model's memory and are automatically deleted after 72 hours. This feature is included in every account, regardless of membership level.
Limitations Are Not Detrimental
Currently, Gemini's free account allows for 10 daily uploads; however, the high schoolers only work on 1-2 letters per session, each being about 4-5 pages. Therefore, this limitation isn't detrimental, especially if all the letter pages are included in one PDF.
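The arithmetic behind this can be checked quickly. The sketch below uses only the figures quoted above (10 uploads per day, 1-2 letters per session, 4-5 pages per letter); the function name and the assumption that one file counts as one upload are mine.

```python
DAILY_UPLOAD_LIMIT = 10  # free-account limit quoted above


def uploads_needed(letters, pages_per_letter, one_pdf_per_letter):
    """Count uploads for a session, assuming each file counts as one upload."""
    if one_pdf_per_letter:
        return letters  # all pages of a letter travel in a single PDF
    return letters * pages_per_letter  # every page is its own upload


# Worst case from the text: 2 letters of 5 pages each.
page_by_page = uploads_needed(2, 5, one_pdf_per_letter=False)
as_pdfs = uploads_needed(2, 5, one_pdf_per_letter=True)

print(page_by_page, page_by_page <= DAILY_UPLOAD_LIMIT)  # 10 True
print(as_pdfs, as_pdfs <= DAILY_UPLOAD_LIMIT)            # 2 True
```

Even in the worst case, uploading page by page just reaches the limit, while combining each letter into one PDF uses only two of the ten uploads.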
Still, to work around the daily upload limit, a teammate of mine created a quick web app that uses Google Gemini's developer API. With this Gemini wrapper, the limitation is now based on tokens rather than daily uploads.
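For readers curious about what a wrapper like this sends to Gemini, here is a minimal sketch of a request body for the Gemini API's generateContent REST endpoint. This is not my teammate's actual code: the prompt wording, the model name in the URL, and the helper name build_request are assumptions for illustration. The sketch only builds the request and does not send it; a real call would also need an API key.

```python
import base64
import json

# Assumed model name; the web app may use a different one.
MODEL = "gemini-2.5-flash"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)


def build_request(image_bytes, mime_type="image/jpeg"):
    """Build a generateContent request body with a prompt and one inline image."""
    # Hypothetical prompt wording, chosen for this example.
    prompt = (
        "Transcribe the handwritten letter in this image exactly as written. "
        "Keep the original spelling and mark unreadable words as [illegible]."
    )
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            # Image bytes must be base64-encoded text in JSON.
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }


# Placeholder bytes stand in for a scanned page.
body = build_request(b"fake image bytes")
payload = json.dumps(body)  # the JSON text that would be POSTed to ENDPOINT
```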
Web App Link: https://madisonarchivr-434336472210.us-east1.run.app/
Google Gemini offers free pro memberships to all college students; however, since our clients don't meet those criteria, my team and I decided to compare the responses of a regular account and one with a pro membership. Below are the two responses side-by-side. There are a few differences between the two responses; however, given the scale of the project and how minor the errors are, they will not have a significant impact on the project's efficiency.
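A comparison like this can also be done programmatically. The sketch below uses Python's standard difflib module to list the words that differ between two transcriptions. The two sample sentences are invented placeholders, not our actual Gemini outputs, and the function name is an assumption.

```python
import difflib


def word_differences(first, second):
    """Return (first_words, second_words) pairs wherever the two texts differ."""
    a, b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replaced, inserted, or deleted words
    ]


# Invented placeholder transcriptions, for illustration only.
regular = "Dear Mother, I recieved your letter on Tuesday last."
pro = "Dear Mother, I received your letter on Tuesday last."

for left, right in word_differences(regular, pro):
    print(f"{left!r} vs {right!r}")  # prints: 'recieved' vs 'received'
```

A short list of differing words makes it easy for a human reviewer to check each disagreement against the original letter.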
For transcribing letters, highly specialized software is not necessary; a general-purpose solution is sufficient, especially for high school students with no prior programming knowledge who perform this task once a week for a few hours after school.
I found that while AI is powerful in recognizing patterns visually, it still cannot fully replace human insight for interpreting context, emotion, and historical nuance. Additionally, it is essential to acknowledge that its accuracy decreases with messy cursive and historical spelling variations, necessitating manual review. I recommend the initial draft be done by Gemini, followed by human review and correction, if needed, to ensure an accurate transcription.
Regardless of the approach taken, proofreading is required because messy handwriting is difficult to decipher even for humans; inaccuracies are therefore not necessarily the fault of the technology alone. A blended approach leverages the efficiency that cutting-edge technology enables without compromising accuracy.
Crash Course AI. How to make an AI read your handwriting (LAB): Crash Course AI #5.
Google. (2024, April 22). Gemini (2.0) [Large language model]. https://g.co/gemini/
Hang Li. 2022. Language models: past, present, and future. Communications of the ACM 65, 7 (July 2022), 56–63. https://doi.org/10.1145/3490443
https://cacm.acm.org/research/language-models/
Humphries, Mark. "Has Google Quietly Solved Two of AI’s Oldest Problems?" (2025, October 17). Generative History. https://generativehistory.substack.com/p/has-google-quietly-solved-two-of
Muehlberger, Guenter, Louise Seaward, Melissa Terras, Sofia Ares Oliveira, Vicente Bosch, Maximilian Bryan, Sebastian Colutto et al. "Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study." Journal of Documentation 75, no. 5 (2019): 954-976. https://www.emerald.com/jd/article-pdf/75/5/954/1398560/jd-07-2018-0114.pdf
OpenAI. (2023). ChatGPT [Large language model]. https://chat.openai.com/
Transkribus Handwritten Text Recognition Overview. https://www.reddit.com/r/computervision/comments/1ctf0jh/2024_review_of_ocr_tools_extracting_text_from/
Materials Presented on Demo Day
Google Gemini: gemini.google.com/app
Web App Link (MadisonArchivr): https://madisonarchivr-434336472210.us-east1.run.app/
How to use MadisonArchivr:
Upload the file you want transcribed, then click "Transcribe Handwriting"
After the loading screen, you will be redirected to a page with the transcribed text in a text box and a preview of the image that was transcribed
There are two buttons below the text box with the transcribed text: "Copy" and "Edit"
Copy - copies everything in the text box to your clipboard so you can paste it wherever you want
Edit - makes the text box editable, allowing the user to make corrections directly in it
Under the main component, there is a back arrow labeled "Transcribe more images with MadisonArchivr" that will take you back to the first page to transcribe a new document
Note: if the loading screen takes too long or the app is unable to transcribe the document, refresh the page and try again
Through this project, I gained a deeper understanding of myself and the challenges of working in the CS field, particularly when collaborating with a real client. I realized how important it is to prioritize the client’s needs over the excitement of new or advanced technology. Throughout the process, I often came across impressive tools, but I had to remind myself that the goal wasn’t to find the “best” technology; it was to find the technology that best suited the client’s specific situation. Shifting my perspective to think from the client’s point of view helped guide my decisions and made the entire project more purposeful.
This experience also opened my eyes to a side of computer science I hadn’t fully considered before. I didn’t expect to enjoy the problem-solving involved in researching solutions and building a demo tailored to someone else’s needs, but it turned out to be both valuable and rewarding. Personally and academically, it made me more open to exploring this type of client-focused work in the future.