Visual Data Recognition

What is it?

Machine vision technologies (MVT) are systems that enable machines to “see”: to create, interpret, and manipulate data from images (Sezen, 2020). It is no longer tenable to claim that humans are the only beings capable of vision, since machines now produce varying degrees of what resembles sentient thought every day (Sezen, 2020). Social media has encouraged people to share images of themselves freely, sending a constant stream of image metadata to tech companies, which use it to teach their machine learning systems how to “see” (Sezen, 2020). The recent breakthrough in machine vision is the product of deep learning methods such as convolutional neural networks (Mühling et al., 2018). Digital video libraries and archives are using computer vision to improve searching, browsing, and access to media.
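As a rough illustration of the operation that gives convolutional neural networks their name, the sketch below slides a small filter over a toy image in plain Python. The image values, the filter, and the function name convolve2d are illustrative assumptions, not taken from the cited sources; real systems learn many such filters from data and rely on optimized libraries.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (valid positions only) and sum the products."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            total = 0
            # Multiply each filter weight by the pixel beneath it and add up.
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Toy 4x4 grayscale image (assumed values): dark left half, bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-written vertical-edge filter; a trained network would learn such weights.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
for line in feature_map:
    print(line)
```

The resulting feature map is largest where the image changes from dark to bright, which is how a single filter can respond to one kind of visual pattern.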

Use Case: The German National Library of Science and Technology (TIB) uses computer vision and machine learning algorithms to help users search, access, browse, and archive digital video content on subjects such as mathematics, architecture, computer science, physics, chemistry, and technology/engineering (Mühling et al., 2018). International scientists from the United States, the United Kingdom, Japan, Sweden, the Netherlands, and other countries also use the library for research (Mühling et al., 2018). Achieving machine vision means that the system can interpret image data on its own, decide what the image contains, and tag it with appropriate metadata (Malevé, 2020). Automatically tagging images with metadata can benefit librarians, who can then redirect their time to other projects, but it also poses an ethical dilemma: algorithms are not human and do not understand an image’s context (Malevé, 2020).
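One way to picture the trade-off between saving librarians' time and missing an image's context is a triage step that accepts only high-confidence tags and sends the rest to a person. The sketch below is a minimal illustration under stated assumptions: the labels, confidence scores, threshold, and the function name triage_tags are all hypothetical and do not describe any library's actual system.

```python
REVIEW_THRESHOLD = 0.80  # hypothetical cut-off; a real archive would tune this


def triage_tags(scores, threshold=REVIEW_THRESHOLD):
    """Split predicted labels into auto-accepted tags and tags for human review.

    `scores` maps a label to a model confidence between 0 and 1.
    """
    accepted = sorted(label for label, score in scores.items() if score >= threshold)
    needs_review = sorted(label for label, score in scores.items() if score < threshold)
    return {"accepted": accepted, "needs_review": needs_review}


# Made-up scores standing in for the output of an image classifier.
example_scores = {"portrait": 0.97, "laboratory": 0.85, "protest": 0.41}

result = triage_tags(example_scores)
print(result)
```

A confidence score does not measure whether a tag is appropriate in context, so a step like this reduces rather than removes the need for a librarian's judgment.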

Software Example: One system used for machine vision is the Distant Viewing Toolkit (DVT) (Arnold & Tilton, 2020). The DVT is a Python package for the computational analysis of still and moving images (Arnold & Tilton, 2020). It is open-source software made possible by a Digital Humanities Advancement Grant from the National Endowment for the Humanities (Arnold & Tilton, 2020). The DVT's functions include providing an extensible framework for applying algorithms to moving images, combining common audio and visual algorithms to present media to the user in new ways, and allowing still images and moving images to sit side by side once added to the system (Arnold & Tilton, 2020).

Use Case: National Library of Scotland

Above: Dr. Giles Bergel explains his research on chapbooks using machine vision AI techniques.

Technology Tool: Zeutschel OS 16000 Book Scanner

Above: Zeutschel OS 16000 Book Scanner used at the German National Library for digitization of archival materials using machine vision AI techniques.

References

Arnold, T., & Tilton, L. (2020). Distant viewing toolkit: A Python package for the analysis of visual culture. Journal of Open Source Software, 5(45), 1800. https://doi.org/10.21105/joss.01800

Malevé, N. (2020). On the data set’s ruins. AI & Society, 36(4), 1117–1131. https://doi.org/10.1007/s00146-020-01093-w

Mühling, M., Meister, M., Korfhage, N., Wehling, J., Hörth, A., Ewerth, R., & Freisleben, B. (2018). Content-based video retrieval in historical collections of the German Broadcasting Archive. International Journal on Digital Libraries, 20(2), 167–183. https://doi.org/10.1007/s00799-018-0236-z

Sezen, D. (2020). Without a blink: Machine ways of seeing in contemporary visual culture. Interactions (Bristol, England), 11(1), 103–107. https://doi.org/10.1386/iscc_00010_7
