Faculty Collaborator: Andrew Creamer
About:
Raymond has been collaborating with Professor John Cayley to develop a measure of visual similarity between CJKV (Chinese, Japanese, Korean, Vietnamese) characters in support of a digital art project. His work measures similarity between characters using metrics such as pixel-edit distance, radical structure, and stroke-order edit distance. The results will populate a database linking common characters to their most visually similar neighbors, intended for use in course material. The database will also drive an art project in which random characters in a Mandarin Chinese passage mutate into visually similar characters.
Purpose:
Explore how we read and write, focusing on interpretation rather than the nature of language.
Understand “reading” as an activity that can engage multiple senses like vision, hearing, and touch.
Recognize that the symbols and glyphs we use are arbitrary yet capable of conveying complex meanings.
Investigate this non-traditional application of data science.
CJK Characters:
CJK (Chinese, Japanese, Korean) characters:
Logographic rather than phonetic; only a minority of characters are true pictographs.
A single glyph or a small group of 2-3 glyphs typically represents a single word.
Each glyph follows a structural pattern built from components that may carry phonetic or semantic information.
Example:
妈 (mother, mā) consists of 女 (woman), which signals the meaning, and 马 (horse, mǎ), which signals the pronunciation.
The Project:
Create an art piece examining how small visual mutations shift semantic meaning.
The data science aspect involves data preprocessing and computer vision tasks.
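As a rough illustration of the mutation idea, the sketch below swaps characters in a passage for visually similar ones at random. The NEIGHBORS table, the mutate function, and the rate parameter are all hypothetical names introduced here; the character pairs are well-known look-alikes chosen by hand, not output from the project's database.

```python
import random

# Hypothetical lookup table: each character maps to look-alike characters.
# These pairs are hand-picked for illustration, not project results.
NEIGHBORS = {
    "日": ["目", "曰"],
    "未": ["末"],
    "土": ["士"],
    "己": ["已"],
    "人": ["入"],
}


def mutate(passage, neighbors, rate=0.2, rng=None):
    """Replace each character that has known neighbors with probability `rate`."""
    rng = rng or random.Random()
    out = []
    for ch in passage:
        options = neighbors.get(ch)
        if options and rng.random() < rate:
            # Pick one of the visually similar characters at random.
            out.append(rng.choice(options))
        else:
            # Characters without neighbors, or not selected, stay as they are.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # A seeded generator makes a given run repeatable.
    print(mutate("日本人未来", NEIGHBORS, rate=0.5, rng=random.Random(0)))
```

Passing a seeded random.Random keeps a run reproducible, which may help when comparing how different mutation rates change the reading of the same passage.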
Challenges:
Technical:
Difficulty in reading the characters.
Limited knowledge of computer vision outside of neural networks.
Challenges in finding the right datasets and methods.
Communication:
Maintaining a regular cadence is difficult.
The problem area is not clearly defined.
Philosophical:
Data science is reductionist while art is constructive.
Working with a language tied to personal background but not fully understood.
What Didn’t Work:
ORB (Oriented FAST and Rotated BRIEF)
Contour-based analysis
What Did Work:
Stroke Edit Distance
Pixel Edit Distance
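The summary does not define either metric precisely, so the following is a minimal sketch under stated assumptions: stroke edit distance is taken to be the Levenshtein distance between stroke-type sequences, and pixel edit distance is taken to be the number of differing pixels between two same-sized binary bitmaps. The single-letter stroke codes (H for horizontal, P for left-falling, N for right-falling, D for dot) and the 5x5 bitmaps are hypothetical, hand-made stand-ins for real stroke data and rendered glyphs.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def pixel_distance(img_a, img_b):
    """Count the positions where two same-sized binary bitmaps differ."""
    same_shape = len(img_a) == len(img_b) and all(
        len(r) == len(s) for r, s in zip(img_a, img_b))
    if not same_shape:
        raise ValueError("bitmaps must have identical dimensions")
    return sum(p != q for r, s in zip(img_a, img_b) for p, q in zip(r, s))


# Hypothetical stroke encodings (assumed codes, see the paragraph above).
STROKES = {"大": "HPN", "太": "HPND", "天": "HHPN"}

# Hand-drawn toy bitmaps, not rendered from a font.
TU = ["..#..", ".###.", "..#..", "..#..", "#####"]   # rough 土
SHI = ["..#..", "#####", "..#..", "..#..", ".###."]  # rough 士

if __name__ == "__main__":
    print(levenshtein(STROKES["大"], STROKES["太"]))  # 1: one added dot stroke
    print(levenshtein(STROKES["太"], STROKES["天"]))  # 2
    print(pixel_distance(TU, SHI))                    # 4
```

Under this encoding 土 and 士 would share the same stroke sequence (horizontal, vertical, horizontal) and differ only in pixels, which suggests why combining the two metrics could be more informative than either alone.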
Remaining Tasks:
Implement cosine similarity between model outputs (e.g., CLIP, a pre-trained OCR model, or an ad hoc CNN).
Improve communication of results and gather feedback.
Analyze character similarity to determine which characters are "core" (most other characters have a low edit distance to them) and which are "rare."
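The first and last tasks above could be prototyped together along these lines. This is only a sketch: the vectors are made-up placeholders rather than real CLIP, OCR, or CNN outputs, the keys c1 to c4 stand in for characters, and ranking by mean distance to all other items is one assumed way to operationalize "core" versus "rare", not a method confirmed by the project.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_by_mean_distance(items, distance):
    """Return (item, mean distance to the others), most central item first."""
    items = list(items)
    if len(items) < 2:
        raise ValueError("need at least two items to compare")
    scores = {}
    for a in items:
        others = [distance(a, b) for b in items if b != a]
        scores[a] = sum(others) / len(others)
    return sorted(scores.items(), key=lambda kv: kv[1])


# Placeholder embeddings: invented numbers, not model outputs.
EMBEDDINGS = {
    "c1": [1.0, 0.0, 0.0],
    "c2": [0.9, 0.1, 0.0],
    "c3": [0.8, 0.0, 0.2],
    "c4": [0.0, 0.0, 1.0],
}


def cosine_distance(a, b):
    """Distance between two keys of EMBEDDINGS, defined as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(EMBEDDINGS[a], EMBEDDINGS[b])


if __name__ == "__main__":
    for key, score in rank_by_mean_distance(EMBEDDINGS, cosine_distance):
        print(key, round(score, 3))
```

Because rank_by_mean_distance takes any distance function, the same ranking could be run with stroke or pixel edit distance in place of cosine distance. Items at the top of the list would be candidates for "core" and items at the bottom for "rare".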