“An image does not describe itself.” Magritte’s painting reminds us that an image does not represent the thing itself but rather functions as a sign. Its meaning is not fixed; it is shaped and stabilized by language and by the labels we attach to it. Once someone looks at an image and declares “this is xxx,” every subsequent viewer tends to inherit that interpretation, often uncritically.
In machine learning, this dynamic becomes even more pronounced. The power to decide what an image “is” lies with the dataset creators. They determine which categories exist and which are excluded, which features are deemed meaningful and which are ignored. By the time the machine “learns,” it is no longer engaging with the raw phenomenon but with a pre-labeled version of it: a collection of images that has already been filtered and classified through human decisions.
Interpretation thus becomes a form of power. Those who train machines hold the authority to define what an image means, while everyone else only sees the model’s outputs and is told: this is what the image is. At the same time, machine learning requires massive amounts of data, much of it scraped or collected without the consent of the people depicted. Individuals often have no opportunity to influence how their images are categorized or what those labels imply about them.
As a result, when datasets contain racialized, gendered, or otherwise biased labels, these assumptions are absorbed into the model and amplified through automation at scale. Non-binary individuals, racial minorities, or marginalized groups are more likely to be misclassified, excluded, or stereotyped. The result is not just technical error but the reproduction and reinforcement of social prejudice through computational systems.
For this reason, asking who gets to label images, and how, is not a trivial technical concern but a deeply political question. It demands reflection on the power structures embedded in AI systems and on the social consequences that follow from the act of labeling itself.
In this assignment, I created an interactive Rock–Paper–Scissors game based on ml5.js image classification. First, I trained a custom model in Teachable Machine that recognizes three hand gestures: rock, paper, and scissors. I then imported this model into p5.js and modified the image classifier example to integrate it into a playable game. To make the interaction more engaging, I added a computer opponent that randomly selects one of the three gestures, and a three-second countdown implemented with millis(), which gives the player time to show a hand gesture before the computer’s choice is revealed. The prototype works, but one limitation I observed is that the model struggles to detect the “scissors” gesture accurately; most of the time it classifies it as “rock” or “paper” instead. This likely stems from insufficient training data or from limitations of the model itself. In the future, I would improve the dataset by collecting more examples under different lighting conditions and from multiple angles, which should increase accuracy and make the gameplay smoother.
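The sketch below illustrates how this structure could be put together: a Teachable Machine model loaded through ml5.imageClassifier, a continuous classification loop on the webcam feed, a millis()-based countdown, and a random computer choice scored against the player’s detected gesture. It is a minimal sketch, not the exact project code: the model URL is a placeholder, the key-press trigger and the judge() helper are illustrative, and the callback signature assumes ml5 v0.x (newer ml5 releases drop the error argument).

```javascript
let classifier;          // ml5 image classifier loaded from Teachable Machine
let video;               // webcam feed
let playerLabel = '';    // latest prediction from the model
let countdownStart;      // millis() timestamp when a round begins
let roundActive = false;
let resultText = 'Press any key to start a round';

const GESTURES = ['rock', 'paper', 'scissors'];
// Placeholder URL: replace XXXX with the actual Teachable Machine model ID.
const MODEL_URL = 'https://teachablemachine.withgoogle.com/models/XXXX/model.json';

function preload() {
  classifier = ml5.imageClassifier(MODEL_URL);
}

function setup() {
  createCanvas(640, 520);
  video = createCapture(VIDEO);
  video.size(640, 480);
  video.hide();
  classifyVideo();                    // start the continuous classification loop
}

function classifyVideo() {
  classifier.classify(video, gotResult);
}

// Callback for ml5 v0.x; newer versions pass only (results).
function gotResult(error, results) {
  if (error) {
    console.error(error);
    return;
  }
  playerLabel = results[0].label;     // highest-confidence class
  classifyVideo();                    // classify the next frame
}

function keyPressed() {
  // Start a new round: the player has three seconds to show a gesture.
  roundActive = true;
  countdownStart = millis();
  resultText = '';
}

function draw() {
  background(0);
  image(video, 0, 0, 640, 480);

  if (roundActive) {
    const elapsed = millis() - countdownStart;
    if (elapsed < 3000) {
      // Countdown still running: show the seconds remaining.
      textSize(64);
      fill(255);
      textAlign(CENTER);
      text(ceil((3000 - elapsed) / 1000), width / 2, height / 2);
    } else {
      // Time is up: the computer picks randomly and the round is scored.
      const computerChoice = random(GESTURES);
      resultText = judge(playerLabel, computerChoice) +
        ` (you: ${playerLabel}, computer: ${computerChoice})`;
      roundActive = false;
    }
  }

  textSize(24);
  fill(255);
  textAlign(CENTER);
  text(resultText || `Detected: ${playerLabel}`, width / 2, height - 10);
}

// Standard Rock-Paper-Scissors rules.
function judge(player, computer) {
  if (player === computer) return 'Draw';
  const beats = { rock: 'scissors', paper: 'rock', scissors: 'paper' };
  return beats[player] === computer ? 'You win' : 'Computer wins';
}
```

One design note: keeping classification running continuously and only reading playerLabel when the timer expires keeps the game loop simple, since draw() never has to wait on the model.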