READING SUMMARY: A neural network is a computational model that learns patterns from data by adjusting internal connections between simple processing units called neurons. Each neuron receives input, applies a mathematical transformation through an activation function, and passes the result forward to the next layer. When data moves through these layers, the network transforms raw inputs such as images, sound, or sensor values into meaningful outputs like classifications or predictions. During training, the network compares its predictions with the correct answers, measures the error, and uses that error to fine-tune its internal weights. Over time, it becomes better at recognizing relationships that are too complex for explicit programming, allowing it to “learn” from experience rather than being directly told what to do.
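A toy sketch of that training loop for a single neuron is shown below; the sigmoid activation, squared-error measure, and learning rate are illustrative assumptions, not details from the reading.

```javascript
// One neuron with one weight learns to push its output toward a target value.
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

let weight = 0.8;         // the internal connection being adjusted
const learningRate = 0.1; // assumed step size
const x = 1.0;            // input value
const target = 0.0;       // the "correct answer"

for (let epoch = 0; epoch < 50; epoch++) {
  const prediction = sigmoid(weight * x);   // forward pass through the activation function
  const error = prediction - target;        // compare the prediction with the correct answer
  // Chain rule through the sigmoid gives the direction to nudge the weight
  const gradient = error * prediction * (1 - prediction) * x;
  weight -= learningRate * gradient;        // fine-tune the weight using the error
}

console.log(weight); // the weight drifts downward so the output moves toward the target
```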
PROJECT:
Concept
GestureFlow is an interactive audiovisual experience that transforms human hand gestures into dynamic visual and sonic expressions. Using real-time hand tracking and machine learning, the system recognizes different hand shapes and translates them into flowing particle behaviors and reactive sounds. The project explores how natural gestures can become a form of intuitive, embodied interaction that blends motion, light, and sound into one fluid system.
Process
Hand Tracking
The project uses ml5.js HandPose to detect 21 hand keypoints from the webcam in real time. This provides the spatial data needed to describe the user’s hand shape and movement.
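A minimal version of this setup might look like the sketch below. It assumes the ml5.js v1 handPose API (older releases expose a slightly different interface), and the gotHands name is just a placeholder.

```javascript
let handPose;
let video;
let hands = [];

function preload() {
  // Load the HandPose model before the sketch starts
  handPose = ml5.handPose();
}

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.size(640, 480);
  video.hide();
  // Start continuous detection on the webcam feed
  handPose.detectStart(video, gotHands);
}

function gotHands(results) {
  // Each detected hand carries 21 keypoints with x/y pixel coordinates
  hands = results;
}

function draw() {
  image(video, 0, 0, width, height);
  noStroke();
  fill(0, 255, 0);
  for (const hand of hands) {
    for (const kp of hand.keypoints) {
      circle(kp.x, kp.y, 8);
    }
  }
}
```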
Gesture Classification
I trained a simple ml5 Neural Network classifier using hand coordinate data. Each gesture (OPEN, FIST, PEACE, THUMBS) was recorded multiple times to build a small training dataset. After normalization and 30 training epochs, the model could recognize gestures with reasonable accuracy.
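The data collection and training steps could be organized roughly as below. This is a sketch, not the project's exact code: it assumes the hands array from the hand-tracking sketch above, ml5's neuralNetwork API in its current (v1) form, and placeholder names like recordSample and trainModel.

```javascript
// Gesture classifier: 42 inputs (x, y for 21 keypoints) -> one of 4 labels.
// Note: older ml5 versions pass (error, results) to callbacks instead of (results).
let currentGesture = 'OPEN';

const classifier = ml5.neuralNetwork({
  inputs: 42,
  outputs: 4,
  task: 'classification',
  debug: true,            // shows the training-loss graph while training
});

// Flatten the 21 keypoints of one hand into a 42-value vector
const flattenHand = (hand) => hand.keypoints.flatMap((kp) => [kp.x, kp.y]);

// Record one sample for a label while collecting training data
function recordSample(label) {
  if (hands.length > 0) {
    classifier.addData(flattenHand(hands[0]), { label }); // 'OPEN', 'FIST', 'PEACE', 'THUMBS'
  }
}

// After enough samples of every gesture have been added:
function trainModel() {
  classifier.normalizeData();               // scale inputs to a common range
  classifier.train({ epochs: 30 }, () => {  // 30 epochs, as in the project
    console.log('training finished');
  });
}

// Call each frame once training is done
function classifyHand() {
  if (hands.length > 0) {
    classifier.classify(flattenHand(hands[0]), (results) => {
      currentGesture = results[0].label;    // highest-confidence gesture
    });
  }
}
```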
Particle System
A custom Particle class manages position, velocity, color, and motion patterns for hundreds of particles. Each recognized gesture switches the particles into a different behavior (a condensed sketch follows this list):
OPEN: particles expand and move freely across the canvas.
FIST: particles are drawn toward the center of the hand.
PEACE: particles orbit around the hand, forming circular motions.
THUMBS: particles burst outward like an explosion or energy pulse.
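Below is a condensed sketch of how such a Particle class can switch behavior on the predicted label; the steering math and constants are illustrative assumptions rather than the project's exact rules.

```javascript
class Particle {
  constructor() {
    this.pos = createVector(random(width), random(height));
    this.vel = p5.Vector.random2D();
    this.col = color(random(255), random(255), 255);
  }

  // handPos: p5.Vector at the tracked hand; gesture: predicted label
  update(gesture, handPos) {
    if (gesture === 'OPEN') {
      // Drift freely and spread across the canvas
      this.vel.add(p5.Vector.random2D().mult(0.3));
    } else if (gesture === 'FIST') {
      // Steer toward the center of the hand
      const pull = p5.Vector.sub(handPos, this.pos).setMag(0.4);
      this.vel.add(pull);
    } else if (gesture === 'PEACE') {
      // Orbit: accelerate perpendicular to the direction toward the hand
      const toHand = p5.Vector.sub(handPos, this.pos);
      const tangent = createVector(-toHand.y, toHand.x).setMag(0.4);
      this.vel.add(tangent);
    } else if (gesture === 'THUMBS') {
      // Burst outward from the hand like an energy pulse
      const push = p5.Vector.sub(this.pos, handPos).setMag(0.8);
      this.vel.add(push);
    }
    this.vel.limit(4);
    this.pos.add(this.vel);
  }

  show() {
    noStroke();
    fill(this.col);
    circle(this.pos.x, this.pos.y, 4);
  }
}
```

Because every particle's update() receives the current gesture label and a hand position each frame, changing the gesture immediately changes the motion rule for the whole swarm.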
Sound Layer
Using p5.sound, I added a sound reaction for each gesture to enhance immersion (a sketch follows this list):
OPEN → soft ambient tone
FIST → deep rumble
PEACE → gentle melodic chime
THUMBS → short percussive burst
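One way to wire these up with p5.sound is sketched below, using a single oscillator and envelope; the specific waveforms, frequencies, and envelope times are assumptions chosen to evoke the descriptions above, not the project's actual sound design.

```javascript
let osc, env;
let lastGesture = '';

function setup() {
  createCanvas(640, 480);
  osc = new p5.Oscillator('sine');
  osc.amp(0);            // start silent; the envelope controls loudness
  osc.start();
  env = new p5.Envelope();
}

function mousePressed() {
  userStartAudio();      // browsers require a user interaction before audio can start
}

// Call this whenever the classifier reports a new gesture
function playGestureSound(gesture) {
  if (gesture === lastGesture) return;   // only trigger on a change of gesture
  lastGesture = gesture;

  if (gesture === 'OPEN') {              // soft ambient tone
    osc.setType('sine');
    osc.freq(220);
    env.setADSR(0.5, 0.2, 0.4, 1.5);
  } else if (gesture === 'FIST') {       // deep rumble
    osc.setType('sawtooth');
    osc.freq(55);
    env.setADSR(0.1, 0.3, 0.5, 1.0);
  } else if (gesture === 'PEACE') {      // gentle melodic chime
    osc.setType('triangle');
    osc.freq(660);
    env.setADSR(0.01, 0.3, 0.2, 0.8);
  } else if (gesture === 'THUMBS') {     // short percussive burst
    osc.setType('square');
    osc.freq(110);
    env.setADSR(0.005, 0.1, 0.0, 0.1);
  }
  env.play(osc);                         // apply the envelope to the oscillator
}
```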
Technical Principles
Computer Vision: HandPose detects 21 hand landmarks per frame.
Feature Extraction: The coordinates (x, y) of each landmark are converted into a 42-value input vector.
Machine Learning: The neural network classifies the input vector into gesture labels.
Generative Visuals: Based on the predicted label, particles behave according to different rules of motion.
Sound Interaction: Each gesture triggers a specific tone or audio event using p5.sound.
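Putting these stages together, the per-frame flow can be sketched as below; it assumes the hands array, classifyHand(), currentGesture, Particle class, and playGestureSound() from the earlier sketches, and uses the wrist keypoint as the hand position.

```javascript
// Hypothetical per-frame pipeline: detection -> classification -> visuals -> sound
let particles = [];  // filled with Particle instances in setup(), e.g. 300 of them

function draw() {
  background(0, 40);                      // translucent background leaves particle trails

  classifyHand();                         // feature extraction + gesture classification

  // Use the wrist keypoint (index 0) as the hand position that drives the visuals
  let handPos = createVector(width / 2, height / 2);
  if (hands.length > 0) {
    const wrist = hands[0].keypoints[0];
    handPos = createVector(wrist.x, wrist.y);
  }

  for (const p of particles) {
    p.update(currentGesture, handPos);    // gesture-specific motion rules
    p.show();
  }

  playGestureSound(currentGesture);       // trigger the matching audio event
}
```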
What I Learned
How to use ml5.js for real-time hand tracking and gesture classification.
The workflow of training, normalizing, and classifying data using neural networks in the browser.
How to link visual and auditory feedback to create a unified interactive experience.
Challenges
Understanding how neural networks work, and getting consistent hand detection under changing lighting conditions.
Achieving clear differences between gestures, especially between OPEN and THUMBS.
Future Improvements
Train with more gesture types and more samples per label to improve accuracy.
Add dynamic audio synthesis, where sound frequency or texture responds to motion speed or distance.
Expand into a multimodal artwork, where multiple users can interact simultaneously.