Website: TBU
Code: TBU
Demo: https://youtu.be/ctbv4PoeLkE
UNDER MINHEE+A is an interactive DJing platform that enables shared DJ performances. It visualizes the DJ’s track selections for the audience, conveys the DJ’s mood and vibe through mood-expressive gesture detection, and lets anyone play music by selecting mix points.
When listening to a DJ’s set, the audience usually does not know which track will come next or how the music will transition until it happens. But what if the audience did know? This awareness could increase anticipation for the upcoming music. Sharing the DJ’s selections can also enhance audience participation, allowing everyone to experience the performance together.
In addition, the DJ’s movements and vibe are an important affective element of a live set. By using gestures to influence track selection and displaying corresponding emojis in the visualization, the platform conveys the DJ’s vibe to the audience more effectively.
Also, anyone can be a DJ! With preprocessed musical structure analysis, users can select structural boundaries as mix points. Once the mix-out point of the current track and the mix-in point of the next track are chosen, the platform automatically transitions between tracks.
BTW, the name UNDER MINHEE+A is a combination of Undermania, KAIST Graduate School's DJing club, and my name, Minhee!
* This project is strongly influenced by DJ StructFreak [1].
1) Sharing the DJ’s Choices with the Audience
The DJ can select the next track via recommendations or search, and define mix-out and mix-in points for the current and next tracks. These selections are visualized in real time, allowing the audience to observe the DJ’s choices and anticipate upcoming transitions.
2) Gesture-Driven Mood Recommendations and Visual Feedback
When the DJ performs a mood-expressive gesture, the system detects it and recommends tracks whose predicted moods match the expressed gesture. In addition, corresponding emojis appear on the interface to visually convey the DJ’s mood and enhance the performance atmosphere.
Below are predefined gestures and their corresponding mood:
✌️ : happy_fun
❤️ : romantic_emotional
🐾 : dark_intense
🤘 (backwards) : inspirational_cinematic
👊 : energetic
🙏 : calm_dreamy
3) Preprocessed DJ Mix Points via Musical Structure Analysis
Musical structure analysis is performed in advance to identify section boundaries and candidate transition points for each track. DJs can use these precomputed boundaries as mix-out and mix-in points, enabling smooth and musically coherent transitions without manual cue searching.
1) Gesture Detection
The system uses MediaPipe Hands for real-time hand landmark extraction, producing 21 3D keypoints per hand. These landmarks are flattened into a 63-dimensional feature vector and passed to a custom Multi-Layer Perceptron (MLP) classifier implemented with Scikit-learn.
The classifier has two hidden layers (128 and 64 units) with ReLU activation and outputs a discrete gesture label (e.g., heart, peace, punch). The Python backend runs inference on the webcam stream and publishes the current gesture state to the React frontend, which then triggers mood-based track recommendations and corresponding emoji animations.
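The classification step can be sketched as below. This is a simplified stand-in: the feature vectors here are synthetic random data, whereas in the real pipeline they come from MediaPipe Hands (21 landmarks x 3 coordinates, flattened to 63 values), and the gesture label set is assumed:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# 21 hand landmarks x 3 coordinates = 63-dimensional feature vector
N_FEATURES = 21 * 3
# Assumed label set; the actual class names may differ
GESTURES = ["peace", "heart", "paw", "rock_back", "punch", "pray"]

def make_classifier() -> MLPClassifier:
    # Two hidden layers (128 and 64 units) with ReLU, as described above
    return MLPClassifier(hidden_layer_sizes=(128, 64),
                         activation="relu",
                         max_iter=300,
                         random_state=0)

# Synthetic stand-in for labelled landmark data; real training data
# is flattened MediaPipe landmark output paired with gesture labels
rng = np.random.default_rng(0)
X = rng.normal(size=(300, N_FEATURES))
y = rng.choice(len(GESTURES), size=300)

clf = make_classifier().fit(X, y)
label = GESTURES[clf.predict(X[:1])[0]]  # discrete gesture label
```

At inference time, each webcam frame's landmarks would be flattened the same way and passed to `clf.predict`, and the resulting label published to the frontend.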
2) Music Structure Analysis
We use All-In-One Music Structure Analyzer [2] to perform musical structure analysis in advance for each track. The tool predicts core structural cues such as tempo (BPM), beats, downbeats, functional segment boundaries, and functional segment labels (e.g., intro, verse, chorus, bridge, outro).
In our platform, we precompute these results offline and store them as JSON analysis files. During DJing, the UI loads the precomputed boundaries and labels as candidate mix points, so users can quickly select mix-out / mix-in points without manual cue searching.
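Loading the precomputed analysis and turning segment boundaries into candidate mix points might look like the sketch below. The JSON schema shown here is a hypothetical simplification; the actual All-In-One output format may differ:

```python
import json

# Hypothetical shape of a precomputed analysis file (schema assumed)
analysis_json = """{
  "bpm": 128,
  "segments": [
    {"start": 0.0,  "end": 15.2, "label": "intro"},
    {"start": 15.2, "end": 45.8, "label": "verse"},
    {"start": 45.8, "end": 76.4, "label": "chorus"},
    {"start": 76.4, "end": 92.0, "label": "outro"}
  ]
}"""

def candidate_mix_points(analysis: dict) -> list[tuple[float, str]]:
    # Every functional segment boundary is a candidate mix-in/mix-out point
    return [(seg["start"], seg["label"]) for seg in analysis["segments"]]

analysis = json.loads(analysis_json)
points = candidate_mix_points(analysis)
# The UI can then present these (time, label) pairs for selection,
# e.g. the chorus boundary at 45.8 s as a mix-in point
```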
3) Music Mood Tagging
We use Music2Emotion [3] for music emotion recognition. It provides dimensional emotion predictions (valence, arousal) along with predicted mood tags, via a unified MER framework that integrates categorical and dimensional labels.
In our pipeline, we run batch inference offline and save per-track JSON outputs containing valence, arousal, and predicted_moods. The platform then uses these stored predictions for mood-based recommendations, ranking tracks by their predicted confidence for the target mood.
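The ranking step can be sketched as follows. The per-track dictionaries here mirror the stored JSON fields named above (valence, arousal, predicted_moods), but the exact values and the shape of predicted_moods as a tag-to-confidence map are assumptions:

```python
# Hypothetical per-track predictions, mirroring the stored JSON outputs
tracks = {
    "track_a": {"valence": 0.8, "arousal": 0.7,
                "predicted_moods": {"happy_fun": 0.91, "energetic": 0.62}},
    "track_b": {"valence": 0.3, "arousal": 0.4,
                "predicted_moods": {"calm_dreamy": 0.85,
                                    "romantic_emotional": 0.50}},
    "track_c": {"valence": 0.7, "arousal": 0.9,
                "predicted_moods": {"energetic": 0.88, "happy_fun": 0.40}},
}

def recommend(tracks: dict, target_mood: str, k: int = 3) -> list[tuple[str, float]]:
    # Rank tracks by their stored confidence for the target mood;
    # tracks without that mood tag score 0.0
    scored = [(name, data["predicted_moods"].get(target_mood, 0.0))
              for name, data in tracks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

When the gesture detector reports an "energetic" gesture, `recommend(tracks, "energetic")` would surface track_c first, then track_a.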
[1] Kim, Taejun & Nam, Juhan (2023). DJ StructFreak: Automatic DJ system built with music structure embeddings. Late-Breaking Demo in the 24th International Society for Music Information Retrieval Conference (ISMIR 2023).
[2] Kim, Taejun & Nam, Juhan (2023). All-in-one metrical and functional structure analysis with neighborhood attentions on demixed audio. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2023).
[3] Kang, Jaeyong & Herremans, Dorien (2025). Towards unified music emotion recognition across dimensional and categorical models. arXiv:2502.03979.