Local Music Recommendation System
Offline, Smooth, and Customizable
I built PulseFlow AI to recognise tracks from my own library and generate playlists that either stay smooth or deliberately jump in style without wrecking the mood. Everything runs offline on a regular laptop and avoids painful Windows dependencies.
Recognises songs locally from a short clip or mic input using audio fingerprints.
Understands style with CLAP embeddings for “sounds-like” retrieval.
Respects musicality by considering mood, tempo, key, and energy.
Builds playlists in two modes: smooth continuity or planned contrast.
Explains choices with short LLM blurbs via Ollama (optional, still offline).
Streaming “Flow” features can swing genres and moods randomly or get stuck. I wanted full control: my files, my tags, predictable transitions, and instant results without the cloud.
Fingerprint local tracks to identify a playing clip.
Extract features with librosa: tempo, key, energy (sketched after this list).
Embed each track with CLAP (HTSAT-tiny) → 512-D vectors.
Index embeddings with FAISS for fast nearest-neighbor search.
Sequence tracks with a cost function that balances style distance, mood, tempo, key, energy.
Explain transitions with a small local LLM via Ollama (optional).
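For the feature step, here is a minimal sketch assuming only librosa; the key estimate below is a simple chroma-template argmax, a simplification rather than the exact logic in the repo:

```python
# Illustrative tempo / key / energy extraction with librosa (not the exact repo code).
import numpy as np
import librosa

KEYS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def basic_features(path: str, sr: int = 22050) -> dict:
    y, sr = librosa.load(path, sr=sr, mono=True)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)            # global BPM estimate
    tempo = float(np.atleast_1d(tempo)[0])
    energy = float(np.mean(librosa.feature.rms(y=y)))          # mean RMS energy
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)
    return {"tempo": tempo, "energy": energy, "key": KEYS[int(np.argmax(chroma))]}
```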
100% offline: no external APIs or accounts.
Windows-friendly: avoids Essentia on Windows, uses pure-Python fallbacks.
One-command ingest: incremental, only processes new files.
Unified CLI – python main.py ingest then python main.py playlist...
Autoinstall guardrails – missing deps are installed at runtime when possible.
Dependency hell on Windows: replaced hard builds with pure-Python paths, optionalised fragile libs, added runtime installers and clear fallbacks.
Repeat work: incremental ingest stores features and embeddings so unchanged files are never reprocessed (see the sketch after this list).
Chaotic transitions: sequencing engine that preserves mood while allowing planned style jumps.
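The incremental-ingest idea, as a hedged sketch; the manifest path and layout here are hypothetical, not the repo's actual cache format:

```python
# Hypothetical incremental ingest: skip files whose (path, mtime) are already in a
# JSON manifest. The manifest location and schema are illustrative only; a real
# pipeline would record a file only after it was processed successfully.
import json, os
from pathlib import Path

MANIFEST = Path("data/ingest_manifest.json")           # assumed location

def files_to_process(library_dir: str, exts=(".mp3", ".flac", ".wav")) -> list[str]:
    seen = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
    todo = []
    for p in Path(library_dir).rglob("*"):
        if p.suffix.lower() not in exts:
            continue
        key, mtime = str(p), os.path.getmtime(p)
        if seen.get(key) != mtime:                     # new or modified since last run
            todo.append(key)
            seen[key] = mtime
    MANIFEST.parent.mkdir(parents=True, exist_ok=True)
    MANIFEST.write_text(json.dumps(seen, indent=2))
    return todo
```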
Lightweight web UI for drag-and-drop clips and playlist export.
GPU-accelerated FAISS on RTX for huge libraries.
Playlist creation
Lightweight web UI (FastAPI + Uvicorn)
Drag and drop a clip or paste a path, pick a length and policy, and click Generate. Ingest can be triggered from the UI, and results are saved to JSON plus .m3u8 for quick export.
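As a rough sketch of the shape of such an endpoint (the route name, request fields, and the stubbed generate_playlist helper are all hypothetical, not the project's real API):

```python
# Hypothetical FastAPI sketch of a playlist endpoint; the real UI and routes differ.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="PulseFlow AI (sketch)")

class PlaylistRequest(BaseModel):
    clip_path: str
    length: int = 20
    policy: str = "smooth"        # "smooth" or "contrast"

def generate_playlist(clip_path: str, length: int, policy: str) -> list[dict]:
    # Placeholder: the real pipeline recognizes the clip, retrieves candidates,
    # and sequences them; here we just return an empty list.
    return []

@app.post("/playlist")
def playlist(req: PlaylistRequest) -> dict:
    return {"policy": req.policy,
            "tracks": generate_playlist(req.clip_path, req.length, req.policy)}

# Run with: uvicorn app:app --reload   (assuming this file is saved as app.py)
```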
LLM-forward playlist output (Ollama)
Each transition now has a concise, human-readable blurb, and the playlist includes a Top pick + one-line Overview of the vibe/policy. Works fully offline with your local model.
FAISS GPU (auto-fallback)
If a compatible RTX GPU is present, FAISS-GPU is used for faster search; otherwise we transparently fall back to CPU.
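The fallback itself is only a few lines; a sketch that works with either the faiss-cpu or faiss-gpu wheel:

```python
# GPU-with-CPU-fallback for FAISS: probe GPU support defensively and keep the
# CPU index if anything is missing or fails.
import faiss

def build_index(dim: int = 512) -> faiss.Index:
    index = faiss.IndexFlatIP(dim)                    # inner product on normalized vectors
    gpu_count = getattr(faiss, "get_num_gpus", lambda: 0)()
    if gpu_count > 0 and hasattr(faiss, "StandardGpuResources"):
        try:
            res = faiss.StandardGpuResources()
            index = faiss.index_cpu_to_gpu(res, 0, index)   # move to GPU 0
        except Exception:
            pass                                      # any GPU issue -> stay on CPU
    return index
```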
Robust CLAP loading on modern stacks (NumPy 2.x + PyTorch 2.6 safe-load allowlisting); sketched below.
Cleaner LLM text formatting in the UI and JSON payloads.
Auto-ingest path: if index/mapping are missing, the pipeline runs features → embeddings → index.
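On the CLAP-loading point above: PyTorch 2.6 switched torch.load to weights_only=True by default, so the pickled globals a checkpoint needs must be allowlisted. A hedged sketch; the exact globals and checkpoint path depend on your setup, and PyTorch's error message lists anything still missing:

```python
# Allowlist pickled globals so torch.load(weights_only=True), the PyTorch 2.6
# default, accepts the CLAP checkpoint. The entry and path below are examples only.
import numpy as np
import torch
from torch.serialization import add_safe_globals

add_safe_globals([np.dtype])                                  # example entry
state = torch.load("models/clap_htsat_tiny.pt",               # illustrative path
                   map_location="cpu", weights_only=True)
```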
Faster: GPU-backed ANN on large libraries.
Friendlier: one-click web UI, no terminal required.
Explainable: playlists come with reasons you can skim or share.
Repo: Philippe-Guerrier/pulseflow-ai-offline-music-recommender
Tech: FastAPI, FAISS/FAISS-GPU, PyTorch, CLAP (HTSAT-tiny), Ollama, NumPy 2.x
Repository: Link
Local Shazam-style ID + Playlist Engine with Qwen 3B
"Beats & Bots"
Edge-AI Music Intelligence: “Local Music DSP + LLMs”
Repository: Link
Summary
I designed and implemented a fully offline music identification and recommendation engine that runs entirely on local hardware without relying on cloud APIs. The goal was to reproduce and extend “Shazam-like” song recognition while solving key issues I’ve observed in mainstream services such as Deezer Flow, namely abrupt mood/style shifts, lack of pattern continuity, and limited personalization logic.
Key Features & Pipeline
Audio Fingerprinting:
Used a local implementation (PyDejavu) to fingerprint my library into a SQLite database, enabling sub-second recognition from an 8-second microphone or file snippet.
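For reference, the classic Dejavu flow looks roughly like this; import paths and the database config vary between forks and releases, so treat it as a sketch rather than the exact code used here:

```python
# Rough sketch of the classic Dejavu fingerprint-then-recognize flow.
# Import paths and the database config differ across Dejavu/PyDejavu versions.
from dejavu import Dejavu
from dejavu.recognize import FileRecognizer             # location differs in newer forks

config = {"database": {"db": "fingerprints.sqlite"}}     # illustrative config only
djv = Dejavu(config)
djv.fingerprint_directory("music/", [".mp3", ".flac"])   # one-off library ingest
match = djv.recognize(FileRecognizer, "clips/snippet_8s.wav")
print(match)                                             # best match metadata, or None
```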
High-Level Audio Features:
Planned to extract mood, energy, tempo, key, and timbral descriptors with Essentia, but I couldn't get Essentia working on Windows, so the pure-Python fallbacks (librosa) cover tempo, key, and energy instead; genre, instrument, and mood labels are tagged with MusicNN.
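The MusicNN side is essentially one call; a sketch per the musicnn README (file path illustrative):

```python
# Genre / instrument / mood tagging with MusicNN's built-in tagger.
from musicnn.tagger import top_tags

tags = top_tags("music/some_track.mp3", model="MTT_musicnn", topN=5)  # illustrative path
print(tags)
```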
Audio Embeddings for Similarity:
Computed 512-dimensional CLAP (HTSAT-Tiny) embeddings for “sounds-like” retrieval across genres. Stored in a FAISS index for instant nearest-neighbor search.
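A sketch of the embed-and-index step, assuming the laion_clap package and a flat inner-product index; checkpoint handling and on-disk caching are simplified away:

```python
# 512-D CLAP embeddings into a cosine-similarity FAISS index (simplified sketch).
import numpy as np
import faiss
import laion_clap

model = laion_clap.CLAP_Module(enable_fusion=False, amodel="HTSAT-tiny")
model.load_ckpt()                                       # loads a default pretrained checkpoint

paths = ["music/a.mp3", "music/b.mp3"]                  # illustrative file list
emb = model.get_audio_embedding_from_filelist(x=paths, use_tensor=False)   # (n, 512)
emb = np.ascontiguousarray(emb, dtype=np.float32)
faiss.normalize_L2(emb)                                 # cosine similarity via inner product

index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)
scores, ids = index.search(emb[:1], k=2)                # nearest "sounds-like" tracks
```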
Sequencing Engine (Custom anti-Flow logic):
Implemented weighted cost functions to control transitions:
Smooth mode: maintain style & mood continuity.
Contrast mode: deliberately switch styles while avoiding mood/energy whiplash.
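Condensed, the cost idea looks like this; the weights, normalizations, and pitch-class key distance are illustrative stand-ins for the tuned values:

```python
# Illustrative transition cost and greedy sequencing; weights and terms are examples,
# not the tuned values used by the project.
import numpy as np

def transition_cost(a: dict, b: dict, policy: str = "smooth") -> float:
    style = float(np.linalg.norm(a["embedding"] - b["embedding"]))    # CLAP-space distance
    tempo = abs(a["tempo"] - b["tempo"]) / 40.0                       # rough BPM normalization
    energy = abs(a["energy"] - b["energy"])
    key = min((a["key_idx"] - b["key_idx"]) % 12,
              (b["key_idx"] - a["key_idx"]) % 12) / 6.0               # pitch-class distance
    if policy == "smooth":
        return 1.0 * style + 0.5 * tempo + 0.8 * energy + 0.3 * key
    # contrast: reward a style jump, still punish mood/energy whiplash
    return -0.7 * style + 0.5 * tempo + 1.2 * energy + 0.3 * key

def sequence(seed: dict, pool: list[dict], n: int = 20, policy: str = "smooth") -> list[dict]:
    playlist, current = [seed], seed
    candidates = [t for t in pool if t is not seed]
    for _ in range(n - 1):
        if not candidates:
            break
        nxt = min(candidates, key=lambda t: transition_cost(current, t, policy))
        playlist.append(nxt)
        candidates.remove(nxt)
        current = nxt
    return playlist
```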
Local LLM Explanations:
Integrated Qwen 2.5 VL 3B (via Ollama) to produce short, natural-language justifications for each playlist transition—entirely offline.
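The call itself is a single local HTTP request to Ollama; the model tag and prompt wording below are assumptions to adapt to whatever is pulled locally:

```python
# One-line transition blurb from a local Ollama model (model tag and prompt are
# illustrative; any locally pulled model works).
import requests

def explain_transition(prev: dict, nxt: dict, model: str = "qwen2.5vl:3b") -> str:
    prompt = (f"In one sentence, explain why '{nxt['title']}' follows '{prev['title']}' "
              f"in a playlist (tempo {prev['tempo']:.0f} -> {nxt['tempo']:.0f} BPM, "
              f"energy {prev['energy']:.2f} -> {nxt['energy']:.2f}).")
    r = requests.post("http://localhost:11434/api/generate",
                      json={"model": model, "prompt": prompt, "stream": False},
                      timeout=120)
    r.raise_for_status()
    return r.json()["response"].strip()
```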
Resource Efficiency:
Designed for controlled VRAM usage (≤ 8 GB) and predictable latency (< 0.5 s end-to-end).
Workflow
Ingest: Fingerprint, extract features, compute embeddings, build FAISS index (one-off).
Runtime:
Recognize currently playing track from snippet.
Retrieve candidates (smooth or contrast policy).
Sequence playlist using cost-based constraints.
Call local LLM for explanations.
Output: 20-track playlist with mood-aware sequencing and concise reasoning.
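Glued together, the runtime path reads like the following; recognize_clip and retrieve_candidates are hypothetical placeholders for the fingerprinting and FAISS-retrieval modules, while sequence and explain_transition match the sketches above:

```python
# Hypothetical runtime driver; the helper functions stand in for the real modules.
def build_playlist(clip_path: str, policy: str = "smooth", length: int = 20) -> dict:
    seed = recognize_clip(clip_path)                          # fingerprint lookup (placeholder)
    pool = retrieve_candidates(seed, policy=policy)           # FAISS neighbours or contrast picks (placeholder)
    tracks = sequence(seed, pool, n=length, policy=policy)    # cost-based ordering
    blurbs = [explain_transition(a, b) for a, b in zip(tracks, tracks[1:])]   # Ollama blurbs
    return {"policy": policy, "tracks": tracks, "explanations": blurbs}
```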
Impact & Uniqueness
Fully autonomous, no internet required; ideal for privacy-sensitive or offline contexts.
Balances musical surprise with emotional continuity, solving a common problem in auto-generated playlists.
Modular design: fingerprinting, embedding, sequencing, and explanation components are independent and reusable.
Tech Stack
Languages: Python
Audio Processing: PyDejavu, Essentia, MusicNN, CLAP
Vector Search: FAISS
LLM Integration: Ollama (Qwen 2.5 VL 3B, DeepSeek R1 8B)
Data Formats: SQLite, NumPy, JSON, YAML
Hardware Optimization: CUDA-enabled PyTorch, quantized LLMs, float16 embeddings