Local Music Recommendation System
Offline, Smooth, and Customizable
I built PulseFlow AI to recognise tracks from my own library and generate playlists that either stay smooth or deliberately jump in style without wrecking the mood. Everything runs offline on a regular laptop and avoids painful Windows dependencies.
Recognises songs locally from a short clip or mic input using audio fingerprints.
Understands style with CLAP embeddings for “sounds-like” retrieval.
Respects musicality by considering mood, tempo, key, and energy.
Builds playlists in two modes: smooth continuity or planned contrast.
Explains choices with short LLM blurbs via Ollama (optional, still offline).
Streaming “Flow” features can swing genres and moods randomly or get stuck. I wanted full control: my files, my tags, predictable transitions, and instant results without the cloud.
Fingerprint local tracks to identify a playing clip.
Extract features with librosa: tempo, key, energy (sketched after this list).
Embed each track with CLAP (HTSAT-tiny) → 512-D vectors.
Index embeddings with FAISS for fast nearest-neighbor search.
Sequence tracks with a cost function that balances style distance, mood, tempo, key, energy.
Explain transitions with a small local LLM via Ollama (optional).
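For the feature step, here is a minimal sketch assuming only librosa; the key estimate below is a simple chroma-template argmax, a simplification rather than the exact logic in the repo:

```python
# Illustrative tempo / key / energy extraction with librosa (not the exact repo code).
import numpy as np
import librosa

KEYS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def basic_features(path: str, sr: int = 22050) -> dict:
    y, sr = librosa.load(path, sr=sr, mono=True)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)            # global BPM estimate
    tempo = float(np.atleast_1d(tempo)[0])
    energy = float(np.mean(librosa.feature.rms(y=y)))          # mean RMS energy
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)
    return {"tempo": tempo, "energy": energy, "key": KEYS[int(np.argmax(chroma))]}
```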
100% offline: no external APIs or accounts.
Windows-friendly: avoids Essentia on Windows, uses pure-Python fallbacks.
One-command ingest: incremental, only processes new files.
Unified CLI – python main.py ingest then python main.py playlist...
Autoinstall guardrails – missing deps are installed at runtime when possible.
Dependency hell on Windows: replaced hard builds with pure-Python paths, optionalised fragile libs, added runtime installers and clear fallbacks.
Repeat work: incremental ingest stores features and embeddings so unchanged files are never reprocessed (see the sketch after this list).
Chaotic transitions: sequencing engine that preserves mood while allowing planned style jumps.
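The incremental-ingest idea, as a hedged sketch; the manifest path and layout here are hypothetical, not the repo's actual cache format:

```python
# Hypothetical incremental ingest: skip files whose (path, mtime) are already in a
# JSON manifest. The manifest location and schema are illustrative only; a real
# pipeline would record a file only after it was processed successfully.
import json, os
from pathlib import Path

MANIFEST = Path("data/ingest_manifest.json")           # assumed location

def files_to_process(library_dir: str, exts=(".mp3", ".flac", ".wav")) -> list[str]:
    seen = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
    todo = []
    for p in Path(library_dir).rglob("*"):
        if p.suffix.lower() not in exts:
            continue
        key, mtime = str(p), os.path.getmtime(p)
        if seen.get(key) != mtime:                     # new or modified since last run
            todo.append(key)
            seen[key] = mtime
    MANIFEST.parent.mkdir(parents=True, exist_ok=True)
    MANIFEST.write_text(json.dumps(seen, indent=2))
    return todo
```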
Lightweight web UI for drag-and-drop clips and playlist export.
GPU-accelerated FAISS on RTX for huge libraries.
Playlist creation
Lightweight web UI (FastAPI + Uvicorn)
Drag and drop a clip or paste a path, pick a length and policy, and click Generate. Ingest can be triggered from the UI, and results are saved to JSON plus .m3u8 for quick export.
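As a rough sketch of the shape of such an endpoint (the route name, request fields, and the stubbed generate_playlist helper are all hypothetical, not the project's real API):

```python
# Hypothetical FastAPI sketch of a playlist endpoint; the real UI and routes differ.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="PulseFlow AI (sketch)")

class PlaylistRequest(BaseModel):
    clip_path: str
    length: int = 20
    policy: str = "smooth"        # "smooth" or "contrast"

def generate_playlist(clip_path: str, length: int, policy: str) -> list[dict]:
    # Placeholder: the real pipeline recognizes the clip, retrieves candidates,
    # and sequences them; here we just return an empty list.
    return []

@app.post("/playlist")
def playlist(req: PlaylistRequest) -> dict:
    return {"policy": req.policy,
            "tracks": generate_playlist(req.clip_path, req.length, req.policy)}

# Run with: uvicorn app:app --reload   (assuming this file is saved as app.py)
```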
LLM-forward playlist output (Ollama)
Each transition now has a concise, human-readable blurb, and the playlist includes a Top pick + one-line Overview of the vibe/policy. Works fully offline with your local model.
FAISS GPU (auto-fallback)
If a compatible RTX GPU is present, FAISS-GPU is used for faster search; otherwise we transparently fall back to CPU.
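The fallback itself is only a few lines; a sketch that works with either the faiss-cpu or faiss-gpu wheel:

```python
# GPU-with-CPU-fallback for FAISS: probe GPU support defensively and keep the
# CPU index if anything is missing or fails.
import faiss

def build_index(dim: int = 512) -> faiss.Index:
    index = faiss.IndexFlatIP(dim)                    # inner product on normalized vectors
    gpu_count = getattr(faiss, "get_num_gpus", lambda: 0)()
    if gpu_count > 0 and hasattr(faiss, "StandardGpuResources"):
        try:
            res = faiss.StandardGpuResources()
            index = faiss.index_cpu_to_gpu(res, 0, index)   # move to GPU 0
        except Exception:
            pass                                      # any GPU issue -> stay on CPU
    return index
```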
Robust CLAP loading on modern stacks (NumPy 2.x + PyTorch 2.6 safe-load allowlisting); sketched below.
Cleaner LLM text formatting in the UI and JSON payloads.
Auto-ingest path: if index/mapping are missing, the pipeline runs features → embeddings → index.
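On the CLAP-loading point above: PyTorch 2.6 switched torch.load to weights_only=True by default, so the pickled globals a checkpoint needs must be allowlisted. A hedged sketch; the exact globals and checkpoint path depend on your setup, and PyTorch's error message lists anything still missing:

```python
# Allowlist pickled globals so torch.load(weights_only=True), the PyTorch 2.6
# default, accepts the CLAP checkpoint. The entry and path below are examples only.
import numpy as np
import torch
from torch.serialization import add_safe_globals

add_safe_globals([np.dtype])                                  # example entry
state = torch.load("models/clap_htsat_tiny.pt",               # illustrative path
                   map_location="cpu", weights_only=True)
```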
Faster: GPU-backed ANN on large libraries.
Friendlier: one-click web UI, no terminal required.
Explainable: playlists come with reasons you can skim or share.
Repo: Philippe-Guerrier/pulseflow-ai-offline-music-recommender
Tech: FastAPI, FAISS/FAISS-GPU, PyTorch, CLAP (HTSAT-tiny), Ollama, NumPy 2.x
Repository: Link
Local Shazam-style ID + Playlist Engine with Qwen 3B
"Beats & Bots"
Edge-AI Music Intelligence: “Local Music DSP + LLMs”
Repository: Link
Summary
I designed and implemented a fully offline music identification and recommendation engine that runs entirely on local hardware without relying on cloud APIs. The goal was to reproduce and extend “Shazam-like” song recognition while solving key issues I’ve observed in mainstream services such as Deezer Flow, namely abrupt mood/style shifts, lack of pattern continuity, and limited personalization logic.
Key Features & Pipeline
Audio Fingerprinting:
Used a local implementation (PyDejavu) to fingerprint my library into a SQLite database, enabling sub-second recognition from an 8-second microphone or file snippet.
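For reference, the classic Dejavu flow looks roughly like this; import paths and the database config vary between forks and releases, so treat it as a sketch rather than the exact code used here:

```python
# Rough sketch of the classic Dejavu fingerprint-then-recognize flow.
# Import paths and the database config differ across Dejavu/PyDejavu versions.
from dejavu import Dejavu
from dejavu.recognize import FileRecognizer             # location differs in newer forks

config = {"database": {"db": "fingerprints.sqlite"}}     # illustrative config only
djv = Dejavu(config)
djv.fingerprint_directory("music/", [".mp3", ".flac"])   # one-off library ingest
match = djv.recognize(FileRecognizer, "clips/snippet_8s.wav")
print(match)                                             # best match metadata, or None
```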
High-Level Audio Features:
Planned to extract mood, energy, tempo, key, and timbral descriptors with Essentia, but I couldn't get Essentia working on Windows, so the pure-Python fallbacks (librosa) cover tempo, key, and energy instead; genre, instrument, and mood labels are tagged with MusicNN.
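The MusicNN side is essentially one call; a sketch per the musicnn README (file path illustrative):

```python
# Genre / instrument / mood tagging with MusicNN's built-in tagger.
from musicnn.tagger import top_tags

tags = top_tags("music/some_track.mp3", model="MTT_musicnn", topN=5)  # illustrative path
print(tags)
```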
Audio Embeddings for Similarity:
Computed 512-dimensional CLAP (HTSAT-Tiny) embeddings for “sounds-like” retrieval across genres. Stored in a FAISS index for instant nearest-neighbor search.
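A sketch of the embed-and-index step, assuming the laion_clap package and a flat inner-product index; checkpoint handling and on-disk caching are simplified away:

```python
# 512-D CLAP embeddings into a cosine-similarity FAISS index (simplified sketch).
import numpy as np
import faiss
import laion_clap

model = laion_clap.CLAP_Module(enable_fusion=False, amodel="HTSAT-tiny")
model.load_ckpt()                                       # loads a default pretrained checkpoint

paths = ["music/a.mp3", "music/b.mp3"]                  # illustrative file list
emb = model.get_audio_embedding_from_filelist(x=paths, use_tensor=False)   # (n, 512)
emb = np.ascontiguousarray(emb, dtype=np.float32)
faiss.normalize_L2(emb)                                 # cosine similarity via inner product

index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)
scores, ids = index.search(emb[:1], k=2)                # nearest "sounds-like" tracks
```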
Sequencing Engine (Custom anti-Flow logic):
Implemented weighted cost functions to control transitions:
Smooth mode: maintain style & mood continuity.
Contrast mode: deliberately switch styles while avoiding mood/energy whiplash.
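Condensed, the cost idea looks like this; the weights, normalizations, and pitch-class key distance are illustrative stand-ins for the tuned values:

```python
# Illustrative transition cost and greedy sequencing; weights and terms are examples,
# not the tuned values used by the project.
import numpy as np

def transition_cost(a: dict, b: dict, policy: str = "smooth") -> float:
    style = float(np.linalg.norm(a["embedding"] - b["embedding"]))    # CLAP-space distance
    tempo = abs(a["tempo"] - b["tempo"]) / 40.0                       # rough BPM normalization
    energy = abs(a["energy"] - b["energy"])
    key = min((a["key_idx"] - b["key_idx"]) % 12,
              (b["key_idx"] - a["key_idx"]) % 12) / 6.0               # pitch-class distance
    if policy == "smooth":
        return 1.0 * style + 0.5 * tempo + 0.8 * energy + 0.3 * key
    # contrast: reward a style jump, still punish mood/energy whiplash
    return -0.7 * style + 0.5 * tempo + 1.2 * energy + 0.3 * key

def sequence(seed: dict, pool: list[dict], n: int = 20, policy: str = "smooth") -> list[dict]:
    playlist, current = [seed], seed
    candidates = [t for t in pool if t is not seed]
    for _ in range(n - 1):
        if not candidates:
            break
        nxt = min(candidates, key=lambda t: transition_cost(current, t, policy))
        playlist.append(nxt)
        candidates.remove(nxt)
        current = nxt
    return playlist
```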
Local LLM Explanations:
Integrated Qwen 2.5 VL 3B (via Ollama) to produce short, natural-language justifications for each playlist transition—entirely offline.
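The call itself is a single local HTTP request to Ollama; the model tag and prompt wording below are assumptions to adapt to whatever is pulled locally:

```python
# One-line transition blurb from a local Ollama model (model tag and prompt are
# illustrative; any locally pulled model works).
import requests

def explain_transition(prev: dict, nxt: dict, model: str = "qwen2.5vl:3b") -> str:
    prompt = (f"In one sentence, explain why '{nxt['title']}' follows '{prev['title']}' "
              f"in a playlist (tempo {prev['tempo']:.0f} -> {nxt['tempo']:.0f} BPM, "
              f"energy {prev['energy']:.2f} -> {nxt['energy']:.2f}).")
    r = requests.post("http://localhost:11434/api/generate",
                      json={"model": model, "prompt": prompt, "stream": False},
                      timeout=120)
    r.raise_for_status()
    return r.json()["response"].strip()
```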
Resource Efficiency:
Designed for controlled VRAM usage (≤ 8 GB) and predictable latency (< 0.5 s end-to-end).
Workflow
Ingest: Fingerprint, extract features, compute embeddings, build FAISS index (one-off).
Runtime:
Recognize currently playing track from snippet.
Retrieve candidates (smooth or contrast policy).
Sequence playlist using cost-based constraints.
Call local LLM for explanations.
Output: 20-track playlist with mood-aware sequencing and concise reasoning.
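Glued together, the runtime path reads like the following; recognize_clip and retrieve_candidates are hypothetical placeholders for the fingerprinting and FAISS-retrieval modules, while sequence and explain_transition match the sketches above:

```python
# Hypothetical runtime driver; the helper functions stand in for the real modules.
def build_playlist(clip_path: str, policy: str = "smooth", length: int = 20) -> dict:
    seed = recognize_clip(clip_path)                          # fingerprint lookup (placeholder)
    pool = retrieve_candidates(seed, policy=policy)           # FAISS neighbours or contrast picks (placeholder)
    tracks = sequence(seed, pool, n=length, policy=policy)    # cost-based ordering
    blurbs = [explain_transition(a, b) for a, b in zip(tracks, tracks[1:])]   # Ollama blurbs
    return {"policy": policy, "tracks": tracks, "explanations": blurbs}
```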
Impact & Uniqueness
Fully autonomous, no internet required; ideal for privacy-sensitive or offline contexts.
Balances musical surprise with emotional continuity, solving a common problem in auto-generated playlists.
Modular design: fingerprinting, embedding, sequencing, and explanation components are independent and reusable.
Tech Stack
Languages: Python
Audio Processing: PyDejavu, Essentia, MusicNN, CLAP
Vector Search: FAISS
LLM Integration: Ollama (Qwen 2.5 VL 3B, DeepSeek R1 8B)
Data Formats: SQLite, NumPy, JSON, YAML
Hardware Optimization: CUDA-enabled PyTorch, quantized LLMs, float16 embeddings