Shared Task: Conversational Music Recommendation Challenge (Music-CRS)
The goal of this challenge is to build conversational recommendation systems that can understand user music preferences through natural dialogue and recommend relevant tracks from a music catalog.
Build a conversational recommendation system that can:
- Understand user music preferences from the conversation history and user ID
- Recommend relevant tracks from a music catalog
- Generate natural language responses
- Task description release: October 15, 2025
- Development dataset release: October 15, 2025
- Baseline system release: October 15, 2025
- Submission site opens: December 1, 2025
- Blind evaluation dataset release: December 1, 2025
- Final submission deadline: December 19, 2025
- Results notification: January 23, 2026
Conversational music recommendation is an emerging task in music information retrieval that combines natural language understanding with personalized recommendation. We invite participants to develop systems that can engage in multi-turn conversations about music and provide accurate track recommendations.
This challenge involves predicting relevant music tracks for each turn in a conversation, given the dialogue history and user context. Participants are expected to build systems that take conversational context as input and output ranked lists of track recommendations along with natural language responses.
Participants must submit the following materials:
1. Technical Report (2 pages maximum): A LaTeX template is available here (no Word template is provided): https://github.com/mulab-mir/nlp4MusA-style-files
2. Test Evaluation Predictions (`test_inference.json`)
3. Blind Evaluation Predictions (`blind_inference.json`)
⚠️ IMPORTANT: Participants must strictly follow this JSON format for their predictions.
Your inference results must be saved as a JSON file in `exp/inference/<your_method_name>.json` with the following structure:
[
  {
    "session_id": "69137__2020-02-08",
    "user_id": "69137",
    "turn_number": 1,
    "predicted_track_ids": [
      "60a0Rd6pjrkxjPbaKzXjfq",
      "2nLtzopw4rPReszdYBJU6h",
      "5UWwZ5lm5PKu6eKsHAGxOk",
      ...
    ],
    "predicted_response": "Based on your preferences, I recommend..."
  },
  ...
]
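The required file can be produced with `json.dump` as noted in the submission checklist. A minimal sketch (the predictions and the method name `my_method` here are hypothetical placeholders; real systems would fill the records from their retrieval and generation stages):

```python
import json
import os

# Hypothetical example prediction; a real submission covers every
# session and turn in the evaluation set.
predictions = [
    {
        "session_id": "69137__2020-02-08",
        "user_id": "69137",
        "turn_number": 1,
        "predicted_track_ids": [
            "60a0Rd6pjrkxjPbaKzXjfq",
            "2nLtzopw4rPReszdYBJU6h",
        ],
        "predicted_response": "Based on your preferences, I recommend...",
    }
]

out_path = "exp/inference/my_method.json"  # replace my_method with your method name
os.makedirs(os.path.dirname(out_path), exist_ok=True)
with open(out_path, "w", encoding="utf-8") as f:
    json.dump(predictions, f, ensure_ascii=False, indent=2)
```

Using `ensure_ascii=False` keeps non-ASCII track and artist names readable in the output file, as the checklist below recommends.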
The challenge uses the TalkPlayData-2 dataset, a large-scale conversational music recommendation dataset with multi-turn dialogues and music listening history.
Conversation Dataset: TalkPlayData-2
Track Metadata: TalkPlayData-2-Track-Metadata - Contains track information (track_id, track name, artist, album, tags, release date)
User Profiles: TalkPlayData-2-User-Metadata - Contains user information (user_id, age, gender, country)
Pre-extracted Track Embeddings: TalkPlayData-2-Track-Embeddings
Pre-extracted User Embeddings: TalkPlayData-2-User-Embeddings
For more baselines, please refer to: https://github.com/nlp4musa/music-crs-baselines
The system operates on a two-stage pipeline:
RecSys: retrieves candidate tracks matching the user's preferences
LLM: generates natural language responses explaining the recommendations
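The two-stage pipeline can be sketched as a single function per turn. This is only an illustration of the control flow, not the baseline implementation: `recommend_turn` and the naive whitespace flattening of the dialogue history are assumptions, and the `retrieve`/`generate` callables stand in for whatever retriever and LLM a participant plugs in.

```python
from typing import Callable, List


def recommend_turn(
    dialogue_history: List[str],
    retrieve: Callable[[str], List[str]],       # stage 1 (RecSys): ranked track IDs
    generate: Callable[[str, List[str]], str],  # stage 2 (LLM): response text
    k: int = 20,
) -> dict:
    """Run the two-stage pipeline for a single conversation turn."""
    query = " ".join(dialogue_history)   # naive context flattening (illustrative)
    candidates = retrieve(query)[:k]     # RecSys stage: candidate generation
    response = generate(query, candidates)  # LLM stage: response generation
    return {
        "predicted_track_ids": candidates,
        "predicted_response": response,
    }
```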
We will use Normalized Discounted Cumulative Gain (nDCG) at k={1, 10, 20} as the primary evaluation metrics.
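For reference, a binary-relevance nDCG@k can be computed as below. This is a sketch of the standard formulation (gain discounted by log2 of rank, normalized by the ideal ranking); the official scoring lives in the music-crs-evaluator repository and may differ in details.

```python
import math
from typing import List, Set


def ndcg_at_k(predicted: List[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance nDCG@k over a ranked list of track IDs."""
    gains = [1.0 if track_id in relevant else 0.0 for track_id in predicted[:k]]
    # DCG: gain at rank i (0-based) discounted by log2(i + 2)
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    # Ideal DCG: all relevant items ranked first, capped at k
    n_ideal = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(n_ideal))
    return dcg / idcg if idcg > 0 else 0.0
```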
We provide several baseline systems for reference:
Random Baseline: Recommends 20 randomly sampled tracks from the catalog.
Popularity Baseline: Recommends the 20 most popular tracks from the training set.
BM25 + Llama-1B: Uses BM25 sparse retrieval for candidate generation and Llama-3.2-1B-Instruct for response generation. This is the strongest baseline.
BERT + Llama-1B: Uses BERT-based dense retrieval for candidate generation and Llama-3.2-1B-Instruct for response generation.
All baseline code is available in the music-crs-baselines repository.
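The Popularity Baseline described above amounts to a frequency count over the training listening history. A minimal sketch (the function name and input format are assumptions; the reference implementation is in the music-crs-baselines repository):

```python
from collections import Counter
from typing import Iterable, List


def popularity_baseline(training_track_ids: Iterable[str], k: int = 20) -> List[str]:
    """Recommend the k most frequent tracks in the training set, ignoring context."""
    counts = Counter(training_track_ids)
    return [track_id for track_id, _ in counts.most_common(k)]
```

The same static list is returned for every session and turn, which is what makes this a context-free lower bound for the learned baselines.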
Before submitting your predictions, ensure:
- JSON file is saved in `exp/inference/<method_name>.json`
- All required fields are present (session_id, user_id, turn_number, predicted_track_ids, predicted_response)
- Predictions cover all sessions and turns (1-8) in the test set
- JSON is properly formatted (use json.dump() with ensure_ascii=False)
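The field and turn-range checks in this checklist can be automated before submission. A hedged sketch (the function `validate_predictions` is not part of the official tooling; it operates on an already-loaded list of records):

```python
from typing import List

# Required fields per the submission format
REQUIRED = (
    "session_id",
    "user_id",
    "turn_number",
    "predicted_track_ids",
    "predicted_response",
)


def validate_predictions(records: List[dict]) -> List[str]:
    """Return human-readable problems; an empty list means the records look OK."""
    errors = []
    for i, rec in enumerate(records):
        for field in REQUIRED:
            if field not in rec:
                errors.append(f"record {i}: missing field '{field}'")
        if not isinstance(rec.get("predicted_track_ids", []), list):
            errors.append(f"record {i}: predicted_track_ids must be a list")
        turn = rec.get("turn_number")
        if not (isinstance(turn, int) and 1 <= turn <= 8):
            errors.append(f"record {i}: turn_number must be an int in 1-8")
    return errors
```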
Baseline Systems: https://github.com/nlp4musa/music-crs-baselines
Evaluation Framework: https://github.com/nlp4musa/music-crs-evaluator
For questions or issues with the challenge, please open an issue in the respective GitHub repositories.
Good luck with the challenge! 🎵