HOW IT WORKS
Before any LLM gets involved, every playlist in my catalog is pre-tagged using Spotify’s analysis tools plus Chosic. Mood, BPM, and energy are measured per track and baked into each playlist's tags. The LLM then matches your query against those tags.
Four stages — and a feedback loop that makes the picks keep getting better.
Spotify analytics + Chosic per track. Mood, BPM, energy — applied per playlist.
Your query goes in. The model picks from my catalog using mood + audio targets.
One playlist out, with a short why-this-fits.
Every match + feedback logged. Feeds the next round so the picks keep getting sharper.
↑ feedback loops back into scoring
Each signal contributes up to a cap so nothing dominates. The final rank is the sum.
+5 per matched tag. The semantic core.
4-vector: energy, dance, happiness, acoustic. Closer to target → higher score.
If you name an artist, that pulls hard.
Exact match only — cafe, gym, club, drive.
Weighted subgenres first, main genre as fallback.
70s, 90s, modern — % of playlist in that era.
Soft falloff: full marks inside the requested BPM range, partial credit within ±20 BPM of it.
Always-on baseline. Better-rated playlists drift up.
Bias for niche when you ask for niche.
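Here is a rough sketch of how a capped-sum scorer like the one described above could look, using three of the signals. Only the +5 per matched tag and the ±20 falloff come from the text. The cap values, the function names, the Euclidean distance for the 4-vector, the linear shape of the falloff, and reading the ±20 as BPM are all assumptions made for illustration.

```python
import math

TAG_POINTS = 5    # from the text: +5 per matched tag
TAG_CAP = 25      # assumed cap, not from the text
AUDIO_CAP = 20    # assumed cap, not from the text
BPM_CAP = 10      # assumed cap, not from the text
BPM_MARGIN = 20   # from the text: partial credit within +/-20 (assumed to be BPM)


def tag_score(query_tags, playlist_tags):
    """+5 per tag shared by query and playlist, capped so tags cannot dominate."""
    matched = len(set(query_tags) & set(playlist_tags))
    return min(matched * TAG_POINTS, TAG_CAP)


def audio_score(target, actual):
    """Closer 4-vectors (energy, dance, happiness, acoustic) score higher.

    Assumes every component is in 0..1, so the largest possible
    Euclidean distance between two vectors is sqrt(4) = 2.0.
    """
    closeness = 1 - math.dist(target, actual) / 2.0
    return AUDIO_CAP * max(0.0, closeness)


def bpm_score(bpm, low, high):
    """Full marks inside [low, high], linear falloff to zero over BPM_MARGIN."""
    if low <= bpm <= high:
        return float(BPM_CAP)
    gap = low - bpm if bpm < low else bpm - high
    if gap >= BPM_MARGIN:
        return 0.0
    return BPM_CAP * (1 - gap / BPM_MARGIN)


def total_score(query, playlist):
    """Final rank is the plain sum of the capped signals."""
    return (
        tag_score(query["tags"], playlist["tags"])
        + audio_score(query["audio"], playlist["audio"])
        + bpm_score(playlist["bpm"], *query["bpm_range"])
    )
```

Because each signal is clipped before the sum, a playlist that matches ten tags still cannot outrank one that fits on every axis.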
38 columns, grouped into clusters the scorer can lean on.
The engine takes the top 3 scored playlists and picks one at random, weighted by score. A higher score makes a playlist more likely, but asking the same thing twice won't always return the same playlist.
Stale top-1 picks kill the magic by day three. Variety in the top-3 keeps it interesting without losing the fit.
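The top-3 pick can be sketched in a few lines. This is a minimal illustration, not the engine's actual code: the function name `pick_playlist`, the `(slug, score)` pair format, and the fallback to a uniform pick when all scores are zero are assumptions.

```python
import random


def pick_playlist(scored, k=3, rng=random):
    """Pick one playlist from the k best, with probability proportional to score.

    `scored` is a list of (slug, score) pairs. `rng` can be the random
    module or a seeded random.Random instance for repeatable picks.
    """
    if not scored:
        raise ValueError("no playlists to pick from")
    top = sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
    slugs = [slug for slug, _ in top]
    weights = [max(score, 0.0) for _, score in top]
    if sum(weights) == 0:
        # Assumed fallback: nothing scored above zero, so pick uniformly.
        return rng.choice(slugs)
    return rng.choices(slugs, weights=weights, k=1)[0]
```

With scores of 50, 30, and 20 in the top three, the best match comes back about half the time, and anything ranked fourth or lower is never returned.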
The MVP runs on Groq’s free tier (llama-3.1-8b-instant). Not the strongest model — but it’s free, fast, and good enough for the catalog size.
Plan is to move to a sharper paid model once the test group kicks in and feedback starts shaping the scoring.
The lite version (on the music page) is LLM only — query in, slug out. Fast and conversational.
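For "query in, slug out" to be safe, the model's reply presumably has to be checked against the catalog before anything is shown. The sketch below is one hypothetical way to do that and is not taken from the lite version itself: `extract_slug`, the kebab-case slug format, and the example slugs are all assumptions.

```python
import re
from typing import Optional, Set


def extract_slug(reply: str, catalog_slugs: Set[str]) -> Optional[str]:
    """Return the first known catalog slug found in the model's reply, else None."""
    # Best case: the reply is just the slug, maybe wrapped in quotes or backticks.
    candidate = reply.strip().strip("\"'`").lower()
    if candidate in catalog_slugs:
        return candidate
    # Otherwise scan the reply for any kebab-case token that is a known slug.
    for token in re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", reply.lower()):
        if token in catalog_slugs:
            return token
    return None
```

Returning `None` for an unknown slug lets the caller retry or fall back instead of linking to a playlist that does not exist.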
The future full app will use the deterministic scorer for explicit filters, BPM ranges, and a real “why this match?” breakdown the LLM can’t reliably produce.