HOW IT WORKS

How the engine thinks

Before any LLM gets involved, every playlist in my catalog is pre-tagged using Spotify’s analysis tools plus Chosic. Mood, BPM, energy — all baked in per track. The LLM then matches your query against that.

The pipeline

Four stages — and a feedback loop that makes the picks keep getting better.

Tagging

Spotify analytics + Chosic per track. Mood, BPM, energy — applied per playlist.

LLM scoring

Your query goes in. The model picks from my catalog using mood + audio targets.

Top match

One playlist out, with a short why-this-fits.

Supabase loop

Every match + feedback logged. Feeds the next round so the picks keep getting sharper.

↑ feedback loops back into scoring

How a playlist gets scored

Each signal contributes up to a cap so nothing dominates. The final rank is the sum.

mood/14

+5 per matched tag. The semantic core.

audio/12

4-vector: energy, dance, happiness, acoustic. Closer to target → higher score.

artists/10

If you name an artist, that pulls hard.

setting/8

Exact match only — cafe, gym, club, drive.

genre/8

Weighted subgenres first, main genre as fallback.

decade/8

70s, 90s, modern — % of playlist in that era.

bpm/6

Soft falloff: full marks in range, partial within ±20.

quality/6

Always-on baseline. Better-rated playlists drift up.

underground/5

Bias for niche when you ask for niche.

What I store per playlist

38 columns, grouped into clusters the scorer can lean on.

Identity

slug + URLs + embed

Audio features

the 4-vector + context flags

Tempo & length

BPM matching + diversity

Genre breakdown

weighted subgenres → main genres

Top artists

name resolution + repeat-count signal

Era

decade % per playlist

Quality / popularity

rating + plays + underground bias

Why the top result varies

The engine takes the top 3 scored playlists and picks one weighted by score — higher score, more likely, but never the same playlist every time you ask the same thing.

Stale top-1 picks kill the magic by day three. Variety in the top-3 keeps it interesting without losing the fit.

The LLM (for now)

Runs on Groq’s free tier using llama-3.3-70b-versatile. Fast inference, generous free quota, and sharp enough for a 21-playlist catalog.

Model choice will evolve as usage patterns and feedback shape what the algorithm needs to get right.

LLM vs deterministic scoring

The lite version (on the music page) is LLM only — query in, slug out. Fast and conversational.

The future full app will use the deterministic scorer for explicit filters, BPM ranges, and a real “why this match?” breakdown the LLM can’t reliably produce.