Watch it run cold

The rest of this site serves frozen pipeline output for speed and safety. This page is the other half of the proof: the engine driven live. It covers a warm ~12 second answer, the ~12 minute cold path and why it’s bounded by design, and the cache payoff that lets both coexist. The run goes against the real backend over an SSH tunnel, which is never linked from this site or exposed to the internet, and it is captured as the walkthrough below (a recorded screencast remains optional; the interactive replay now carries this proof).

This story is now interactive. The replay console animates recorded runs stage by stage from their persisted telemetry, including a real Gate-1-cold capture of the same shape this page narrates: the MusicBrainz grind, honestly time-compressed. That capture resolved 60 uncached candidates in ~3 min; the production run narrated below hit the full ~12 min at the 75-candidate cap. The warm replays show the cache payoff on screen. You can also pick any seed from the console.

Screencast — optional companion

The interactive replay console now carries the cold-run story from recorded telemetry; a screencast remains an optional addition and would slot in here unchanged. The act-by-act walkthrough below describes the live run in prose.

What the run shows

Act 1
It's live
~90s
- POST /recommend with Take Five — The Dave Brubeck Quartet against a warm corpus returns a 200 in about 12 seconds.
- The top neighbours are real: Alphanumeric — Lee Konitz, Red Pepper Blues — Art Pepper, Three to Get Ready — Dave Brubeck.
- Note that the seed's own studio master never appears: a near-duplicate of the seed (audio ≥ 0.98 and a title-token match) is suppressed, so the engine never recommends the song back to itself — while a live or acoustic take, which scores lower, survives.
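The near-duplicate rule in the last bullet can be sketched as a small filter. This is a minimal illustration, not the engine's actual code: the `Candidate` type, the tokeniser, and the reading of "title-token match" as "every seed title token appears in the candidate title" are all assumptions, and the similarity scores in the usage note are made up. Only the 0.98 audio threshold comes from the text above.

```python
import re
from dataclasses import dataclass
from typing import List, Set

AUDIO_DUP_THRESHOLD = 0.98  # from the text: audio similarity >= 0.98


@dataclass
class Candidate:
    # Hypothetical shape; the real pipeline's candidate record is not shown here.
    title: str
    artist: str
    audio_similarity: float  # similarity to the seed, assumed to lie in [0, 1]


def title_tokens(title: str) -> Set[str]:
    """Lowercase alphanumeric tokens (an assumed normalisation)."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def is_near_duplicate(seed_title: str, cand: Candidate) -> bool:
    """True only when BOTH conditions hold: very high audio similarity
    and all seed title tokens present in the candidate title (assumption)."""
    seed = title_tokens(seed_title)
    return (
        cand.audio_similarity >= AUDIO_DUP_THRESHOLD
        and bool(seed)
        and seed <= title_tokens(cand.title)
    )


def suppress_near_duplicates(seed_title: str, candidates: List[Candidate]) -> List[Candidate]:
    """Drop near-duplicates of the seed, keep everything else in order."""
    return [c for c in candidates if not is_near_duplicate(seed_title, c)]


# Illustrative scores only; they are not measurements from the real system.
pool = [
    Candidate("Take Five", "The Dave Brubeck Quartet", 0.99),         # studio master
    Candidate("Take Five (Live)", "The Dave Brubeck Quartet", 0.91),  # live take
    Candidate("Three to Get Ready", "Dave Brubeck", 0.80),
]
kept = suppress_near_duplicates("Take Five", pool)
kept_titles = [c.title for c in kept]
```

With these made-up scores the studio master is dropped while the live take survives, because it matches on title tokens but falls below the audio threshold.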
Act 2
The cold cliff is real — and it's a design feature
~3 min
- POST a never-seen seed and the API returns 202 JobAccepted with a queued job handle and a status URL, rather than blocking the request. That's Gate 1: at 5 or more uncached candidate lookups (the ones that hit MusicBrainz at ~1 req/s), resolution is deferred to the async worker up front.
- The whole cold request took about 701 seconds (~12 minutes) end to end in production — most of it the ARQ worker grinding MusicBrainz at roughly one request a second, about 7 seconds per candidate. That's bounded on purpose: RESOLVE_CANDIDATE_LIMIT=75 caps the resolve at ~75×7s ≈ 9 minutes (embedding, scoring, and the rationale make up the rest), and WORKER_MAX_JOBS=1 because cold work is MusicBrainz-bound — concurrency would buy no throughput, only multiply latency.
- Polling the status URL returns 202 while it runs, then flips to 200 with the full recommendation response, degradation block and all. The job handle is a plain sequential id — non-enumerable tokens and auth are a named, deferred item, not a gap being hidden (it's why there's no public live endpoint).
- Re-run the exact same seed and it returns a warm 200 in ~12 seconds, now with a high embeddings-cache-hit count. That's the lazy-corpus payoff on camera: the first run grew the pgvector cache, so the second skips the embedding work entirely.
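The Gate 1 decision, the bounded resolve budget, and the poll-until-200 flow described above can be sketched together. This is a hedged illustration under stated assumptions: the threshold of 5, the cap of 75, and the ~7 seconds per candidate come from the text, while the function names, the `Decision` type, and the injected `fetch` callable are hypothetical stand-ins rather than the real API.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Tuple

GATE1_UNCACHED_THRESHOLD = 5   # from the text: "5 or more uncached candidate lookups"
RESOLVE_CANDIDATE_LIMIT = 75   # from the text
SECONDS_PER_CANDIDATE = 7.0    # from the text: "about 7 seconds per candidate"


@dataclass
class Decision:
    status_code: int               # 200 = answered inline, 202 = deferred to the worker
    deferred: bool
    resolve_budget_seconds: float  # worst-case MusicBrainz resolve time


def gate1(uncached_lookups: int) -> Decision:
    """Defer up front when enough lookups would hit MusicBrainz.
    Assumes the cap applies to the number of candidates resolved."""
    capped = min(uncached_lookups, RESOLVE_CANDIDATE_LIMIT)
    budget = capped * SECONDS_PER_CANDIDATE
    deferred = uncached_lookups >= GATE1_UNCACHED_THRESHOLD
    return Decision(202 if deferred else 200, deferred, budget)


def poll_until_done(
    fetch: Callable[[], Tuple[int, Any]],  # hypothetical: returns (status, body)
    interval_s: float = 5.0,
    max_polls: int = 200,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll a status URL: 202 means still running, 200 carries the result."""
    for _ in range(max_polls):
        status, body = fetch()
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        sleep(interval_s)
    raise TimeoutError("job did not finish within the polling budget")


worst_case = gate1(200)  # far more uncached lookups than the cap allows
worst_case_minutes = worst_case.resolve_budget_seconds / 60
```

At the cap, the resolve budget works out to 75 × 7 s = 525 s, or 8.75 minutes, which is the "≈ 9 minutes" quoted above. In a real client, `fetch` would wrap an HTTP GET on the status URL; it is injected here so the loop can be exercised without a network.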
Act 3
Why it's built this way
~3 min
- The pivots: an LLM can't judge audio it never heard (and Spotify closed those endpoints to new apps in 2024), and a pre-embedded royalty-free corpus answers a chart hit with unknown tracks. The hybrid retrieve-then-rerank design is what survived — CLAP owns ranking, the LLM only explains.
- Open the eval reports and walk the ablation: at the resolve cap of 75, the CLAP-reranked top-10 shares a median of just 0.2 of its order with the pure cultural ranking, moving tracks a median of 3.4 places. The audio leg is doing real work, not passing the cultural order through.
- Open the Postgres query_logs / query_log_results row backing that poll — the durable, dual-persisted telemetry that the static showcase JSON is a serialization of. Same numbers, live source of truth.
- Honest closer: Doppel won't beat Spotify for casual 'play me something similar.' The wedge is deliberate discovery, and the deferred hardening (auth, rate limiting, opaque handles, connection-scoping) is scoped engineering judgment, named not hidden.
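The "places moved" figure in the ablation bullet can be illustrated with a small rank-displacement calculation. This is a sketch under assumptions: it reads the figure as a per-seed mean absolute rank change within the top-k, aggregated as a median across seeds. The eval reports may define it differently, and the function names and toy rankings below are hypothetical. The 0.2 order-sharing figure is not reproduced because this page does not define that metric.

```python
from statistics import median
from typing import List, Sequence


def mean_displacement(cultural: Sequence[str], reranked: Sequence[str], k: int = 10) -> float:
    """Mean absolute rank change for the reranked top-k, relative to the
    cultural order. Tracks missing from the cultural list are skipped."""
    position = {track: i for i, track in enumerate(cultural)}
    moves = [
        abs(position[track] - new_rank)
        for new_rank, track in enumerate(reranked[:k])
        if track in position
    ]
    return sum(moves) / len(moves) if moves else 0.0


def median_displacement_across_seeds(per_seed: List[float]) -> float:
    """Aggregate per-seed displacement with a median (assumed aggregation)."""
    return median(per_seed)


# Toy rankings, not data from the eval reports.
seed_a = mean_displacement(["a", "b", "c", "d", "e"], ["c", "a", "b", "e", "d"])
seed_b = mean_displacement(["a", "b", "c"], ["a", "b", "c"])  # rerank changed nothing
seed_c = mean_displacement(["a", "b", "c"], ["c", "b", "a"])  # full reversal
overall = median_displacement_across_seeds([seed_a, seed_b, seed_c])
```

A displacement of zero would mean the audio leg passes the cultural order straight through, which is the outcome the ablation rules out.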

Want the reasoning rather than the runtime? How it works covers the design, the pipeline, and the eval evidence.
