Watch it run cold
The rest of this site serves frozen pipeline output for speed and safety. This page is the other half of the proof: the engine driven live. It shows a warm answer in ~12 seconds, the ~12 minute cold path and why it is bounded by design, and the cache payoff that lets the two coexist. The run was made against the real backend over an SSH tunnel (the backend is never linked from this site or exposed to the internet) and captured as the walkthrough below. A recorded screencast remains optional; the interactive replay now carries this proof.
This story is now interactive. The replay console animates recorded runs stage by stage from their persisted telemetry, including a real Gate-1-cold capture of the same shape this page narrates: the MusicBrainz grind, honestly time-compressed. (That capture resolved 60 uncached candidates in ~3 min; the production run narrated below hit the full ~12 min at the 75-candidate cap.) The warm replays show the cache payoff on screen. Or pick any seed from the console.
Screencast — optional companion
The interactive replay console now tells the cold-run story from recorded telemetry. A screencast remains an optional addition and would slot in here unchanged; the act-by-act walkthrough below describes the live run in prose.
What the run shows
Act 1: It's live (~90s)
- POST /recommend with Take Five — The Dave Brubeck Quartet against a warm corpus returns a 200 in about 12 seconds.
- The top neighbours are real: Alphanumeric — Lee Konitz, Red Pepper Blues — Art Pepper, Three to Get Ready — Dave Brubeck.
- Note that the seed's own studio master never appears: a near-duplicate of the seed (audio similarity ≥ 0.98 plus a title-token match) is suppressed, so the engine never recommends the song back to itself. A live or acoustic take scores below that audio threshold, so it survives.
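The suppression rule in Act 1 can be sketched as a filter over scored candidates. This is a minimal illustration, not Doppel's code: the `Candidate` type, the function names, and the tokenizer are hypothetical, and "title-token match" is assumed here to mean that every token of the seed title appears in the candidate's title. The 0.98 threshold comes from the walkthrough; the similarity scores in the usage lines are made up.

```python
import re
from dataclasses import dataclass

# Threshold quoted in the walkthrough; the constant name is hypothetical.
NEAR_DUP_AUDIO_SIMILARITY = 0.98


@dataclass
class Candidate:
    title: str
    artist: str
    audio_similarity: float  # assumed: similarity of this track's audio to the seed's


def title_tokens(title: str) -> frozenset:
    """Lowercase alphanumeric tokens (an assumed stand-in for the real tokenizer)."""
    return frozenset(re.findall(r"[a-z0-9]+", title.lower()))


def is_near_duplicate(seed_title: str, cand: Candidate) -> bool:
    seed = title_tokens(seed_title)
    # Assumed rule: every seed title token also appears in the candidate's title.
    title_match = bool(seed) and seed <= title_tokens(cand.title)
    # Both conditions must hold, so a live take that shares the title but
    # scores below the audio threshold is kept.
    return title_match and cand.audio_similarity >= NEAR_DUP_AUDIO_SIMILARITY


def suppress_near_duplicates(seed_title: str, candidates: list) -> list:
    return [c for c in candidates if not is_near_duplicate(seed_title, c)]


# Illustrative scores only.
candidates = [
    Candidate("Take Five", "The Dave Brubeck Quartet", 0.995),        # studio master: dropped
    Candidate("Take Five (Live)", "The Dave Brubeck Quartet", 0.91),  # live take: kept
    Candidate("Alphanumeric", "Lee Konitz", 0.83),                    # neighbour: kept
]
kept = suppress_near_duplicates("Take Five", candidates)
print([c.title for c in kept])  # ['Take Five (Live)', 'Alphanumeric']
```

Requiring both signals is what lets alternate takes through: the title alone would catch the live version, and the audio alone could catch a different song that happens to sound nearly identical.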
Act 2: The cold cliff is real, and it's a design feature (~3 min)
- POST a never-seen seed and the API returns 202 JobAccepted with a queued job handle and a status URL instead of blocking the request. That's Gate 1: when 5 or more candidate lookups are uncached (the ones that hit MusicBrainz at ~1 req/s), resolution is deferred to the async worker up front.
- The whole cold request took about 701 seconds (~12 minutes) end to end in production, most of it the ARQ worker grinding through MusicBrainz at roughly one request a second, about 7 seconds per candidate. That's bounded on purpose. RESOLVE_CANDIDATE_LIMIT=75 caps the resolve at ~75 × 7 s ≈ 525 s, about 9 minutes (embedding, scoring, and the rationale make up the rest). WORKER_MAX_JOBS=1 because cold work is MusicBrainz-bound: concurrent jobs would share the same ~1 req/s budget, buying no throughput and only multiplying each job's latency.
- Polling the status URL returns 202 while the job runs, then flips to 200 with the full recommendation response, degradation block and all. The job handle is a plain sequential id. Non-enumerable tokens and auth are a named, deferred item, not a hidden gap, and that deferral is why there's no public live endpoint.
- Re-run the exact same seed and it returns a warm 200 in ~12 seconds, now with a high embeddings-cache-hit count. That's the lazy-corpus payoff on screen: the first run grew the pgvector cache, so the second skips the embedding work entirely.
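Gate 1, the resolve cap, and the 202-then-200 polling contract from Act 2 fit in a few lines. This is a sketch under stated assumptions, not the backend's code: only RESOLVE_CANDIDATE_LIMIT and the numbers come from the walkthrough; the other constant and function names are hypothetical, `fetch_status` stands in for whatever HTTP client calls the status URL, and the 15-minute poll timeout is an arbitrary choice set above the ~701 s observed run.

```python
import time
from typing import Callable, Tuple

# Numbers from the walkthrough; names other than RESOLVE_CANDIDATE_LIMIT are hypothetical.
GATE1_UNCACHED_THRESHOLD = 5   # defer when 5+ lookups would hit MusicBrainz
RESOLVE_CANDIDATE_LIMIT = 75   # cap on candidates resolved per cold request
SECONDS_PER_CANDIDATE = 7      # observed: ~1 MusicBrainz req/s, ~7 s per candidate


def should_defer(uncached_lookups: int) -> bool:
    """Gate 1: answer 202 and hand off to the worker instead of blocking."""
    return uncached_lookups >= GATE1_UNCACHED_THRESHOLD


def resolve_budget_seconds(uncached_lookups: int) -> int:
    """Worst-case resolve time; the cap bounds it however many candidates are uncached."""
    return min(uncached_lookups, RESOLVE_CANDIDATE_LIMIT) * SECONDS_PER_CANDIDATE


def poll_until_done(
    fetch_status: Callable[[], Tuple[int, dict]],
    interval_s: float = 5.0,
    timeout_s: float = 900.0,
) -> dict:
    """Poll the status URL: 202 means still running, 200 carries the final response."""
    deadline = time.monotonic() + timeout_s
    while True:
        status, body = fetch_status()
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError("cold job did not finish before the poll timeout")
        time.sleep(interval_s)


print(should_defer(4), should_defer(5))  # False True
print(resolve_budget_seconds(200))       # 525, i.e. 75 × 7 s, about 9 minutes

# Simulated status URL: two 202s, then a 200 (the body keys are made up).
replies = iter([(202, {"state": "queued"}), (202, {"state": "running"}), (200, {"results": []})])
print(poll_until_done(lambda: next(replies), interval_s=0))  # {'results': []}
```

The cap is what turns an open-ended cold path into a predictable one: however many candidates retrieval surfaces, the MusicBrainz leg costs at most the cap times the per-candidate time.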
Act 3: Why it's built this way (~3 min)
- The pivots: an LLM can't judge audio it has never heard (and Spotify closed its audio-features and audio-analysis endpoints to new apps in 2024), and a pre-embedded royalty-free corpus answers a chart hit with unknown tracks. The hybrid retrieve-then-rerank design is what survived: CLAP owns ranking, and the LLM only explains.
- Open the eval reports and walk through the ablation: at the resolve cap of 75, the CLAP-reranked top-10 shares a median of just 0.2 of its ordering with the pure cultural ranking and moves tracks a median of 3.4 places. The audio leg is doing real work, not passing the cultural order through.
- Open the Postgres query_logs and query_log_results rows backing that poll: the durable, dual-persisted telemetry that the static showcase JSON serializes. Same numbers, live source of truth.
- Honest closer: Doppel won't beat Spotify at casual 'play me something similar' listening. The wedge is deliberate discovery, and the deferred hardening (auth, rate limiting, opaque handles, connection-scoping) is scoped engineering judgment, named rather than hidden.
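The retrieve-then-rerank split and the ablation in Act 3 can be illustrated together. This is a hedged sketch, not the eval harness: it reranks a culturally ordered candidate list by cosine similarity to the seed's audio embedding (plain float lists stand in for CLAP vectors), then compares the reranked top-k with the cultural order. The page does not define "share of ordering"; Kendall's tau is used here as one plausible reading, and the rank shift is the mean absolute change in position. All names and the toy vectors are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean

Vector = list  # a plain list of floats standing in for a CLAP embedding


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank_by_audio(seed_vec: Vector, retrieved: list) -> list:
    """retrieved: (track_id, audio_vec) pairs in cultural order; returns ids in audio order."""
    ranked = sorted(retrieved, key=lambda item: cosine(seed_vec, item[1]), reverse=True)
    return [track_id for track_id, _ in ranked]


def kendall_tau(order_a: list, order_b: list) -> float:
    """Kendall's tau between two tie-free orderings of the same items (needs 2+ items)."""
    pos_b = {item: i for i, item in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):  # x precedes y in order_a
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    n = len(order_a)
    return (concordant - discordant) / (n * (n - 1) / 2)


def ablation_stats(cultural: list, reranked: list, k: int = 10) -> tuple:
    """Agreement and mean rank shift of the reranked top-k versus the cultural order."""
    top = reranked[:k]
    cultural_pos = {t: i for i, t in enumerate(cultural)}
    same_items_cultural_order = sorted(top, key=cultural_pos.__getitem__)
    tau = kendall_tau(top, same_items_cultural_order)
    shift = mean(abs(i - cultural_pos[t]) for i, t in enumerate(top))
    return tau, shift


seed = [1.0, 0.0]
retrieved = [("a", [0.0, 1.0]), ("b", [1.0, 1.0]), ("c", [1.0, 0.0]), ("d", [1.0, 2.0])]
cultural = [track_id for track_id, _ in retrieved]
reranked = rerank_by_audio(seed, retrieved)
print(reranked)                  # ['c', 'b', 'd', 'a']
tau, shift = ablation_stats(cultural, reranked, k=4)
print(round(tau, 3), shift)      # -0.333 1.5
```

Across many seeds, taking `statistics.median` of the per-seed tau and shift values would give summary numbers shaped like the report's 0.2 and 3.4, if those metrics are defined this way. A tau near 1 with a shift near 0 would mean the audio leg was merely echoing the cultural ranking.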
Want the reasoning rather than the runtime? How it works covers the design, the pipeline, and the eval evidence.