Free with subscription · 25 entries

Subscribe to First Token.

One production AI deep dive every Tuesday, for backend engineers who ship the systems behind the demo. Confirm your email and we'll send the RAG eval cheatsheet right after.

No spam. Unsubscribe in one click.


Written by Tekeshwar, Senior Software Engineer at xFarm. Previously: shipping backend & AI systems at Breakthrough Apps.

Sample entry · 1 of 25

#09 / Retrieval · High severity

Pure vector search, no hybrid.

Surfaces in: error codes · ticket IDs · SKUs · code symbols

Symptom

Vector search encodes meaning, not strings. "ENG-4823 deploy failure" gets encoded as roughly "looks like a ticket ID about deploys", so the search returns semantically similar bugs and misses the exact ticket. The same failure hits every rare keyword: error codes, product names, version numbers.
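Lexical scoring fails the other way: it rewards rare exact tokens. A minimal sketch, assuming a common BM25 variant (Lucene-style IDF, defaults k1=1.5 and b=0.75) and a hypothetical three-ticket corpus: "eng-4823" appears in only one document, so its IDF outweighs the common terms "deploy" and "failure" that every ticket shares.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase + whitespace split, so "ENG-4823" stays one token.
    return text.lower().split()

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every doc against the query with BM25 (one common variant)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: how many docs contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms (low df) get a high IDF weight.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            norm = freq + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical tickets: all three are "deploy failures", only one is ENG-4823.
docs = [
    "ENG-4823 deploy failure on staging after config change",
    "ENG-4791 deploy failure caused by expired certificate",
    "ENG-4650 deploy failure rollback timed out",
]
scores = bm25_scores("ENG-4823 deploy failure", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best])  # the ENG-4823 ticket
```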

Test

# Build 30 queries whose answers hinge on exact strings
# (IDs, error codes, SKUs).
# Compare recall@10: vector-only vs BM25 + vector fused with RRF.
# Hybrid usually wins by +5–15 pts recall@10
# on corpora heavy in proper nouns.
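The test above needs a recall@10 harness. A minimal sketch, assuming each retriever is a callable that returns doc IDs best-first; the labeled queries and canned results below are hypothetical stand-ins, not measured data, so swap in your own pipelines and the 30 labeled queries.

```python
from typing import Callable, Sequence

def recall_at_k(retrieved: Sequence[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the relevant doc IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def mean_recall_at_k(search: Callable[[str], Sequence[str]],
                     labeled: list[tuple[str, set[str]]],
                     k: int = 10) -> float:
    """Average recall@k of one retriever over a labeled query set."""
    return sum(recall_at_k(search(q), rel, k) for q, rel in labeled) / len(labeled)

# Hypothetical labeled queries: (query, IDs that must come back).
labeled = [
    ("ENG-4823 deploy failure", {"ENG-4823"}),
    ("checkout error E1042", {"E1042"}),
]

# Canned results standing in for real pipelines (illustrative only).
vector_results = {
    "ENG-4823 deploy failure": ["ENG-4791", "ENG-4650"],  # similar bugs, wrong ticket
    "checkout error E1042": ["E1042", "E2001"],
}
hybrid_results = {
    "ENG-4823 deploy failure": ["ENG-4823", "ENG-4791"],
    "checkout error E1042": ["E1042", "E2001"],
}

vec_recall = mean_recall_at_k(vector_results.__getitem__, labeled)
hyb_recall = mean_recall_at_k(hybrid_results.__getitem__, labeled)
print(f"vector-only recall@10: {vec_recall:.2f}")  # 0.50
print(f"hybrid recall@10:      {hyb_recall:.2f}")  # 1.00
```

Using a set intersection means a doc ID repeated in the results is only counted once.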

Snippet

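def rrf(rank_lists, k=60):
    """Reciprocal Rank Fusion over lists of doc IDs, best first."""
    scores = {}
    for ranks in rank_lists:
        # rank is 0-based, so a list's top hit adds 1 / (k + 1)
        for rank, doc_id in enumerate(ranks):
            scores[doc_id] = scores.get(doc_id, 0) \
                + 1 / (k + rank + 1)
    # [(doc_id, fused_score), ...], highest score first
    return sorted(scores.items(),
                  key=lambda x: -x[1])

# vector_search / bm25_search / query: placeholders for your two
# retrievers; each returns hits with an .id attribute, best first
vec  = vector_search(query, k=50)
bm25 = bm25_search(query, k=50)
fused = rrf([[h.id for h in vec],
             [h.id for h in bm25]])[:20]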

Fix

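Hybrid by default. RRF is the simplest fusion: it combines ranks, not raw scores, so there are no per-retriever weights to tune and it is robust to the scale mismatch between BM25 scores and vector similarities. Tradeoff: two indexes to maintain. Postgres with pgvector + tsvector keeps both in one store and avoids the operational doubling (Postgres's built-in ts_rank is not BM25, but RRF only needs the ranking).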
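Why rank-only fusion is robust to score scales: a toy comparison against naively summing raw scores. The scores are made up, and `naive_sum`, `rrf_from_scores` and `rank_by_score` are hypothetical helpers; `rrf_from_scores` is the same RRF formula as the snippet, just fed score dicts instead of ID lists.

```python
def rank_by_score(scores: dict[str, float]) -> list[str]:
    # Doc IDs ordered best-first by score.
    return sorted(scores, key=lambda d: -scores[d])

def naive_sum(score_maps: list[dict[str, float]]) -> list[str]:
    """Add raw scores across retrievers: sensitive to each retriever's scale."""
    fused: dict[str, float] = {}
    for scores in score_maps:
        for doc_id, s in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + s
    return rank_by_score(fused)

def rrf_from_scores(score_maps: list[dict[str, float]], k: int = 60) -> list[str]:
    """RRF: only each doc's rank within each retriever matters."""
    fused: dict[str, float] = {}
    for scores in score_maps:
        for rank, doc_id in enumerate(rank_by_score(scores)):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1 / (k + rank + 1)
    return rank_by_score(fused)

# Made-up scores: cosine similarity sits roughly in [0, 1], BM25 is unbounded.
vec = {"a": 0.91, "b": 0.88, "c": 0.40}
bm25 = {"c": 14.2, "b": 13.9, "d": 2.1}
bm25_x10 = {d: s * 10 for d, s in bm25.items()}  # same ranking, new scale

print(naive_sum([vec, bm25]))             # ['b', 'c', 'd', 'a']: top vector hit sinks
print(naive_sum([vec, bm25_x10]))         # ['c', 'b', 'd', 'a']: order moves with scale
print(rrf_from_scores([vec, bm25]))       # ['c', 'b', 'a', 'd']
print(rrf_from_scores([vec, bm25_x10]))   # ['c', 'b', 'a', 'd']: unchanged
```

With naive summation the best vector hit "a" drops below "d", a weak lexical-only match, purely because BM25 numbers are larger; RRF keeps "a" ahead and ignores rescaling.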

+ 24 more entries · PDF · 25 pages

What you actually get

01 / Production-grade

Tactics from real shipping, not blog posts about blog posts.

Every issue starts with a real failure mode, debug story, or architectural decision, drawn from systems serving actual users. If it only works in a demo, it doesn't ship here.

02 / Backend-first

Written for engineers who ship the systems behind the demo.

APIs, queues, retrieval, caching, observability, evals. The boring infrastructure that makes LLM products actually work in production — not prompt engineering hot takes.

03 / One per week

A single deep dive every Tuesday. No filler, no daily noise.

Long enough to be useful, short enough to read on the commute. Roughly 1,800 words, one diagram, one runnable snippet per issue. That's the contract.