From Blueprint to Build
You’ve learned:
- The physics (Part 1) — latency and token economics
- The memory (Part 2) — context and tools
- The proof (Part 3) — verification and observability
- The guardrails (Part 4) — governance and practice
Now: the build order. What to ship in what sequence.
The wrong order wastes effort. You can’t tune latency without observability. You can’t verify outputs without golden sets. You can’t increase autonomy without governance.
This roadmap sequences infrastructure investments for maximum leverage.
90-Day Path to Production
Days 1-15: Foundation
Goal: See what’s happening. Establish baselines.
Deploy observability stack
- Traces for all LLM calls (Part 3: Observability)
- Capture: prompt, response, latency, tokens, cost
- You can’t improve what you can’t measure
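A minimal sketch of the trace record this implies, with a wrapper that captures all five fields per call. Field names, the `model_fn` contract, and the per-token prices are illustrative, not any particular vendor's API:

```python
import time
import json
from dataclasses import dataclass, asdict

@dataclass
class LLMTrace:
    # One record per LLM call: enough to debug quality, latency, and cost.
    prompt: str
    response: str
    latency_ms: float
    input_tokens: int
    output_tokens: int
    cost_usd: float

def traced_call(model_fn, prompt, price_in=3e-6, price_out=15e-6):
    """Wrap any model call and emit a structured trace.
    model_fn is assumed to return (response, input_tokens, output_tokens)."""
    start = time.perf_counter()
    response, in_tok, out_tok = model_fn(prompt)
    trace = LLMTrace(
        prompt=prompt,
        response=response,
        latency_ms=(time.perf_counter() - start) * 1000,
        input_tokens=in_tok,
        output_tokens=out_tok,
        cost_usd=in_tok * price_in + out_tok * price_out,
    )
    print(json.dumps(asdict(trace)))  # ship to your trace store instead of stdout
    return response
```

In production you would emit these records to an OpenTelemetry-compatible backend rather than printing them.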
Establish golden set
- 50-100 hand-curated QA pairs for core use case
- Known-good answers you can test against (Part 3: Evals)
- Run on every change
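A sketch of the "run on every change" harness, assuming the golden set lives in a JSONL file with `question`/`expected` fields. Exact-match scoring is a stand-in; a real grader would use semantic or rubric-based comparison:

```python
import json

def run_golden_set(answer_fn, golden_path, threshold=0.9):
    """Run every golden QA pair through the system; fail if the pass rate
    drops below threshold. Exact match is a placeholder scorer."""
    passed = total = 0
    with open(golden_path) as f:
        for line in f:
            case = json.loads(line)
            total += 1
            if answer_fn(case["question"]).strip() == case["expected"].strip():
                passed += 1
    score = passed / total
    assert score >= threshold, f"golden set regression: {score:.2%} < {threshold:.0%}"
    return score
```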
Implement L1 autonomy with tool use
- MCP server with typed schemas (Part 2: Tools)
- User drives, AI suggests
- Log every tool call
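The shape of a typed, logged tool layer can be sketched without the MCP SDK itself. This is a simplified stand-in: the registry, the `required`-field validation, and the example `lookup_order` tool are all hypothetical, but the pattern (declare schema, validate before executing, log every call) is the one an MCP server enforces:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tools")

TOOLS = {}

def tool(name, schema):
    """Register a tool with a typed parameter schema."""
    def wrap(fn):
        TOOLS[name] = {"fn": fn, "schema": schema}
        return fn
    return wrap

def call_tool(name, args):
    spec = TOOLS[name]
    # Validate required fields before executing; reject, don't guess.
    for field in spec["schema"]["required"]:
        if field not in args:
            raise ValueError(f"{name}: missing required arg '{field}'")
    log.info("tool_call %s", json.dumps({"tool": name, "args": args}))  # log every call
    return spec["fn"](**args)

@tool("lookup_order", {"required": ["order_id"]})
def lookup_order(order_id):
    return {"order_id": order_id, "status": "shipped"}  # placeholder backend
```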
Enable prompt caching
- Structure prompts: stable → semi-stable → variable (Part 1: Token Economics)
- Verify hit rates >60%
- If lower, restructure prompts
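A sketch of the stable → semi-stable → variable ordering, assuming a provider whose prefix cache reuses identical leading segments. Segment names are illustrative:

```python
def build_prompt(system_rules, tool_docs, session_summary, user_msg):
    """Order segments by volatility: the stable prefix comes first so
    prefix caching can reuse it across requests; the volatile user turn
    goes last and is never cached."""
    return "\n\n".join([
        system_rules,      # stable: identical for every request (cacheable)
        tool_docs,         # semi-stable: changes on deploys, not per request
        session_summary,   # semi-stable: changes once per session
        user_msg,          # variable: changes every request
    ])

def cache_hit_rate(cached_tokens, total_input_tokens):
    """Compute from provider usage metadata; restructure prompts if < 0.6."""
    return cached_tokens / total_input_tokens
```

Putting any per-request content (timestamps, request IDs) ahead of the stable segments silently breaks the prefix match, which is the most common cause of hit rates below 60%.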
Exit criteria: You have traces, tests, and cache hit rate >60%.
Days 16-45: Optimization
Goal: Make it fast, cheap, and verified.
Enable speculative decoding
- Draft model proposes, target model verifies (Part 1: Latency)
- Tune draft lengths for your workload
- Expect 1.5-2.5x speedup in memory-bound scenarios
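The propose-then-verify loop can be illustrated with a toy greedy version. This is a conceptual sketch, not a production decoder: `draft_next` and `target_next` stand in for single-token model calls, and a real implementation verifies the whole draft in one batched target pass, which is where the speedup comes from. The key property shown is that the output always matches what the target alone would produce:

```python
def speculative_decode(draft_next, target_next, prompt, max_tokens=20, k=4):
    """Toy greedy speculative decoding: the draft proposes k tokens, the
    target accepts the longest prefix it agrees with, then emits one
    corrected token at the first mismatch. target_next returning None
    signals end of sequence."""
    out = list(prompt)
    while len(out) - len(prompt) < max_tokens:
        # Draft proposes k tokens cheaply.
        proposal = []
        ctx = out[:]
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies position by position (in practice: one batched pass).
        for tok in proposal:
            expected = target_next(out)
            if expected is None:
                return out[len(prompt):]
            out.append(expected)
            if expected != tok:
                break  # mismatch: discard the rest of the draft
    return out[len(prompt):]
```

When the draft agrees often, each target pass advances several tokens; when it never agrees, you fall back to one token per pass, which is why draft length `k` must be tuned per workload.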
Implement CI/CD quality gates
- Block deploys that fail faithfulness checks (Part 3: Evals as CI)
- Block deploys that regress latency SLOs
- No exceptions
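The gate itself is small; the discipline is wiring it into CI so a failing check blocks the merge. A sketch, with illustrative thresholds that you would set from your own baselines:

```python
def quality_gate(eval_results, faithfulness_floor=0.9, p95_latency_slo_ms=500):
    """Return (ok, reasons); CI blocks the deploy when ok is False.
    eval_results is assumed to carry aggregate scores from the eval run."""
    reasons = []
    if eval_results["faithfulness"] < faithfulness_floor:
        reasons.append(
            f"faithfulness {eval_results['faithfulness']:.2f} < {faithfulness_floor}")
    if eval_results["p95_latency_ms"] > p95_latency_slo_ms:
        reasons.append(
            f"p95 latency {eval_results['p95_latency_ms']}ms > {p95_latency_slo_ms}ms SLO")
    return (not reasons), reasons
```

In a CI job, `exit(1)` on `not ok` so the pipeline fails loudly rather than logging a warning nobody reads.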
Adopt context compaction
- For sessions >10 turns: summarize to structured facts (Part 1: Context Compaction)
- Drop raw history, keep last 2-3 turns
- Target: 75% token reduction for long sessions
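The compaction step above can be sketched as a history rewrite: old turns collapse into one structured summary, recent turns survive verbatim. The `summarize` parameter stands in for an LLM summarization call that extracts structured facts:

```python
def compact_history(turns, keep_last=3, summarize=None):
    """Replace all but the last few turns with a summary message.
    `summarize` is a stand-in for an LLM call that reduces old turns
    to structured facts; without it we insert a placeholder."""
    if len(turns) <= keep_last:
        return turns
    old, recent = turns[:-keep_last], turns[-keep_last:]
    summary = summarize(old) if summarize else f"[summary of {len(old)} earlier turns]"
    return [{"role": "system", "content": summary}] + recent
```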
Add hybrid retrieval
- Vector search + BM25 + reranker (Part 2: Hybrid Retrieval)
- The reranker is where quality is won or lost
- Set freshness SLAs per source type
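One common way to fuse the vector and BM25 result lists before reranking is reciprocal rank fusion (RRF). A sketch of the fusion step only; the cross-encoder reranker that follows it is where the quality win happens and is not shown:

```python
def rrf_fuse(vector_ranked, bm25_ranked, k=60):
    """Reciprocal rank fusion of two ranked doc-id lists. Each list
    contributes 1/(k + rank + 1) per document; documents ranked well
    by both retrievers rise to the top. k=60 is the conventional default."""
    scores = {}
    for ranking in (vector_ranked, bm25_ranked):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Feed the fused top-N into the reranker rather than reranking each retriever's output separately.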
Exit criteria: p95 latency <500ms, quality gates in CI, hybrid retrieval live.
Days 46-90: Advanced Architecture
Goal: Scale safely. Increase autonomy.
Decouple retrieval
- Search stage: small chunks (100-256 tokens) for recall (Part 2: Decoupled Retrieval)
- Retrieve stage: large spans (1024+ tokens) for comprehension
- Mirrors human research: scan many, read deeply
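The two-stage shape can be sketched as: search over small chunks, then return each hit's larger parent span. Here `search_index(query, k)` is an assumed callable returning `(chunk_text, parent_id)` hits, and `parent_spans` holds the 1024+-token spans keyed by id:

```python
def decoupled_retrieve(query, search_index, parent_spans, top_k=3):
    """Stage 1: scan many small chunks for recall.
    Stage 2: expand each hit to its large parent span for comprehension."""
    hits = search_index(query, top_k)
    parent_ids = []
    for _chunk_text, parent_id in hits:
        if parent_id not in parent_ids:   # dedupe chunks from the same span
            parent_ids.append(parent_id)
    return [parent_spans[pid] for pid in parent_ids]
```

The dedupe step matters: several small chunks often hit inside the same parent span, and you want to pay for that span's tokens once.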
Implement GraphRAG or tool retrieval index
- If entity/relationship queries dominate: GraphRAG
- If agent tool selection at scale: tool retrieval index
- Only if needed; adds governance overhead
Add memory tiers with governance
- Working memory, episodic memory, semantic memory (Part 2: Memory Governance)
- Define: who owns memory, how it updates, when it must be forgotten
- User controls: view, correct, delete, export
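A sketch of the three tiers with the four user controls exposed as first-class operations. This is a minimal in-memory stand-in; a real store adds ownership records, retention TTLs, and audit logging:

```python
import time

class GovernedMemory:
    """Working, episodic, and semantic tiers with user controls:
    view, correct, delete, export."""
    def __init__(self):
        self.tiers = {"working": {}, "episodic": {}, "semantic": {}}

    def write(self, tier, key, value):
        self.tiers[tier][key] = {"value": value, "updated_at": time.time()}

    def view(self, tier):                    # user control: view
        return {k: v["value"] for k, v in self.tiers[tier].items()}

    def correct(self, tier, key, value):     # user control: correct
        if key not in self.tiers[tier]:
            raise KeyError(key)
        self.write(tier, key, value)

    def delete(self, tier, key):             # user control: delete ("forget")
        self.tiers[tier].pop(key, None)

    def export(self):                        # user control: export
        return {t: self.view(t) for t in self.tiers}
```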
Promote to L2/L3 autonomy
- Only after runtime guardrails are verified (Part 4: Runtime Alignment)
- Policy configuration for what’s blocked/flagged/allowed
- Prompt injection defense layers
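The policy configuration can be as simple as a three-way gate evaluated before every autonomous tool call. The rules below are illustrative; real policies come from your governance config, not hard-coded lists:

```python
POLICY = {
    # Illustrative rules only; load these from governance config in practice.
    "blocked": ["delete_database", "transfer_funds"],
    "flagged": ["send_email", "modify_record"],
}

def check_action(tool_name):
    """Gate every autonomous tool call: block outright, flag for human
    review, or allow. Anything not explicitly listed is allowed here;
    a stricter posture would default to flagged."""
    if tool_name in POLICY["blocked"]:
        return "blocked"
    if tool_name in POLICY["flagged"]:
        return "flagged"   # route to a human approval queue
    return "allowed"
```

The value of this shape is that tightening policy is a config change, not a retrain or redeploy.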
Establish cost attribution
- Per user, per feature
- Token SLOs with automated fallbacks (Part 1: Token SLOs)
- Breaches trigger alerts or model downgrades
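A sketch of per-user, per-feature attribution with an automated model downgrade on breach. The budget numbers and model names are placeholders:

```python
from collections import defaultdict

class TokenBudget:
    """Attribute token spend per (user, feature); downgrade the model
    when a feature breaches its daily token SLO."""
    def __init__(self, budgets):
        self.budgets = budgets            # feature -> daily token SLO
        self.spend = defaultdict(int)     # (user, feature) -> tokens

    def record(self, user, feature, tokens):
        self.spend[(user, feature)] += tokens

    def feature_total(self, feature):
        return sum(t for (u, f), t in self.spend.items() if f == feature)

    def pick_model(self, feature):
        # Breach -> automated fallback to a cheaper model (alerting not shown).
        if self.feature_total(feature) > self.budgets.get(feature, float("inf")):
            return "small-cheap-model"
        return "default-model"
```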
Exit criteria: Memory governance live, L2/L3 autonomy with guardrails, cost attribution per feature.
The Sequencing Principle
Notice the order:
- Observability first — you can’t optimize blind
- Testing second — you can’t ship without verification
- Speed third — fast failures are still failures
- Autonomy last — capability without governance is chaos
Teams that invert this order ship fast, break things, and spend months in triage. The sequencing isn’t arbitrary; it’s load-bearing.
The Computer Is Built
You now have:
- Physics (latency, tokens) that keep humans in the loop
- Memory and tools that don’t hallucinate or break things
- Verification that catches errors before users
- Governance that enforces policy without retraining
- A roadmap that sequences investments correctly
The foundation model is the CPU. You’ve built the computer.
Now ship it.
Navigation
← Part 4: Governance | Series Index
Part of a 6-part series on building production AI systems.