Latest issue

TTL Is the Wrong Abstraction for LLM Response Caching

7/27/2026

A team ships semantic caching. Cache hit rates climb to 30%. Costs drop. Everyone's happy — until a user asks "what's our refund policy?" and gets an answer that had been accurate for six months, until the policy changed. The cache TTL was set to 24 hours. The policy changed at 9am.…


Recent posts

7/20/2026
Your Fallback Chain Needs a Threat Model, Not a Priority List
The 3am call comes in. Your AI feature is returning 500s. You check the logs, confirm it's an OpenAI outage, and feel briefly smug — you built a fallback chain six months ago. Then…
7/13/2026
You're Paying for Tokens You Never Counted
The Monday morning bill shock is a rite of passage. You shipped the feature Friday, traffic came in over the weekend, and now you're staring at a number that's 3–5× what your back-…
7/6/2026
P99 TTFT Is the Only Latency Number That Will Save You at 2am
Four seconds. That's how long a user stares at a blank screen before they decide your AI feature is broken. Not slow — broken. They don't know about prefill phases or KV cache tran…
6/29/2026
The Cache Key Is the Product: Why LLM Response Caching Fails Silently
The incident is always the same. A customer asks "where's my delivery?" and gets a confident, detailed answer — about someone else's order. Two users asked semantically similar que…
6/22/2026
Your Fallback Chain Is a Retry Loop in a Trench Coat
The April incidents should have been a wake-up call. A ten-hour Claude outage on April 6 and a major OpenAI platform outage on April 20 — both multi-hour, both affecting production…
6/15/2026
Latency Spikes Are the Last Thing You'll Notice — Until They're the Only Thing Users Talk About
The support ticket reads: "The AI feels slow and dumb lately." No stack trace. No error code. Just a user who noticed something your dashboards didn't. This is the failure mode tha…
6/8/2026
Your LLM Bill Is a Lie Until You Track Tokens at the Span Level
The invoice arrives and the number is wrong — not wrong as in fraudulent, wrong as in useless. It tells you what you spent across the whole month. It doesn't tell you that one agen…
6/1/2026
LLM Output Quality Is Degrading Right Now. You Just Don't Know It Yet.
The incident report from a real Friday-night production failure reads like a horror story in slow motion: a customer support agent launched Monday; by Friday the inbox was full of…
5/25/2026
Your RAG Pipeline's Retrieval Quality Is Decided Before the First Query Runs
The incident report always reads the same way. The LLM cited a policy. The policy didn't exist. What actually happened: the chunker split two adjacent sections at an arbitrary boun…
5/18/2026
The Routing Decision You're Making Wrong: Local vs. API LLMs
Most teams treat the local-vs-API decision as a one-time architecture call. Pick a side, commit, move on. That's the wrong frame — and it's why so many teams end up either hemorrha…
5/11/2026
AI Agents Don't Fail at the Model Layer. They Fail at the Seams.
The demo works. The agent researches a company, drafts a personalized email, and the team ships it. Three weeks later, you're getting paged because the agent is stuck in a retry lo…
5/4/2026
Prompts Outgrow Your Codebase Before You Notice
The incident is always the same. Someone makes a small prompt edit — two lines, maybe a single character — and three days later you're manually tracing why a specific customer's ou…
4/27/2026
The Gap Between AI Demo and AI Deployed Is Operational, Not Technical
Most AI features don't stall because the model is wrong. They stall because the team around the model isn't set up to catch when it goes wrong. I've watched this pattern repeat: a…
4/20/2026
Your QA Pipeline Was Built for the Wrong Kind of Software
A fintech company deployed a customer support agent in February 2026. It passed every test in their CI/CD pipeline — unit tests, integration tests, end-to-end validation across ten…
4/15/2026
Your Test Suite Is Lying to You About Your AI System's Quality
The fintech team thought they'd done everything right. Unit tests on every tool function. Integration tests on all the API connections. End-to-end tests confirming the agent handle…
4/6/2026
Your Model Degraded Three Weeks Ago. You Found Out Yesterday.
That's not a hypothetical. It's the default outcome when teams treat model quality like uptime — something you check when users complain. The problem is structural. Traditional obs…
3/29/2026
Your Demo Worked Perfectly. Your Production System Is a Different Animal.
The demo crushed it. The founder showed the model handling edge cases, the latency looked snappy, the outputs were coherent. Everyone in the room was impressed. Six weeks later, th…
3/15/2026
Your Model Serving Stack Is a Confession
Ask a team how they're serving models in production and you'll learn more about their engineering culture in five minutes than in any architecture review. Not because there's one r…