AI Agents · May 19, 2026 · 5 min read

Prompt Caching: How I Think About Cutting AI Agent Costs

A practical note on prompt caching, long context, and why production AI agents need cost design before they need more clever prompts.

Prompt Caching · AI Agents · Cost Optimization


The first time you build an AI agent, the exciting part is usually the magic moment: it reads something, calls a tool, writes an answer, and feels almost like a teammate. The less exciting part arrives later, when the same workflow starts running every day and the API bill becomes part of the product design.

That is why prompt caching caught my attention. It is not just a pricing trick. It changes how I think about agent architecture. If an agent sends the same instructions, tool definitions, policy notes, and background context on every run, paying full price each time feels like recompiling the same project from scratch every time you change one line.

The simple idea

In many agent workflows, only a small part of the input changes: the user request, the latest email, the new document, the current row in a spreadsheet. A large part stays stable: the system instructions, available tools, output format, business rules, safety rules, and examples. Prompt caching lets the model provider reuse that stable prefix, typically at a reduced rate for the cached tokens, instead of charging for and processing it as if it were brand new every time.

Keep stable instructions stable. Do not rebuild the prompt differently on every call.
Put reusable context before volatile context so the cache has a clean prefix to reuse.
Avoid random timestamps, request IDs, and noisy logs inside the cached section.
Measure cost per successful task, not only cost per token.
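To make the first three points concrete, here is a minimal sketch of prompt assembly with the stable part first and the volatile part last. It only illustrates prompt layout; it does not call any provider, and how caching is actually switched on depends on the API you use. The constants and the function names `build_stable_prefix`, `build_prompt`, and `prefix_fingerprint` are hypothetical.

```python
import hashlib
import json

# Hypothetical stable configuration; names and contents are illustrative only.
SYSTEM_INSTRUCTIONS = "You are a support triage assistant. Reply in JSON."
TOOL_DEFINITIONS = [
    {"name": "lookup_customer", "description": "Fetch a CRM record by email."},
    {"name": "create_ticket", "description": "Open a support ticket."},
]


def build_stable_prefix() -> str:
    """Serialize the reusable context deterministically (fixed order, sorted keys)."""
    tools = json.dumps(TOOL_DEFINITIONS, sort_keys=True, separators=(",", ":"))
    return f"{SYSTEM_INSTRUCTIONS}\n\nTOOLS:\n{tools}\n"


def build_prompt(user_request: str, request_id: str) -> str:
    """Put the stable prefix first; volatile data such as IDs goes after it."""
    return f"{build_stable_prefix()}\nREQUEST {request_id}:\n{user_request}\n"


def prefix_fingerprint() -> str:
    """Hash the stable prefix so an accidental change is easy to spot in logs."""
    return hashlib.sha256(build_stable_prefix().encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = build_prompt("Customer cannot log in.", request_id="req-001")
    second = build_prompt("Invoice total looks wrong.", request_id="req-002")
    prefix = build_stable_prefix()
    # Both prompts share the same leading text, which is what a prefix cache needs.
    print(first.startswith(prefix) and second.startswith(prefix))
    print(prefix_fingerprint()[:12])
```

Logging the fingerprint next to each run is a cheap way to notice when a refactor or a stray timestamp has quietly changed the prefix.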

What this means for automation work

For client systems, prompt caching is most useful when the workflow repeats: support triage, CRM enrichment, invoice review, lead research, content repurposing, document QA, or internal knowledge assistants. These systems often use the same playbook hundreds of times. Once the playbook is stable, caching can reduce both cost and latency.

The small catch is that caching rewards discipline. If the prompt is built by concatenating random snippets in random order, it will miss cache opportunities. If the agent loads too much irrelevant context, the cache may help the bill but not the quality. Good agent work is still good software work: separate stable config from runtime data, keep interfaces predictable, and log the right metrics.
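As one way to "log the right metrics", the sketch below computes cost per successful task and the share of input tokens served from cache. The prices are placeholders I made up for the arithmetic, not any provider's real rates, and `RunRecord` is a hypothetical record shape; substitute whatever your API's usage report actually returns.

```python
from dataclasses import dataclass
from typing import List, Optional

# Placeholder prices in dollars per million tokens (assumptions, not real rates).
PRICE_INPUT = 3.00
PRICE_CACHED_INPUT = 0.30
PRICE_OUTPUT = 15.00


@dataclass
class RunRecord:
    uncached_input_tokens: int
    cached_input_tokens: int
    output_tokens: int
    succeeded: bool


def run_cost(run: RunRecord) -> float:
    """Dollar cost of one run under the placeholder prices above."""
    total = (
        run.uncached_input_tokens * PRICE_INPUT
        + run.cached_input_tokens * PRICE_CACHED_INPUT
        + run.output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


def cost_per_successful_task(runs: List[RunRecord]) -> Optional[float]:
    """Total spend, failed runs included, divided by the number of successes."""
    successes = sum(1 for r in runs if r.succeeded)
    if successes == 0:
        return None
    return sum(run_cost(r) for r in runs) / successes


def cache_hit_ratio(runs: List[RunRecord]) -> float:
    """Fraction of all input tokens that were read from cache."""
    cached = sum(r.cached_input_tokens for r in runs)
    total = cached + sum(r.uncached_input_tokens for r in runs)
    return cached / total if total else 0.0
```

Failed runs stay in the numerator on purpose: a cheap run that produces nothing usable still costs money.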

My current rule: before optimizing prompts, map which parts of the context are stable, which parts change, and which parts should not be in the context at all.
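That mapping can live in code as a small table. The following is a rough sketch, assuming a hypothetical invoice review agent; the part names and the `assemble` helper are illustrative, not from any library.

```python
from enum import Enum
from typing import Dict, List


class Kind(Enum):
    STABLE = "stable"      # same on every run, belongs in the reusable prefix
    VOLATILE = "volatile"  # changes per run, goes after the prefix
    EXCLUDED = "excluded"  # should not be in the context at all


# Hypothetical context map for an invoice review agent.
CONTEXT_MAP: Dict[str, Kind] = {
    "system_instructions": Kind.STABLE,
    "business_rules": Kind.STABLE,
    "output_format": Kind.STABLE,
    "current_invoice": Kind.VOLATILE,
    "debug_logs": Kind.EXCLUDED,
}


def assemble(values: Dict[str, str]) -> List[str]:
    """Return context parts with stable ones first and excluded ones dropped."""
    unknown = set(values) - set(CONTEXT_MAP)
    if unknown:
        # Force an explicit decision for every new piece of context.
        raise KeyError(f"unclassified context parts: {sorted(unknown)}")
    ordered: List[str] = []
    for kind in (Kind.STABLE, Kind.VOLATILE):
        for name, part_kind in CONTEXT_MAP.items():
            if part_kind is kind and name in values:
                ordered.append(values[name])
    return ordered
```

Raising on unclassified parts is a design choice: it makes adding new context a deliberate decision rather than something that slips into the prompt unnoticed.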
