Semantic Caching: Beyond Simple Key-Value Lookups
Traditional caching falls short for AI workloads because differently worded prompts often carry the same intent. Here's how to implement semantic similarity caching.
Tob
Backend Developer
6 min read · Performance
Traditional caching relies on exact key matches. But when users ask "What's the weather?" and "How's the weather today?", they want the same answer. Semantic caching bridges this gap.
The Problem with Exact Matching
Consider these queries hitting your AI endpoint:
- "Explain quantum computing"
- "What is quantum computing?"
- "Can you explain quantum computing to me?"
With exact-match caching, each query produces a different cache key, so all three trigger separate API calls despite sharing the same intent.
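To make the problem concrete, here is a minimal sketch of a plain exact-match cache. The `exactCache` map and `cacheKey` helper are hypothetical, not from any library. Even after trimming and lowercasing, the three queries above yield three distinct keys, so every lookup is a miss.

```typescript
// Naive exact-match cache: the (lightly normalized) query string is the key.
const exactCache = new Map<string, string>();

function cacheKey(query: string): string {
  // Trimming and lowercasing cannot merge genuine rephrasings.
  return query.trim().toLowerCase();
}

const queries = [
  "Explain quantum computing",
  "What is quantum computing?",
  "Can you explain quantum computing to me?",
];

let misses = 0;
for (const q of queries) {
  const key = cacheKey(q);
  if (!exactCache.has(key)) {
    misses++; // each miss stands in for a paid model call
    exactCache.set(key, `<model answer for: ${q}>`);
  }
}

console.log(`${misses} misses, ${exactCache.size} cache entries`);
// prints "3 misses, 3 cache entries"
```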
Implementing a Semantic Cache
```typescript
import { embeddings } from './ai-client';
import { vectorStore } from './vector-db';

// Return a cached result for a semantically similar query, or call
// `generator` on a miss and cache its result. `threshold` comes last so
// callers can omit it and use the default.
async function semanticCache<T>(
  query: string,
  generator: () => Promise<T>,
  threshold: number = 0.92,
): Promise<T> {
  // Generate an embedding for the query
  const queryVector = await embeddings.create(query);

  // Search for the most similar cached entry at or above the threshold
  const similar = await vectorStore.search(queryVector, {
    threshold,
    limit: 1,
  });

  if (similar.length > 0) {
    return similar[0].value as T; // cache hit
  }

  // Cache miss: generate and store
  const result = await generator();
  await vectorStore.insert(queryVector, result);
  return result;
}
```

For conversational applications, where users frequently rephrase the same questions, semantic caching can reduce AI API costs by 40-60%.
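The `embeddings` and `vectorStore` imports above are the article's own client modules, so their internals aren't shown. As a self-contained illustration of what a threshold search does, the sketch below keeps entries in memory and scans them linearly. It assumes cosine similarity as the score (a common choice, but check which metric your vector database uses), and the hypothetical `toyEmbed` function is a bag-of-words stand-in for a real embedding model: it only sees shared words, not meaning. `InMemorySemanticCache`, `toyEmbed`, and `fakeModel` are illustrative names, not an existing API.

```typescript
type Vector = number[];

// Hypothetical stand-in for an embedding model: counts vocabulary words.
// It captures word overlap only, not meaning.
function toyEmbed(text: string, vocab: string[]): Vector {
  const words = text.toLowerCase().match(/[a-z]+/g) ?? [];
  return vocab.map((term) => words.filter((w) => w === term).length);
}

// Cosine similarity: dot(a, b) / (|a| * |b|), or 0 for a zero vector.
function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Entry<T> { vector: Vector; value: T; }

class InMemorySemanticCache<T> {
  private entries: Entry<T>[] = [];

  constructor(private embed: (text: string) => Vector, private threshold: number) {}

  async get(query: string, generator: () => Promise<T>): Promise<T> {
    const vector = this.embed(query);
    // Linear scan for the most similar entry (a vector DB would use an index).
    let best: Entry<T> | undefined;
    let bestScore = -Infinity;
    for (const entry of this.entries) {
      const score = cosineSimilarity(vector, entry.vector);
      if (score > bestScore) { best = entry; bestScore = score; }
    }
    if (best && bestScore >= this.threshold) return best.value; // hit
    const value = await generator(); // miss: compute and store
    this.entries.push({ vector, value });
    return value;
  }
}

const vocab = ["explain", "what", "quantum", "computing", "weather"];
const cache = new InMemorySemanticCache<string>((t) => toyEmbed(t, vocab), 0.9);

let modelCalls = 0;
// Hypothetical model call: counts invocations instead of calling an API.
const fakeModel = (q: string) => async (): Promise<string> => {
  modelCalls++;
  return `<answer for: ${q}>`;
};

async function main(): Promise<void> {
  const queries = [
    "Explain quantum computing",
    "Can you explain quantum computing to me?", // same word vector: hit
    "What is quantum computing?", // cosine 2/3 with the first: miss at 0.9
  ];
  for (const q of queries) await cache.get(q, fakeModel(q));
  console.log(`model calls: ${modelCalls}`); // prints "model calls: 2"
}

main();
```

The third query shows the limit of the stand-in: it shares intent with the first but only two of its three vocabulary words, so it scores 2/3 and misses. Catching that kind of rephrasing is what a learned embedding model is for. The linear scan also costs O(n) per lookup, which is why a vector database with an index takes its place in practice.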
Tuning the Similarity Threshold
The `threshold` parameter, the minimum similarity a cached entry must reach to count as a hit, is critical:
| Threshold | Behavior | Use Case |
|---|---|---|
| 0.98+ | Near-exact matches | Legal/medical queries |
| 0.92-0.97 | Similar intent | General Q&A |
| 0.85-0.91 | Loose matching | Creative applications |
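Start conservative (0.95 or higher) and lower the threshold only as far as cache hit quality allows, for example by reviewing a sample of hits to check that each cached answer actually fits the new query.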
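One way to make hit quality measurable, sketched here under assumptions: log the best-match similarity for each lookup, have a reviewer mark whether the cached answer would have been correct, then compute for each candidate threshold the hit rate and the share of hits that returned a wrong answer. `LabeledLookup`, `evaluateThreshold`, and `chooseThreshold` are hypothetical names, and the sample records are made up purely to exercise the code.

```typescript
// One reviewed lookup: the nearest cached entry's similarity plus a verdict.
interface LabeledLookup {
  similarity: number;  // score of the best cached match
  sameIntent: boolean; // would the cached answer have been correct?
}

interface ThresholdReport {
  threshold: number;
  hitRate: number;    // fraction of lookups served from cache
  badHitRate: number; // fraction of hits that returned a wrong answer
}

function evaluateThreshold(log: LabeledLookup[], threshold: number): ThresholdReport {
  const hits = log.filter((l) => l.similarity >= threshold);
  const badHits = hits.filter((l) => !l.sameIntent);
  return {
    threshold,
    hitRate: log.length === 0 ? 0 : hits.length / log.length,
    badHitRate: hits.length === 0 ? 0 : badHits.length / hits.length,
  };
}

// Pick the lowest candidate threshold whose bad-hit rate stays within budget.
function chooseThreshold(
  log: LabeledLookup[],
  candidates: number[],
  maxBadHitRate: number,
): ThresholdReport | undefined {
  const acceptable = candidates
    .map((t) => evaluateThreshold(log, t))
    .filter((r) => r.badHitRate <= maxBadHitRate);
  acceptable.sort((a, b) => a.threshold - b.threshold);
  return acceptable[0];
}

// Made-up sample for illustration only; use your own reviewed lookups.
const sample: LabeledLookup[] = [
  { similarity: 0.99, sameIntent: true },
  { similarity: 0.96, sameIntent: true },
  { similarity: 0.93, sameIntent: true },
  { similarity: 0.91, sameIntent: false },
  { similarity: 0.88, sameIntent: false },
];

const best = chooseThreshold(sample, [0.85, 0.9, 0.92, 0.95, 0.98], 0);
console.log(best?.threshold); // prints 0.92
```

Because raising the threshold can only remove hits, the lowest threshold that stays within the bad-hit budget is also the one with the highest hit rate among the acceptable candidates.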