Semantic Caching: Beyond Simple Key-Value Lookups
Traditional caching falls short with AI workloads because slight variations in prompt wording produce different cache keys even when the intent is identical. Here's how to implement semantic similarity caching.
Tob
Backend Developer
6 min read · Performance
Traditional caching relies on exact key matches. But when users ask "What's the weather?" and "How's the weather today?", they want the same answer. Semantic caching bridges this gap.
The Problem with Exact Matching
Consider these queries hitting your AI endpoint:
- "Explain quantum computing"
- "What is quantum computing?"
- "Can you explain quantum computing to me?"
Each generates a unique cache key, causing redundant API calls despite identical intent.
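To make the problem concrete, here is a minimal sketch of how an exact-match cache sees those three queries. The `exactKey` helper and its normalization (trimming, lowercasing, collapsing whitespace) are assumptions for illustration, not part of any particular caching library.

```typescript
// Hypothetical exact-match key: normalize case and whitespace, nothing more.
function exactKey(query: string): string {
  return query.trim().toLowerCase().replace(/\s+/g, " ");
}

const queries: string[] = [
  "Explain quantum computing",
  "What is quantum computing?",
  "Can you explain quantum computing to me?",
];

// Each distinct key means a separate cache entry and a separate API call.
const distinctKeys = new Set(queries.map(exactKey));
console.log(distinctKeys.size); // 3
```

Even with normalization, the three phrasings never collide, so the cache cannot help.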
Implementing Semantic Cache
```typescript
import { embeddings } from './ai-client';
import { vectorStore } from './vector-db';

async function semanticCache<T>(
  query: string,
  generator: () => Promise<T>,
  threshold: number = 0.92
): Promise<T> {
  // Generate embedding for the query
  const queryVector = await embeddings.create(query);

  // Search for similar cached entries
  const similar = await vectorStore.search(queryVector, {
    threshold,
    limit: 1,
  });

  if (similar.length > 0) {
    // Cache hit: reuse the stored response
    return similar[0].value as T;
  }

  // Cache miss: generate and store
  const result = await generator();
  await vectorStore.insert(queryVector, result);
  return result;
}
```

Semantic caching can reduce AI API costs by 40-60% for conversational applications.
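The function above leans on `vectorStore.search` to do the similarity comparison. As a rough sketch of what such a store might do internally, the following in-memory version assumes cosine similarity as the metric (the article does not say which one the store uses). `InMemoryVectorStore`, `cosineSimilarity`, and `Match` are hypothetical names, and the linear scan is only reasonable for small caches.

```typescript
// Assumption: similarity is cosine similarity between embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Match<T> {
  value: T;
  score: number;
}

// Hypothetical stand-in for a real vector database.
class InMemoryVectorStore<T> {
  private entries: { vector: number[]; value: T }[] = [];

  async insert(vector: number[], value: T): Promise<void> {
    this.entries.push({ vector, value });
  }

  // Returns up to `limit` entries scoring at or above `threshold`, best first.
  async search(
    vector: number[],
    options: { threshold: number; limit: number }
  ): Promise<Match<T>[]> {
    return this.entries
      .map((e) => ({ value: e.value, score: cosineSimilarity(vector, e.vector) }))
      .filter((m) => m.score >= options.threshold)
      .sort((x, y) => y.score - x.score)
      .slice(0, options.limit);
  }
}
```

A production system would typically replace the linear scan with an approximate nearest-neighbor index, but the hit-or-miss logic stays the same.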
Tuning the Similarity Threshold
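The threshold parameter is critical. Too low, and the cache returns answers to questions the user did not ask; too high, and it rarely hits: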
| Threshold | Behavior | Use Case |
|---|---|---|
| 0.98+ | Near-exact matches | Legal/medical queries |
| 0.92-0.97 | Similar intent | General Q&A |
| 0.85-0.91 | Loose matching | Creative applications |
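Start conservative (0.95+) and lower the threshold gradually, tuning based on cache hit quality metrics such as how often a cached answer actually matched the user's intent.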
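One way to make "cache hit quality" measurable is to label a sample of query pairs as same-intent or not, record their similarity scores, and compare candidate thresholds. The sketch below assumes you can collect such labels. `LabeledPair`, `ThresholdStats`, and `evaluateThreshold` are hypothetical names, and the sample scores are made up purely for illustration.

```typescript
// Hypothetical labeled sample: the similarity score of a query pair, and
// whether a human judged the two queries to have the same intent.
interface LabeledPair {
  score: number;
  sameIntent: boolean;
}

interface ThresholdStats {
  threshold: number;
  hitRate: number;      // fraction of all pairs that would be served from cache
  falseHitRate: number; // fraction of cache hits whose intents actually differed
}

function evaluateThreshold(pairs: LabeledPair[], threshold: number): ThresholdStats {
  const hits = pairs.filter((p) => p.score >= threshold);
  const falseHits = hits.filter((p) => !p.sameIntent);
  return {
    threshold,
    hitRate: pairs.length === 0 ? 0 : hits.length / pairs.length,
    falseHitRate: hits.length === 0 ? 0 : falseHits.length / hits.length,
  };
}

// Made-up scores for illustration only; replace with logged data from your system.
const sample: LabeledPair[] = [
  { score: 0.99, sameIntent: true },
  { score: 0.96, sameIntent: true },
  { score: 0.93, sameIntent: true },
  { score: 0.93, sameIntent: false },
  { score: 0.88, sameIntent: false },
];

for (const t of [0.98, 0.95, 0.92, 0.85]) {
  console.log(evaluateThreshold(sample, t));
}
```

A false hit serves the wrong answer, so in high-stakes domains it usually makes sense to accept a lower hit rate in exchange for a false-hit rate near zero.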