READING TIME · 6 MIN · Jan 10, 2026

Semantic Caching: Beyond Simple Key-Value Lookups

Tob

Backend Developer

Traditional caching fails with AI workloads because prompts that differ slightly in wording express the same intent yet never share a cache key. Here's how to implement semantic similarity caching.

Performance

Traditional caching relies on exact key matches. But when users ask "What's the weather?" and "How's the weather today?", they want the same answer. Semantic caching bridges this gap.

The Problem with Exact Matching

Consider these queries hitting your AI endpoint:

"Explain quantum computing"
"What is quantum computing?"
"Can you explain quantum computing to me?"

Each generates a unique cache key, causing redundant API calls despite identical intent.
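
To make the miss pattern concrete, here is a minimal sketch of an exact-match cache keyed on the normalized prompt text. The names `exactCache`, `normalizeKey`, and `exactMatchLookup` are hypothetical, and the generator is a stub standing in for a real model call.

```typescript
// Hypothetical exact-match cache: the key is the normalized prompt text.
const exactCache = new Map<string, string>();
let generatorCalls = 0;

// Trimming and lowercasing removes trivial differences, but not rephrasing.
function normalizeKey(prompt: string): string {
  return prompt.trim().toLowerCase();
}

function exactMatchLookup(
  prompt: string,
  generate: (p: string) => string
): string {
  const key = normalizeKey(prompt);
  const hit = exactCache.get(key);
  if (hit !== undefined) {
    return hit;
  }
  // Cache miss: call the (stubbed) model and store the answer under this key.
  generatorCalls += 1;
  const result = generate(prompt);
  exactCache.set(key, result);
  return result;
}

const queries = [
  "Explain quantum computing",
  "What is quantum computing?",
  "Can you explain quantum computing to me?",
];

for (const q of queries) {
  exactMatchLookup(q, () => "stub answer about quantum computing");
}

console.log(generatorCalls); // 3: every rephrasing missed the cache
console.log(exactCache.size); // 3: three separate entries for one intent
```

Normalization only helps with casing and whitespace. Any change in wording still produces a new key.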

Implementing Semantic Cache

typescript

import { embeddings } from './ai-client';
import { vectorStore } from './vector-db';

async function semanticCache<T>(
  query: string,
  generator: () => Promise<T>,
  // Similarity threshold passed to the vector search; declared last so callers can omit it
  threshold: number = 0.92
): Promise<T> {
  // Generate embedding for the query
  const queryVector = await embeddings.create(query);
  
  // Search for similar cached entries
  const similar = await vectorStore.search(queryVector, {
    threshold,
    limit: 1,
  });
  
  if (similar.length > 0) {
    return similar[0].value as T;
  }
  
  // Cache miss: generate the result and store it under the query's embedding
  const result = await generator();
  await vectorStore.insert(queryVector, result);
  
  return result;
}
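
The article does not show what './vector-db' does internally. As a rough illustration, the sketch below assumes that `threshold` is a minimum cosine similarity and that `search` returns the best matches at or above it. `InMemoryVectorStore`, `cosineSimilarity`, and the toy 3-dimensional vectors are all hypothetical. The methods are synchronous here to keep the example short, whereas a real vector database client would be asynchronous and use an index rather than a linear scan.

```typescript
// Hypothetical in-memory stand-in for a vector store; not the article's './vector-db'.
interface SearchOptions {
  threshold: number; // assumed: minimum cosine similarity for a hit
  limit: number; // maximum number of matches to return
}

interface Entry<T> {
  vector: number[];
  value: T;
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as matching nothing instead of dividing by zero.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class InMemoryVectorStore<T> {
  private entries: Entry<T>[] = [];

  insert(vector: number[], value: T): void {
    this.entries.push({ vector, value });
  }

  // Linear scan: score every entry, keep those at or above the threshold,
  // and return the highest-scoring ones first.
  search(query: number[], opts: SearchOptions): Array<Entry<T> & { score: number }> {
    return this.entries
      .map((e) => ({ ...e, score: cosineSimilarity(query, e.vector) }))
      .filter((e) => e.score >= opts.threshold)
      .sort((x, y) => y.score - x.score)
      .slice(0, opts.limit);
  }
}

// Toy vectors standing in for real embeddings.
const store = new InMemoryVectorStore<string>();
store.insert([1, 0, 0], "cached answer");

const nearHit = store.search([0.9, 0.1, 0], { threshold: 0.92, limit: 1 });
const miss = store.search([0, 1, 0], { threshold: 0.92, limit: 1 });

console.log(nearHit.length, miss.length); // 1 0
```

The threshold is the main tuning knob under this assumption. A higher value returns fewer cached answers but lowers the risk of serving a response to a query with a different intent.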