Semantic Caching: Beyond Simple Key-Value Lookups

    Traditional caching falls short for AI workloads because differently worded prompts often express the same intent. Here's how to implement semantic similarity caching.

    Tob

    Backend Developer

    6 min read · Performance

    Traditional caching relies on exact key matches. But when users ask "What's the weather?" and "How's the weather today?", they want the same answer. Semantic caching bridges this gap.

    The Problem with Exact Matching

    Consider these queries hitting your AI endpoint:

    • "Explain quantum computing"
    • "What is quantum computing?"
    • "Can you explain quantum computing to me?"

    Each generates a unique cache key, causing redundant API calls despite identical intent.
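
    To make the miss pattern concrete, here is a minimal sketch of an exact-match cache fed the three queries above. The `normalizeKey` helper and the placeholder answers are assumptions for illustration, not part of any particular caching library. Even with case and whitespace normalization, every query produces a different key.

```typescript
// Hypothetical helper: light normalization that exact-match caches often apply.
function normalizeKey(prompt: string): string {
  return prompt.trim().toLowerCase().replace(/\s+/g, " ");
}

const queries: string[] = [
  "Explain quantum computing",
  "What is quantum computing?",
  "Can you explain quantum computing to me?",
];

// Exact-match cache keyed on the normalized prompt text.
const exactCache = new Map<string, string>();
let misses = 0;

for (const q of queries) {
  const key = normalizeKey(q);
  if (!exactCache.has(key)) {
    misses += 1; // in a real service, each miss would trigger a model call
    exactCache.set(key, `answer for: ${q}`); // placeholder value
  }
}

console.log(misses); // 3: one miss per phrasing
```

    Normalization only helps when two prompts differ in casing or spacing. It cannot tell that three different sentences ask the same thing, which is the gap a similarity search over embeddings is meant to close.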

    Implementing a Semantic Cache

    typescript
    import { embeddings } from './ai-client';
    import { vectorStore } from './vector-db';
    
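    // `generator` comes before `threshold` so callers can omit the threshold;
    // a defaulted parameter placed before a required one can only be skipped
    // by passing `undefined` explicitly.
    async function semanticCache<T>(
      query: string,
      generator: () => Promise<T>,
      threshold: number = 0.92
    ): Promise<T> {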
      // Generate embedding for the query
      const queryVector = await embeddings.create(query);
      
      // Search for similar cached entries
      const similar = await vectorStore.search(queryVector, {
        threshold,
        limit: 1,
      });
      
      if (similar.length > 0) {
        return similar[0].value as T;
      }
      
      // Cache miss — generate and store
      const result = await generator();
      await vectorStore.insert(queryVector, result);
      
      return result;
    }

    Semantic caching can reduce AI API costs by 40-60% for conversational applications.
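
    The post does not show what `vectorStore.search` does internally. As a rough sketch, assuming the store ranks entries by cosine similarity and treats `threshold` as a minimum score, an in-memory stand-in could look like the following. `InMemoryVectorStore` and the three-dimensional toy vectors are hypothetical; real embeddings have hundreds or thousands of dimensions, and a production store would use an index rather than a linear scan.

```typescript
type Entry<T> = { vector: number[]; value: T };
type Match<T> = { value: T; score: number };

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical stand-in for a vector database (linear scan, no persistence).
class InMemoryVectorStore<T> {
  private entries: Entry<T>[] = [];

  insert(vector: number[], value: T): void {
    this.entries.push({ vector, value });
  }

  // Returns up to `limit` entries scoring at or above `threshold`, best first.
  search(query: number[], opts: { threshold: number; limit: number }): Match<T>[] {
    return this.entries
      .map((e) => ({ value: e.value, score: cosineSimilarity(query, e.vector) }))
      .filter((m) => m.score >= opts.threshold)
      .sort((x, y) => y.score - x.score)
      .slice(0, opts.limit);
  }
}

// Toy vectors standing in for real embeddings.
const store = new InMemoryVectorStore<string>();
store.insert([1, 0, 0], "cached answer A");
store.insert([0, 1, 0], "cached answer B");

// Points almost the same way as A, so it clears the 0.92 threshold.
const near = store.search([0.9, 0.1, 0], { threshold: 0.92, limit: 1 });
// Roughly equidistant from A and B, so nothing clears the threshold.
const far = store.search([0.5, 0.5, 0.7], { threshold: 0.92, limit: 1 });

console.log(near.map((m) => m.value), far.length);
```

    An empty result from `search` is what drives the cache-miss branch: the caller generates a fresh answer and inserts it alongside the query vector.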

    Tuning the Similarity Threshold

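    The threshold parameter is critical. Set too low, the cache serves answers to questions the user did not ask; set too high, paraphrases miss the cache. Typical ranges: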

    Threshold    Behavior             Use Case
    0.98+        Near-exact matches   Legal/medical queries
    0.92-0.97    Similar intent       General Q&A
    0.85-0.91    Loose matching       Creative applications

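    Start conservative (0.95+) and lower the threshold only while cache hit quality metrics, such as the share of hits that actually answer the user's question, stay acceptable.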
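
    One way to measure cache hit quality is to replay labeled query pairs against candidate thresholds. The sketch below assumes you have logged similarity scores for pairs of queries and had someone mark whether each pair shares an intent. `LabeledPair`, `evaluateThreshold`, and the sample numbers are invented for illustration and are not measurements from a real system.

```typescript
// One logged comparison: the similarity score plus a human judgment.
type LabeledPair = { similarity: number; sameIntent: boolean };

type ThresholdReport = { threshold: number; hitRate: number; precision: number };

// hitRate: share of pairs that would be served from cache.
// precision: share of those hits where the intent really matched.
function evaluateThreshold(pairs: LabeledPair[], threshold: number): ThresholdReport {
  let hits = 0;
  let wrongHits = 0;
  for (const p of pairs) {
    if (p.similarity >= threshold) {
      hits += 1;
      if (!p.sameIntent) wrongHits += 1;
    }
  }
  return {
    threshold,
    hitRate: pairs.length === 0 ? 0 : hits / pairs.length,
    precision: hits === 0 ? 1 : (hits - wrongHits) / hits,
  };
}

// Made-up sample data, for illustration only.
const samplePairs: LabeledPair[] = [
  { similarity: 0.99, sameIntent: true },
  { similarity: 0.96, sameIntent: true },
  { similarity: 0.93, sameIntent: true },
  { similarity: 0.93, sameIntent: false },
  { similarity: 0.88, sameIntent: false },
  { similarity: 0.86, sameIntent: true },
];

for (const t of [0.98, 0.95, 0.92, 0.85]) {
  console.log(evaluateThreshold(samplePairs, t));
}
```

    On this toy data, dropping from 0.95 to 0.92 doubles the hit rate but lets in one wrong answer out of four hits. That trade-off between hit rate and precision is the thing to watch on your own traffic.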
