Streaming LLM Responses: From Tokens to UI
End-to-end patterns for implementing real-time streaming from language models to user interfaces with proper error handling.
Users expect real-time feedback. Here's how to stream tokens from model to browser.
Server-Side Streaming
```typescript
import OpenAI from 'openai';

const openai = new OpenAI();

// Async generator that yields content tokens as the model emits them.
async function* streamCompletion(prompt: string) {
  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
    stream: true,
  });

  for await (const chunk of stream) {
    // Each chunk carries a small delta; role-only chunks have no content.
    const content = chunk.choices[0]?.delta?.content;
    if (content) yield content;
  }
}
```
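The generator still needs an HTTP route to reach the browser. The article doesn't pin down a server framework, so here is a minimal sketch assuming Express; the route name `/api/completion` is illustrative. The try/catch matters because once streaming begins the status code has already been sent, so an error can only end the stream early.

```typescript
import express from 'express';

const app = express();
app.use(express.json());

// Hypothetical route: pipes the generator into a chunked HTTP response.
app.post('/api/completion', async (req, res) => {
  res.setHeader('Content-Type', 'text/plain; charset=utf-8');

  try {
    for await (const token of streamCompletion(req.body.prompt)) {
      res.write(token);
    }
  } catch (err) {
    // Headers are already sent; we can't change the status code,
    // so log and end the stream, letting the client see a truncated body.
    console.error('stream failed:', err);
  } finally {
    res.end();
  }
});

app.listen(3000);
```

Server-Sent Events would work too (set `Content-Type: text/event-stream` and frame each token as a `data:` line), but plain chunked text keeps the client-side reader below simple.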
Client-Side Consumption

```typescript
import { useState } from 'react';

function useStreamingResponse(endpoint: string) {
  const [content, setContent] = useState('');

  const stream = async (prompt: string) => {
    setContent(''); // reset before each new request
    const response = await fetch(endpoint, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt }),
    });
    if (!response.ok || !response.body) {
      throw new Error(`Request failed: ${response.status}`);
    }

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // { stream: true } keeps multi-byte characters that are split
      // across chunks from being decoded incorrectly.
      setContent(prev => prev + decoder.decode(value, { stream: true }));
    }
  };

  return { content, stream };
}
```
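To show the hook in context, here is a minimal consuming component; `ChatBox`, the button label, and the `/api/completion` path are illustrative, not from the article.

```typescript
function ChatBox() {
  const { content, stream } = useStreamingResponse('/api/completion');

  return (
    <div>
      <button onClick={() => stream('Explain streaming in one paragraph')}>
        Ask
      </button>
      {/* Re-renders on every chunk, so tokens appear as they arrive. */}
      <pre>{content}</pre>
    </div>
  );
}
```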
Streaming can reduce perceived latency by as much as 80% even when total response time is unchanged, because users start reading as soon as the first token arrives.