Building AI Agents: Patterns and Anti-Patterns (A Production Guide)

    Most AI agents fail in production because of architecture mistakes, not LLM limitations. Here are the 5 patterns that work and the 5 anti-patterns that will break your system.

    Tob

    Backend Developer

    12 min read · AI Engineering

    TL;DR: Building AI agents that work in demos is easy. Building ones that survive production is hard. The difference isn't the LLM — it's the architecture around it. This guide covers the 5 patterns that make agents reliable and the 5 anti-patterns that will silently destroy your system.

    ---

    What is an AI Agent? (And Why Most Fail in Production)

    When people talk about building AI agents, they usually mean a system where an LLM doesn't just respond to a prompt — it takes actions, uses tools, makes decisions, and loops until a goal is reached. Unlike a chatbot that answers one question at a time, an autonomous AI agent can browse the web, query databases, write and run code, and chain dozens of steps together without hand-holding.

    The concept sounds powerful. And it is — when it works.

    The problem: most teams prototype an agent in an afternoon, ship it, and then spend the next month firefighting. The agent enters infinite loops. It spends $200 on OpenAI tokens trying to answer a question that should take 3 API calls. It silently returns wrong answers with complete confidence. It crashes with no log to explain what happened.

    None of these failures come from the LLM being "not good enough." They come from architecture mistakes that are entirely avoidable. The patterns in this guide are drawn from real production deployments — systems handling thousands of agent runs per day — not just toy examples.

    ---

    The Core Agent Loop — How It Actually Works

    Observe → Decide → Act → Repeat

    Every AI agent, regardless of framework or LLM, runs the same fundamental loop:

    1. Observe — gather the current state: user goal, prior results, available tools, memory
    2. Decide — the LLM picks the next action (call a tool, ask a clarifying question, return a result)
    3. Act — execute the chosen action (run the tool, write to memory, call an API)
    4. Repeat — feed the result back as a new observation, loop until done

    The loop sounds simple. The complexity lives in what you do around it: how you validate inputs and outputs, how you handle failures, how you prevent runaway loops, and how you give humans visibility into what's happening.
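    Stripped of framework details, the skeleton of that loop fits in a dozen lines. This is an illustrative sketch, not any framework's API: `decide` stands in for an LLM call that returns a structured action, and `tools` maps tool names to validated functions.

```python
def run_loop(goal: str, decide, tools: dict, max_iters: int = 15) -> str:
    """Observe -> Decide -> Act, repeated until done or out of budget."""
    state = {"goal": goal, "history": []}              # Observe: current state
    for _ in range(max_iters):
        action = decide(state)                         # Decide: pick next action
        if action["type"] == "finish":
            return action["answer"]                    # goal reached, stop looping
        result = tools[action["tool"]](**action["input"])  # Act: run the tool
        state["history"].append((action["tool"], result))  # feed result back in
    raise RuntimeError(f"No answer after {max_iters} iterations")
```

    Everything that makes an agent production-grade (validation, error branches, hard limits) wraps around this skeleton rather than replacing it.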

    The Agent Loop — Mermaid Diagram

    flowchart TD
        A([Start: User Goal]) --> B[Observe Current State]
        B --> C{LLM Decides Next Action}
        C --> D[Call Tool / API]
        C --> E[Ask Clarifying Question]
        C --> F[Return Final Answer]
        D --> G[Tool Result]
        G --> H{Valid Result?}
        H -- Yes --> I[Update Memory / State]
        H -- No --> J[Handle Error / Retry]
        J --> B
        I --> K{Goal Reached?}
        K -- No --> B
        K -- Yes --> F
        E --> L[Wait for Human Input]
        L --> B
        F --> M([End])
        style A fill:#22c55e,color:#fff
        style M fill:#22c55e,color:#fff
        style J fill:#ef4444,color:#fff

    Notice what's missing from most naive implementations: the error handling branch (J), the validity check (H), and the human-in-the-loop path (E → L). Those three omissions account for ~80% of production agent failures.

    ---

    5 Proven Patterns for Reliable AI Agents

    These patterns apply regardless of framework. The code examples use Python 3.11 and LangChain 0.2+, but the concepts translate to any stack.

    Pattern 1 — Tool Isolation

    Every tool your agent calls should be a pure, bounded function with:

    • A single, well-defined responsibility
    • Explicit input validation
    • A predictable output schema
    • Its own error handling

    The mistake most teams make: one massive execute_task() tool that does database queries, file I/O, and API calls all in one function. When it fails, you have no idea where. When the LLM calls it wrong, the blast radius is huge.

    python
    # Python 3.11+ | LangChain 0.2+
    from langchain_core.tools import tool
    from pydantic import BaseModel, Field
    from typing import Optional
    import httpx
    
    class SearchInput(BaseModel):
        query: str = Field(description="Search query, max 200 chars")
        max_results: int = Field(default=5, ge=1, le=20)
    
    @tool(args_schema=SearchInput)
    def web_search(query: str, max_results: int = 5) -> dict:
        """Search the web and return structured results. Use for factual lookups only."""
        if len(query) > 200:
            raise ValueError(f"Query too long: {len(query)} chars (max 200)")
        
        try:
            response = httpx.get(
                "https://api.search-provider.com/search",
                params={"q": query, "limit": max_results},
                timeout=10.0,
            )
            response.raise_for_status()
            return {"results": response.json()["items"], "query": query}
        except httpx.TimeoutException:
            return {"error": "Search timed out after 10s", "results": []}
        except httpx.HTTPStatusError as e:
            return {"error": f"Search API returned {e.response.status_code}", "results": []}

    Notice: the tool returns a structured dict even on error. It never throws an unhandled exception back to the agent loop. The LLM can handle a {"error": "...", "results": []} gracefully; an unhandled exception crashes the entire run.

    Pattern 2 — Circuit Breaker

    An agent that calls a failing external service in a tight loop will rack up costs and latency fast. A circuit breaker stops the bleeding.

    The pattern: track failure counts per tool. After N consecutive failures, "open" the circuit — refuse calls to that tool for a cooldown period. After the cooldown, let one call through to test if the service recovered.

    python
    # Python 3.11+
    import time
    from dataclasses import dataclass, field
    from typing import Callable, Any, Optional
    
    @dataclass
    class CircuitBreaker:
        failure_threshold: int = 3
        cooldown_seconds: int = 60
        _failures: int = field(default=0, init=False)
        _opened_at: Optional[float] = field(default=None, init=False)
    
        def is_open(self) -> bool:
            if self._opened_at is None:
                return False
            elapsed = time.monotonic() - self._opened_at
            if elapsed > self.cooldown_seconds:
                # Half-open: allow one test call
                self._opened_at = None
                self._failures = 0
                return False
            return True
    
        def record_success(self):
            self._failures = 0
            self._opened_at = None
    
        def record_failure(self):
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = time.monotonic()
    
        def call(self, fn: Callable, *args, **kwargs) -> Any:
            if self.is_open():
                raise RuntimeError(
                    f"Circuit open — service unavailable (retry after {self.cooldown_seconds}s)"
                )
            try:
                result = fn(*args, **kwargs)
                self.record_success()
                return result
        except Exception:
                self.record_failure()
                raise
    
    # Usage
    db_breaker = CircuitBreaker(failure_threshold=3, cooldown_seconds=30)
    
    def query_database(sql: str) -> list:
        return db_breaker.call(_raw_db_query, sql)

    One team I know cut runaway-cost incidents by 90% in the first month after adding circuit breakers to their agent's external tool calls.

    Pattern 3 — Structured Output Validation

    LLMs are probabilistic. Even with a perfect prompt, an LLM will occasionally return malformed JSON, skip required fields, or hallucinate values that violate your schema. Never let raw LLM output hit your business logic without validation.

    Use Pydantic (v2) to define what you expect, and validate every LLM response before acting on it:

    python
    # Python 3.11+ | LangChain 0.2+ | Pydantic 2.x
    from pydantic import BaseModel, field_validator
    from langchain_openai import ChatOpenAI
    from langchain_core.output_parsers import PydanticOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    
    class AgentAction(BaseModel):
        tool_name: str
        tool_input: dict
        reasoning: str
        confidence: float
    
        @field_validator("confidence")
        @classmethod
        def confidence_must_be_valid(cls, v: float) -> float:
            if not 0.0 <= v <= 1.0:
                raise ValueError(f"Confidence must be 0-1, got {v}")
            return v
    
        @field_validator("tool_name")
        @classmethod
        def tool_must_exist(cls, v: str) -> str:
            allowed = {"web_search", "read_file", "write_file", "query_db"}
            if v not in allowed:
                raise ValueError(f"Unknown tool: {v}. Allowed: {allowed}")
            return v
    
    parser = PydanticOutputParser(pydantic_object=AgentAction)
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are an agent. Respond with valid JSON.\n{format_instructions}"),
        ("human", "{task}"),
    ]).partial(format_instructions=parser.get_format_instructions())
    
    chain = prompt | llm | parser
    
    try:
        action: AgentAction = chain.invoke({"task": "Find the latest Python version"})
    except Exception as e:
        # Validation failed — log it, don't blindly retry
        print(f"LLM output invalid: {e}")
        action = AgentAction(
            tool_name="web_search",
            tool_input={"query": "latest Python version"},
            reasoning="Fallback after parse failure",
            confidence=0.5,
        )

    The fallback action on parse failure matters. Don't crash — degrade gracefully and log the failure for debugging.

    Pattern 4 — Memory Separation (Short vs Long-term)

    Agent memory comes in two distinct forms, and mixing them is a common source of bugs:

    • Short-term (working memory): The current conversation context, tool results from this run, intermediate calculations. Lives in the LLM's context window. Cleared between runs.
    • Long-term (persistent memory): User preferences, learned facts, previous task outcomes. Must be stored externally (vector DB, key-value store, relational DB).

    Treating both as the same thing causes bloated context windows (and costs — GPT-4o charges per token), stale data in new runs, and context-window-exceeded errors on long tasks.

    python
    # Python 3.11+ | LangChain 0.2+
    from langchain_community.vectorstores import Chroma
    from langchain_openai import OpenAIEmbeddings
    from langchain_core.messages import HumanMessage, AIMessage
    from typing import List
    
    class AgentMemory:
        def __init__(self, session_id: str):
            self.session_id = session_id
            # Short-term: in-process list, cleared per run
            self._working: List = []
            # Long-term: persisted to vector store
            self._long_term = Chroma(
                collection_name=f"agent_{session_id}",
                embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
                persist_directory="./chroma_db",
            )
    
        def add_working(self, message):
            """Add to short-term working memory (current run only)."""
            self._working.append(message)
            # Keep last 20 messages to avoid context overflow
            if len(self._working) > 20:
                self._working = self._working[-20:]
    
        def remember(self, fact: str, metadata: dict | None = None):
            """Persist important facts to long-term memory."""
            self._long_term.add_texts(
                texts=[fact],
                metadatas=[{**(metadata or {}), "session": self.session_id}],
            )
    
        def recall(self, query: str, k: int = 3) -> List[str]:
            """Retrieve relevant long-term memories for current context."""
            docs = self._long_term.similarity_search(query, k=k)
            return [doc.page_content for doc in docs]
    
        def clear_working(self):
            """Reset working memory between agent runs."""
            self._working = []

    Pattern 5 — Human-in-the-Loop Checkpoints

    Fully autonomous AI agents are rarely appropriate in production. For any action that is irreversible (sending an email, deleting data, making a purchase, deploying code), insert a human approval checkpoint.

    The pattern: classify actions by risk level. Low-risk actions (reading data, searching the web) proceed automatically. Medium-risk actions get logged for async review. High-risk actions block until a human explicitly approves.

    python
    # Python 3.11+
    from enum import Enum
    from dataclasses import dataclass
    
    class RiskLevel(Enum):
        LOW = "low"       # auto-proceed
        MEDIUM = "medium" # log + proceed, flag for review
        HIGH = "high"     # block until approved
    
    @dataclass
    class PendingAction:
        action_id: str
        tool_name: str
        tool_input: dict
        risk: RiskLevel
        approved: bool = False
    
    TOOL_RISK_MAP = {
        "web_search": RiskLevel.LOW,
        "read_file": RiskLevel.LOW,
        "query_db": RiskLevel.MEDIUM,
        "send_email": RiskLevel.HIGH,
        "delete_record": RiskLevel.HIGH,
        "deploy_code": RiskLevel.HIGH,
    }
    
    def should_proceed(action: PendingAction) -> bool:
        if action.risk == RiskLevel.LOW:
            return True
        if action.risk == RiskLevel.MEDIUM:
            log_for_review(action)  # enqueue for async review; implement per your stack
            return True
        # HIGH: block and wait for human approval
        notify_human(action)  # e.g. Slack webhook, email, or dashboard alert
        return wait_for_approval(action.action_id, timeout_seconds=300)
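    The helpers in that snippet (log_for_review, notify_human, wait_for_approval) are deliberately left abstract. For illustration only, a minimal in-process wait_for_approval could look like this; the registry dicts and submit_decision are hypothetical stand-ins for a durable store plus a review UI or webhook handler.

```python
import threading

# Hypothetical in-process approval registry. Production systems would use
# a durable store (DB row, message queue) plus a webhook or dashboard.
_pending: dict[str, threading.Event] = {}
_decisions: dict[str, bool] = {}

def wait_for_approval(action_id: str, timeout_seconds: int = 300) -> bool:
    """Block until a reviewer decides, or deny by default on timeout."""
    event = _pending.setdefault(action_id, threading.Event())
    if not event.wait(timeout=timeout_seconds):
        return False  # fail closed: no decision means no approval
    return _decisions.get(action_id, False)

def submit_decision(action_id: str, approved: bool) -> None:
    """Called by the review UI or webhook handler when a human decides."""
    _decisions[action_id] = approved
    _pending.setdefault(action_id, threading.Event()).set()
```

    Note the fail-closed default: an unanswered approval request times out to False, so the agent never proceeds with a high-risk action by silence.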

    This pattern alone has prevented more production incidents than any other change teams make when moving from prototype to production.

    ---

    5 Anti-Patterns That Will Break Your Agent

    Anti-Pattern 1 — Unbounded Loops

    An agent with no iteration limit will loop forever if the LLM keeps deciding "not done yet." This is the most common cause of runaway API costs.

    The fix: Always set hard limits — maximum iterations, maximum tokens consumed, and maximum wall-clock time. Check all three, not just one.

    python
    import time

    MAX_ITERATIONS = 15
    MAX_TOKENS = 50_000
    MAX_SECONDS = 120

    def check_limits(iterations: int, tokens_used: int, started_at: float) -> None:
        """Raise if any hard limit is exceeded. Call this every loop cycle."""
        if iterations >= MAX_ITERATIONS:
            raise RuntimeError(f"Exceeded {MAX_ITERATIONS} iterations")
        if tokens_used >= MAX_TOKENS:
            raise RuntimeError(f"Exceeded {MAX_TOKENS} token budget")
        if time.monotonic() - started_at >= MAX_SECONDS:
            raise RuntimeError(f"Exceeded {MAX_SECONDS}s wall-clock limit")

    Anti-Pattern 2 — God Tool (Tool That Does Everything)

    A single tool that accepts arbitrary commands and routes them internally is a debugging nightmare. When the agent misuses it, you have no visibility into which sub-operation failed. The LLM also struggles to use it correctly — the more a tool does, the harder it is to describe in a way the LLM understands.

    The fix: One tool, one job. Ten small tools are better than one large one.

    Anti-Pattern 3 — Trusting LLM Output Blindly

    LLMs hallucinate. They return confident, well-formatted JSON with wrong values. They invent API parameter names. They claim success when a tool actually failed. If you pass raw LLM output directly to a database insert or API call, you will have a bad day.

    The fix: Validate every LLM output with Pydantic before acting on it (see Pattern 3). Treat LLM output like user input — never trusted by default.

    Anti-Pattern 4 — No Observability

    An agent run that fails silently is worse than one that fails loudly. Without structured logging of each step, tool call, and decision, debugging a production issue means guessing.

    The fix: Emit a structured log event for every agent action:

    python
    import structlog
    import time
    
    log = structlog.get_logger()
    
    def traced_tool_call(tool_name: str, inputs: dict, run_id: str) -> dict:
        start = time.monotonic()
        log.info("tool.call.start", tool=tool_name, inputs=inputs, run_id=run_id)
        try:
            result = TOOL_REGISTRY[tool_name](**inputs)
            elapsed_ms = (time.monotonic() - start) * 1000
            log.info(
                "tool.call.success",
                tool=tool_name,
                run_id=run_id,
                elapsed_ms=round(elapsed_ms, 1),
            )
            return result
        except Exception as e:
            elapsed_ms = (time.monotonic() - start) * 1000
            log.error(
                "tool.call.failure",
                tool=tool_name,
                run_id=run_id,
                error=str(e),
                elapsed_ms=round(elapsed_ms, 1),
            )
            raise

    Ship this from day one. Retroactively adding observability to a production agent is painful.

    Anti-Pattern 5 — Stateless Agents

    An agent that forgets everything between runs can't learn from mistakes, can't resume interrupted tasks, and forces users to repeat context every time. This isn't just a UX problem — it means the agent is structurally incapable of handling multi-session workflows.

    The fix: Persist state. At minimum, store the task goal, completed steps, and any retrieved facts in a database keyed by session ID. On resume, load this state before starting the agent loop.

    ---

    Production Architecture — A Real-World Example

    Here's what a production-grade AI agent architecture looks like when all five patterns are applied:

    Production Architecture Diagram

    flowchart TB
        subgraph Ingress
            U([User / API]) --> GW[API Gateway]
            GW --> RQ[Request Queue]
        end
        subgraph AgentRuntime["Agent Runtime"]
            RQ --> AL[Agent Loop]
            AL --> MM[Memory Manager]
            MM --> WM[(Working Memory\nin-process)]
            MM --> LTM[(Long-term Memory\nVector DB)]
            AL --> TD[Tool Dispatcher]
            TD --> CB{Circuit Breaker}
            CB --> T1[web_search]
            CB --> T2[query_db]
            CB --> T3[read_file]
            CB --> T4[send_email\nHITL required]
        end
        subgraph Validation
            AL --> OV[Output Validator\nPydantic]
            OV -- valid --> NXT[Next Action]
            OV -- invalid --> ERR[Error Handler\n+ Fallback]
        end
        subgraph Observability
            AL --> OBS[Structured Logger\nstructlog]
            OBS --> LS[(Log Store\nOpenSearch)]
            OBS --> MT[Metrics\nPrometheus]
        end
        subgraph HumanInTheLoop
            T4 --> HP[Human Approval\nWebhook]
            HP --> HU([Reviewer])
            HU -- approve/reject --> HP
            HP --> T4
        end
        AL --> RS([Result / Response])

    Code Example — Full Agent Run

    python
    # Python 3.11+ | LangChain 0.2+ | structlog 24.x
    import uuid
    import time
    import structlog
    from langchain_openai import ChatOpenAI
    from langchain_core.messages import HumanMessage, SystemMessage
    from langchain.agents import create_tool_calling_agent, AgentExecutor
    from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
    
    log = structlog.get_logger()
    
    MAX_ITERATIONS = 15
    MAX_SECONDS = 120
    
    def run_agent(goal: str, session_id: str | None = None) -> dict:
        run_id = str(uuid.uuid4())
        session_id = session_id or run_id
        start_time = time.monotonic()
    
        log.info("agent.run.start", run_id=run_id, session_id=session_id, goal=goal)
    
        memory = AgentMemory(session_id)
        prior_context = memory.recall(goal, k=3)
    
        tools = [web_search]  # register isolated, validated tools here
    
        llm = ChatOpenAI(model="gpt-4o", temperature=0, max_tokens=4096)
    
        prompt = ChatPromptTemplate.from_messages([
            ("system", (
                "You are a reliable research agent. Complete the user's goal using available tools. "
                "Stop when you have a confident answer. Never loop more than necessary.\n\n"
                "Prior context from memory:\n{prior_context}"
            )),
            ("human", "{input}"),
            MessagesPlaceholder(variable_name="agent_scratchpad"),
        ])
    
        agent = create_tool_calling_agent(llm, tools, prompt)
        executor = AgentExecutor(
            agent=agent,
            tools=tools,
            max_iterations=MAX_ITERATIONS,
            verbose=False,
            handle_parsing_errors=True,
        )
    
        try:
            result = executor.invoke({
                "input": goal,
                "prior_context": "\n".join(prior_context) or "None",
            })
    
            elapsed = (time.monotonic() - start_time)
            if elapsed > MAX_SECONDS:
                log.warning("agent.run.slow", run_id=run_id, elapsed_s=round(elapsed, 1))
    
            # Persist key findings to long-term memory
            if result.get("output"):
                memory.remember(f"Goal: {goal}\nAnswer: {result['output'][:500]}")
    
            log.info("agent.run.success", run_id=run_id, elapsed_s=round(elapsed, 1))
            return {"run_id": run_id, "output": result["output"], "success": True}
    
        except Exception as e:
            elapsed = (time.monotonic() - start_time)
            log.error("agent.run.failure", run_id=run_id, error=str(e), elapsed_s=round(elapsed, 1))
            return {"run_id": run_id, "output": None, "success": False, "error": str(e)}

    ---

    Trade-offs and When NOT to Use Agents

    AI agents are powerful, but they're not the right tool for every problem. Be honest about the costs:

    Latency: A single LLM call takes 1-5 seconds. An agent that makes 10 tool calls before returning an answer takes 10-50 seconds. For user-facing features with sub-second latency requirements, agents are wrong. A direct RAG pipeline or a single well-crafted prompt will be faster and cheaper.

    Cost: Each agent iteration burns tokens. A 10-step agent run using GPT-4o with 2,000 tokens per step costs roughly $0.60 per run. At 1,000 runs/day, that's $600/day — before tool API costs. Model these costs before committing to an agentic architecture.
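    A back-of-the-envelope estimator makes that modeling concrete. The rates below are illustrative placeholders, not actual pricing (substitute your provider's current numbers), and the model assumes the context grows each step because prior tool results are re-sent.

```python
# Illustrative per-token rates; check your provider's current pricing.
INPUT_RATE = 2.50 / 1_000_000    # assumed $ per input token
OUTPUT_RATE = 10.00 / 1_000_000  # assumed $ per output token

def estimate_run_cost(steps: int, ctx_tokens: int, out_tokens: int) -> float:
    """Rough per-run cost, assuming step i re-sends ~i * ctx_tokens of input."""
    input_total = sum(i * ctx_tokens for i in range(1, steps + 1))
    output_total = steps * out_tokens
    return input_total * INPUT_RATE + output_total * OUTPUT_RATE
```

    Multiply the per-run figure by expected daily runs before committing; the context-growth term is usually what surprises teams.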

    Unpredictability: Even with all the patterns above, agents are harder to test deterministically than regular code. Temperature=0 helps but doesn't guarantee identical outputs. If your business logic requires exact, reproducible behavior, agents introduce unnecessary risk.

    When agents are the right choice:

    • Tasks that genuinely require multi-step reasoning and dynamic tool selection
    • Workflows where the exact steps aren't known in advance
    • Research or analysis tasks where the agent can explore and adapt
    • Long-running background tasks where latency doesn't matter

    When to skip agents:

    • The workflow has fixed, predictable steps → use a pipeline or workflow engine
    • Latency is critical → use a single LLM call with a well-designed prompt
    • The task is classification or extraction → fine-tune a smaller model
    • You need 100% reproducibility → agents are wrong

    ---

    Frequently Asked Questions

    What's the difference between an AI agent and a chatbot?

    A chatbot responds to one message at a time and has no ability to take actions in external systems. An AI agent operates in a loop: it receives a goal, decides what actions to take, calls tools (APIs, databases, code execution), observes the results, and repeats until the goal is complete. Agents are stateful, multi-step, and action-oriented. Chatbots are stateless, single-turn, and response-oriented.

    How do I prevent infinite loops in AI agents?

    Three guards, applied together:

    1. Iteration limit: hard cap on the number of loop cycles (e.g., max_iterations=15)
    2. Token budget: track total tokens consumed and stop when over threshold
    3. Time limit: wall-clock timeout (e.g., 120 seconds) that kills the run regardless of progress

    Don't rely on any single guard. An agent can stay under the iteration limit but consume enormous tokens per step. All three together form a reliable safety net.

    What tools should an AI agent have access to?

    Follow the principle of least privilege: give the agent only the tools it needs for its specific domain, nothing more. A customer support agent doesn't need file system access. A research agent doesn't need the ability to send emails.

    For each tool, ask: "What's the blast radius if the LLM misuses this tool?" High blast-radius tools (delete, send, deploy) should require human-in-the-loop approval. Low blast-radius tools (read, search, query) can be auto-approved. Keep the total tool count under 10 — beyond that, LLM tool-selection accuracy measurably degrades.
