Building AI Agents: Patterns and Anti-Patterns (A Production Guide)

    Most AI agents fail in production because of architecture mistakes, not LLM limitations. Here are the 5 patterns that work and the 5 anti-patterns that will break your system.

    Tob

    Backend Developer

    12 min read · AI Engineering

    TL;DR: Building AI agents that work in demos is easy. Building ones that survive production is hard. The difference isn't the LLM — it's the architecture around it. This guide covers the 5 patterns that make agents reliable and the 5 anti-patterns that will silently destroy your system.

    ---

    What is an AI Agent? (And Why Most Fail in Production)

    When people talk about building AI agents, they usually mean a system where an LLM doesn't just respond to a prompt — it takes actions, uses tools, makes decisions, and loops until a goal is reached. Unlike a chatbot that answers one question at a time, an autonomous AI agent can browse the web, query databases, write and run code, and chain dozens of steps together without hand-holding.

    The concept sounds powerful. And it is — when it works.

    The problem: most teams prototype an agent in an afternoon, ship it, and then spend the next month firefighting. The agent enters infinite loops. It spends $200 on OpenAI tokens trying to answer a question that should take 3 API calls. It silently returns wrong answers with complete confidence. It crashes with no log to explain what happened.

    None of these failures come from the LLM being "not good enough." They come from architecture mistakes that are entirely avoidable. The patterns in this guide are drawn from real production deployments — systems handling thousands of agent runs per day — not just toy examples.

    ---

    The Core Agent Loop — How It Actually Works

    Observe → Decide → Act → Repeat

    Every AI agent, regardless of framework or LLM, runs the same fundamental loop:

    1. Observe — gather the current state: user goal, prior results, available tools, memory
    2. Decide — the LLM picks the next action (call a tool, ask a clarifying question, return a result)
    3. Act — execute the chosen action (run the tool, write to memory, call an API)
    4. Repeat — feed the result back as a new observation, loop until done

    The loop sounds simple. The complexity lives in what you do around it: how you validate inputs and outputs, how you handle failures, how you prevent runaway loops, and how you give humans visibility into what's happening.
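    Stripped of framework details, the skeleton of that loop fits in a dozen lines. This is an illustrative sketch, not any framework's API: `decide` stands in for an LLM call that returns a structured action, and `tools` maps tool names to validated functions.

```python
def run_loop(goal: str, decide, tools: dict, max_iters: int = 15) -> str:
    """Observe -> Decide -> Act, repeated until done or out of budget."""
    state = {"goal": goal, "history": []}              # Observe: current state
    for _ in range(max_iters):
        action = decide(state)                         # Decide: pick next action
        if action["type"] == "finish":
            return action["answer"]                    # goal reached, stop looping
        result = tools[action["tool"]](**action["input"])  # Act: run the tool
        state["history"].append((action["tool"], result))  # feed result back in
    raise RuntimeError(f"No answer after {max_iters} iterations")
```

    Everything that makes an agent production-grade (validation, error branches, hard limits) wraps around this skeleton rather than replacing it.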

    The Agent Loop — Mermaid Diagram

    flowchart TD
        A([Start: User Goal]) --> B[Observe Current State]
        B --> C{LLM Decides Next Action}
        C --> D[Call Tool / API]
        C --> E[Ask Clarifying Question]
        C --> F[Return Final Answer]
        D --> G[Tool Result]
        G --> H{Valid Result?}
        H -- Yes --> I[Update Memory / State]
        H -- No --> J[Handle Error / Retry]
        J --> B
        I --> K{Goal Reached?}
        K -- No --> B
        K -- Yes --> F
        E --> L[Wait for Human Input]
        L --> B
        F --> M([End])
        style A fill:#22c55e,color:#fff
        style M fill:#22c55e,color:#fff
        style J fill:#ef4444,color:#fff

    Notice what's missing from most naive implementations: the error handling branch (J), the validity check (H), and the human-in-the-loop path (E → L). Those three omissions account for ~80% of production agent failures.

    ---

    5 Proven Patterns for Reliable AI Agents

    These patterns apply regardless of framework. The code examples use Python 3.11 and LangChain 0.2+, but the concepts translate to any stack.

    Pattern 1 — Tool Isolation

    Every tool your agent calls should be a pure, bounded function with:

    • A single, well-defined responsibility
    • Explicit input validation
    • A predictable output schema
    • Its own error handling

    The mistake most teams make: one massive execute_task() tool that does database queries, file I/O, and API calls all in one function. When it fails, you have no idea where. When the LLM calls it wrong, the blast radius is huge.

    python
    # Python 3.11+ | LangChain 0.2+
    from langchain_core.tools import tool
    from pydantic import BaseModel, Field
    from typing import Optional
    import httpx
    
    class SearchInput(BaseModel):
        query: str = Field(description="Search query, max 200 chars")
        max_results: int = Field(default=5, ge=1, le=20)
    
    @tool(args_schema=SearchInput)
    def web_search(query: str, max_results: int = 5) -> dict:
        """Search the web and return structured results. Use for factual lookups only."""
        if len(query) > 200:
            raise ValueError(f"Query too long: {len(query)} chars (max 200)")
        
        try:
            response = httpx.get(
                "https://api.search-provider.com/search",
                params={"q": query, "limit": max_results},
                timeout=10.0,
            )
            response.raise_for_status()
            return {"results": response.json()["items"], "query": query}
        except httpx.TimeoutException:
            return {"error": "Search timed out after 10s", "results": []}
        except httpx.HTTPStatusError as e:
            return {"error": f"Search API returned {e.response.status_code}", "results": []}

    Notice: the tool returns a structured dict even on error. It never throws an unhandled exception back to the agent loop. The LLM can handle a {"error": "...", "results": []} gracefully; an unhandled exception crashes the entire run.

    Pattern 2 — Circuit Breaker

    An agent that calls a failing external service in a tight loop will rack up costs and latency fast. A circuit breaker stops the bleeding.

    The pattern: track failure counts per tool. After N consecutive failures, "open" the circuit — refuse calls to that tool for a cooldown period. After the cooldown, let one call through to test if the service recovered.

    python
    # Python 3.11+
    import time
    from dataclasses import dataclass, field
    from typing import Callable, Any, Optional
    
    @dataclass
    class CircuitBreaker:
        failure_threshold: int = 3
        cooldown_seconds: int = 60
        _failures: int = field(default=0, init=False)
        _opened_at: Optional[float] = field(default=None, init=False)
    
        def is_open(self) -> bool:
            if self._opened_at is None:
                return False
            elapsed = time.monotonic() - self._opened_at
            if elapsed > self.cooldown_seconds:
                # Half-open: allow one test call
                self._opened_at = None
                self._failures = 0
                return False
            return True
    
        def record_success(self):
            self._failures = 0
            self._opened_at = None
    
        def record_failure(self):
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = time.monotonic()
    
        def call(self, fn: Callable, *args, **kwargs) -> Any:
            if self.is_open():
                raise RuntimeError(
                    f"Circuit open — service unavailable (retry after {self.cooldown_seconds}s)"
                )
            try:
                result = fn(*args, **kwargs)
                self.record_success()
                return result
        except Exception:
                self.record_failure()
                raise
    
    # Usage
    db_breaker = CircuitBreaker(failure_threshold=3, cooldown_seconds=30)
    
    def query_database(sql: str) -> list:
        return db_breaker.call(_raw_db_query, sql)

    One team I know cut runaway-cost incidents by 90% in the first month after adding circuit breakers to their agent's external tool calls.

    Pattern 3 — Structured Output Validation

    LLMs are probabilistic. Even with a perfect prompt, an LLM will occasionally return malformed JSON, skip required fields, or hallucinate values that violate your schema. Never let raw LLM output hit your business logic without validation.

    Use Pydantic (v2) to define what you expect, and validate every LLM response before acting on it:

    python
    # Python 3.11+ | LangChain 0.2+ | Pydantic 2.x
    from pydantic import BaseModel, field_validator
    from langchain_openai import ChatOpenAI
    from langchain_core.output_parsers import PydanticOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    
    class AgentAction(BaseModel):
        tool_name: str
        tool_input: dict
        reasoning: str
        confidence: float
    
        @field_validator("confidence")
        @classmethod
        def confidence_must_be_valid(cls, v: float) -> float:
            if not 0.0 <= v <= 1.0:
                raise ValueError(f"Confidence must be 0-1, got {v}")
            return v
    
        @field_validator("tool_name")
        @classmethod
        def tool_must_exist(cls, v: str) -> str:
            allowed = {"web_search", "read_file", "write_file", "query_db"}
            if v not in allowed:
                raise ValueError(f"Unknown tool: {v}. Allowed: {allowed}")
            return v
    
    parser = PydanticOutputParser(pydantic_object=AgentAction)
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are an agent. Respond with valid JSON.\n{format_instructions}"),
        ("human", "{task}"),
    ]).partial(format_instructions=parser.get_format_instructions())
    
    chain = prompt | llm | parser
    
    try:
        action: AgentAction = chain.invoke({"task": "Find the latest Python version"})
    except Exception as e:
        # Validation failed — log it, don't blindly retry
        print(f"LLM output invalid: {e}")
        action = AgentAction(
            tool_name="web_search",
            tool_input={"query": "latest Python version"},
            reasoning="Fallback after parse failure",
            confidence=0.5,
        )

    The fallback action on parse failure matters. Don't crash — degrade gracefully and log the failure for debugging.

    Pattern 4 — Memory Separation (Short vs Long-term)

    Agent memory comes in two distinct forms, and mixing them is a common source of bugs:

    • Short-term (working memory): The current conversation context, tool results from this run, intermediate calculations. Lives in the LLM's context window. Cleared between runs.
    • Long-term (persistent memory): User preferences, learned facts, previous task outcomes. Must be stored externally (vector DB, key-value store, relational DB).

    Treating both as the same thing causes bloated context windows (and costs — GPT-4o charges per token), stale data in new runs, and context-window-exceeded errors on long tasks.

    python
    # Python 3.11+ | LangChain 0.2+
    from langchain_community.vectorstores import Chroma
    from langchain_openai import OpenAIEmbeddings
    from langchain_core.messages import HumanMessage, AIMessage
    from typing import List
    
    class AgentMemory:
        def __init__(self, session_id: str):
            self.session_id = session_id
            # Short-term: in-process list, cleared per run
            self._working: List = []
            # Long-term: persisted to vector store
            self._long_term = Chroma(
                collection_name=f"agent_{session_id}",
                embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
                persist_directory="./chroma_db",
            )
    
        def add_working(self, message):
            """Add to short-term working memory (current run only)."""
            self._working.append(message)
            # Keep last 20 messages to avoid context overflow
            if len(self._working) > 20:
                self._working = self._working[-20:]
    
        def remember(self, fact: str, metadata: dict | None = None):
            """Persist important facts to long-term memory."""
            self._long_term.add_texts(
                texts=[fact],
                metadatas=[{**(metadata or {}), "session": self.session_id}],
            )
    
        def recall(self, query: str, k: int = 3) -> List[str]:
            """Retrieve relevant long-term memories for current context."""
            docs = self._long_term.similarity_search(query, k=k)
            return [doc.page_content for doc in docs]
    
        def clear_working(self):
            """Reset working memory between agent runs."""
            self._working = []

    Pattern 5 — Human-in-the-Loop Checkpoints

    Fully autonomous AI agents are rarely appropriate in production. For any action that is irreversible (sending an email, deleting data, making a purchase, deploying code), insert a human approval checkpoint.

    The pattern: classify actions by risk level. Low-risk actions (reading data, searching the web) proceed automatically. Medium-risk actions get logged for async review. High-risk actions block until a human explicitly approves.

    python
    # Python 3.11+
    from enum import Enum
    from dataclasses import dataclass
    
    class RiskLevel(Enum):
        LOW = "low"       # auto-proceed
        MEDIUM = "medium" # log + proceed, flag for review
        HIGH = "high"     # block until approved
    
    @dataclass
    class PendingAction:
        action_id: str
        tool_name: str
        tool_input: dict
        risk: RiskLevel
        approved: bool = False
    
    TOOL_RISK_MAP = {
        "web_search": RiskLevel.LOW,
        "read_file": RiskLevel.LOW,
        "query_db": RiskLevel.MEDIUM,
        "send_email": RiskLevel.HIGH,
        "delete_record": RiskLevel.HIGH,
        "deploy_code": RiskLevel.HIGH,
    }
    
    def should_proceed(action: PendingAction) -> bool:
        if action.risk == RiskLevel.LOW:
            return True
        if action.risk == RiskLevel.MEDIUM:
            log_for_review(action)  # enqueue for async review; implement per your stack
            return True
        # HIGH: block and wait for human approval
        notify_human(action)  # e.g. Slack webhook, email, or dashboard alert
        return wait_for_approval(action.action_id, timeout_seconds=300)
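    The helpers in that snippet (log_for_review, notify_human, wait_for_approval) are deliberately left abstract. For illustration only, a minimal in-process wait_for_approval could look like this; the registry dicts and submit_decision are hypothetical stand-ins for a durable store plus a review UI or webhook handler.

```python
import threading

# Hypothetical in-process approval registry. Production systems would use
# a durable store (DB row, message queue) plus a webhook or dashboard.
_pending: dict[str, threading.Event] = {}
_decisions: dict[str, bool] = {}

def wait_for_approval(action_id: str, timeout_seconds: int = 300) -> bool:
    """Block until a reviewer decides, or deny by default on timeout."""
    event = _pending.setdefault(action_id, threading.Event())
    if not event.wait(timeout=timeout_seconds):
        return False  # fail closed: no decision means no approval
    return _decisions.get(action_id, False)

def submit_decision(action_id: str, approved: bool) -> None:
    """Called by the review UI or webhook handler when a human decides."""
    _decisions[action_id] = approved
    _pending.setdefault(action_id, threading.Event()).set()
```

    Note the fail-closed default: an unanswered approval request times out to False, so the agent never proceeds with a high-risk action by silence.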

    This pattern alone has prevented more production incidents than any other change teams make when moving from prototype to production.

    ---

    5 Anti-Patterns That Will Break Your Agent

    Anti-Pattern 1 — Unbounded Loops

    An agent with no iteration limit will loop forever if the LLM keeps deciding "not done yet." This is the most common cause of runaway API costs.

    The fix: Always set hard limits — maximum iterations, maximum tokens consumed, and maximum wall-clock time. Check all three, not just one.

    python
    import time

    MAX_ITERATIONS = 15
    MAX_TOKENS = 50_000
    MAX_SECONDS = 120

    def check_limits(iterations: int, tokens_used: int, started_at: float) -> None:
        """Raise if any hard limit is exceeded. Call this every loop cycle."""
        if iterations >= MAX_ITERATIONS:
            raise RuntimeError(f"Exceeded {MAX_ITERATIONS} iterations")
        if tokens_used >= MAX_TOKENS:
            raise RuntimeError(f"Exceeded {MAX_TOKENS} token budget")
        if time.monotonic() - started_at >= MAX_SECONDS:
            raise RuntimeError(f"Exceeded {MAX_SECONDS}s wall-clock limit")

    Anti-Pattern 2 — God Tool (Tool That Does Everything)

    A single tool that accepts arbitrary commands and routes them internally is a debugging nightmare. When the agent misuses it, you have no visibility into which sub-operation failed. The LLM also struggles to use it correctly — the more a tool does, the harder it is to describe in a way the LLM understands.

    The fix: One tool, one job. Ten small tools are better than one large one.

    Anti-Pattern 3 — Trusting LLM Output Blindly

    LLMs hallucinate. They return confident, well-formatted JSON with wrong values. They invent API parameter names. They claim success when a tool actually failed. If you pass raw LLM output directly to a database insert or API call, you will have a bad day.

    The fix: Validate every LLM output with Pydantic before acting on it (see Pattern 3). Treat LLM output like user input — never trusted by default.

    Anti-Pattern 4 — No Observability

    An agent run that fails silently is worse than one that fails loudly. Without structured logging of each step, tool call, and decision, debugging a production issue means guessing.

    The fix: Emit a structured log event for every agent action:

    python
    import structlog
    import time
    
    log = structlog.get_logger()
    
    def traced_tool_call(tool_name: str, inputs: dict, run_id: str) -> dict:
        start = time.monotonic()
        log.info("tool.call.start", tool=tool_name, inputs=inputs, run_id=run_id)
        try:
            result = TOOL_REGISTRY[tool_name](**inputs)
            elapsed_ms = (time.monotonic() - start) * 1000
            log.info(
                "tool.call.success",
                tool=tool_name,
                run_id=run_id,
                elapsed_ms=round(elapsed_ms, 1),
            )
            return result
        except Exception as e:
            elapsed_ms = (time.monotonic() - start) * 1000
            log.error(
                "tool.call.failure",
                tool=tool_name,
                run_id=run_id,
                error=str(e),
                elapsed_ms=round(elapsed_ms, 1),
            )
            raise

    Ship this from day one. Retroactively adding observability to a production agent is painful.

    Anti-Pattern 5 — Stateless Agents

    An agent that forgets everything between runs can't learn from mistakes, can't resume interrupted tasks, and forces users to repeat context every time. This isn't just a UX problem — it means the agent is structurally incapable of handling multi-session workflows.

    The fix: Persist state. At minimum, store the task goal, completed steps, and any retrieved facts in a database keyed by session ID. On resume, load this state before starting the agent loop.

    ---

    Production Architecture — A Real-World Example

    Here's what a production-grade AI agent architecture looks like when all five patterns are applied:

    Production Architecture Diagram

    flowchart TB
        subgraph Ingress
            U([User / API]) --> GW[API Gateway]
            GW --> RQ[Request Queue]
        end
        subgraph AgentRuntime["Agent Runtime"]
            RQ --> AL[Agent Loop]
            AL --> MM[Memory Manager]
            MM --> WM[(Working Memory\nin-process)]
            MM --> LTM[(Long-term Memory\nVector DB)]
            AL --> TD[Tool Dispatcher]
            TD --> CB{Circuit Breaker}
            CB --> T1[web_search]
            CB --> T2[query_db]
            CB --> T3[read_file]
            CB --> T4[send_email\nHITL required]
        end
        subgraph Validation
            AL --> OV[Output Validator\nPydantic]
            OV -- valid --> NXT[Next Action]
            OV -- invalid --> ERR[Error Handler\n+ Fallback]
        end
        subgraph Observability
            AL --> OBS[Structured Logger\nstructlog]
            OBS --> LS[(Log Store\nOpenSearch)]
            OBS --> MT[Metrics\nPrometheus]
        end
        subgraph HumanInTheLoop
            T4 --> HP[Human Approval\nWebhook]
            HP --> HU([Reviewer])
            HU -- approve/reject --> HP
            HP --> T4
        end
        AL --> RS([Result / Response])

    Code Example — Full Agent Run

    python
    # Python 3.11+ | LangChain 0.2+ | structlog 24.x
    import uuid
    import time
    import structlog
    from langchain_openai import ChatOpenAI
    from langchain_core.messages import HumanMessage, SystemMessage
    from langchain.agents import create_tool_calling_agent, AgentExecutor
    from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
    
    log = structlog.get_logger()
    
    MAX_ITERATIONS = 15
    MAX_SECONDS = 120
    
    def run_agent(goal: str, session_id: str | None = None) -> dict:
        run_id = str(uuid.uuid4())
        session_id = session_id or run_id
        start_time = time.monotonic()
    
        log.info("agent.run.start", run_id=run_id, session_id=session_id, goal=goal)
    
        memory = AgentMemory(session_id)
        prior_context = memory.recall(goal, k=3)
    
        tools = [web_search]  # register isolated, validated tools here
    
        llm = ChatOpenAI(model="gpt-4o", temperature=0, max_tokens=4096)
    
        prompt = ChatPromptTemplate.from_messages([
            ("system", (
                "You are a reliable research agent. Complete the user's goal using available tools. "
                "Stop when you have a confident answer. Never loop more than necessary.\n\n"
                "Prior context from memory:\n{prior_context}"
            )),
            ("human", "{input}"),
            MessagesPlaceholder(variable_name="agent_scratchpad"),
        ])
    
        agent = create_tool_calling_agent(llm, tools, prompt)
        executor = AgentExecutor(
            agent=agent,
            tools=tools,
            max_iterations=MAX_ITERATIONS,
            verbose=False,
            handle_parsing_errors=True,
        )
    
        try:
            result = executor.invoke({
                "input": goal,
                "prior_context": "\n".join(prior_context) or "None",
            })
    
            elapsed = (time.monotonic() - start_time)
            if elapsed > MAX_SECONDS:
                log.warning("agent.run.slow", run_id=run_id, elapsed_s=round(elapsed, 1))
    
            # Persist key findings to long-term memory
            if result.get("output"):
                memory.remember(f"Goal: {goal}\nAnswer: {result['output'][:500]}")
    
            log.info("agent.run.success", run_id=run_id, elapsed_s=round(elapsed, 1))
            return {"run_id": run_id, "output": result["output"], "success": True}
    
        except Exception as e:
            elapsed = (time.monotonic() - start_time)
            log.error("agent.run.failure", run_id=run_id, error=str(e), elapsed_s=round(elapsed, 1))
            return {"run_id": run_id, "output": None, "success": False, "error": str(e)}

    ---

    Trade-offs and When NOT to Use Agents

    AI agents are powerful, but they're not the right tool for every problem. Be honest about the costs:

    Latency: A single LLM call takes 1-5 seconds. An agent that makes 10 tool calls before returning an answer takes 10-50 seconds. For user-facing features with sub-second latency requirements, agents are wrong. A direct RAG pipeline or a single well-crafted prompt will be faster and cheaper.

    Cost: Each agent iteration burns tokens. A 10-step agent run using GPT-4o with 2,000 tokens per step costs roughly $0.60 per run. At 1,000 runs/day, that's $600/day — before tool API costs. Model these costs before committing to an agentic architecture.
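    A back-of-the-envelope estimator makes that modeling concrete. The rates below are illustrative placeholders, not actual pricing (substitute your provider's current numbers), and the model assumes the context grows each step because prior tool results are re-sent.

```python
# Illustrative per-token rates; check your provider's current pricing.
INPUT_RATE = 2.50 / 1_000_000    # assumed $ per input token
OUTPUT_RATE = 10.00 / 1_000_000  # assumed $ per output token

def estimate_run_cost(steps: int, ctx_tokens: int, out_tokens: int) -> float:
    """Rough per-run cost, assuming step i re-sends ~i * ctx_tokens of input."""
    input_total = sum(i * ctx_tokens for i in range(1, steps + 1))
    output_total = steps * out_tokens
    return input_total * INPUT_RATE + output_total * OUTPUT_RATE
```

    Multiply the per-run figure by expected daily runs before committing; the context-growth term is usually what surprises teams.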

    Unpredictability: Even with all the patterns above, agents are harder to test deterministically than regular code. Temperature=0 helps but doesn't guarantee identical outputs. If your business logic requires exact, reproducible behavior, agents introduce unnecessary risk.

    When agents are the right choice:

    • Tasks that genuinely require multi-step reasoning and dynamic tool selection
    • Workflows where the exact steps aren't known in advance
    • Research or analysis tasks where the agent can explore and adapt
    • Long-running background tasks where latency doesn't matter

    When to skip agents:

    • The workflow has fixed, predictable steps → use a pipeline or workflow engine
    • Latency is critical → use a single LLM call with a well-designed prompt
    • The task is classification or extraction → fine-tune a smaller model
    • You need 100% reproducibility → agents are wrong

    ---

    Frequently Asked Questions

    What's the difference between an AI agent and a chatbot?

    A chatbot responds to one message at a time and has no ability to take actions in external systems. An AI agent operates in a loop: it receives a goal, decides what actions to take, calls tools (APIs, databases, code execution), observes the results, and repeats until the goal is complete. Agents are stateful, multi-step, and action-oriented. Chatbots are stateless, single-turn, and response-oriented.

    How do I prevent infinite loops in AI agents?

    Three guards, applied together:

    1. Iteration limit: hard cap on the number of loop cycles (e.g., max_iterations=15)
    2. Token budget: track total tokens consumed and stop when over threshold
    3. Time limit: wall-clock timeout (e.g., 120 seconds) that kills the run regardless of progress

    Don't rely on any single guard. An agent can stay under the iteration limit but consume enormous tokens per step. All three together form a reliable safety net.

    What tools should an AI agent have access to?

    Follow the principle of least privilege: give the agent only the tools it needs for its specific domain, nothing more. A customer support agent doesn't need file system access. A research agent doesn't need the ability to send emails.

    For each tool, ask: "What's the blast radius if the LLM misuses this tool?" High blast-radius tools (delete, send, deploy) should require human-in-the-loop approval. Low blast-radius tools (read, search, query) can be auto-approved. Keep the total tool count under 10 — beyond that, LLM tool-selection accuracy measurably degrades.
