Building AI Agents: Patterns and Anti-Patterns (A Production Guide)
Most AI agents fail in production because of architecture mistakes, not LLM limitations. Here are the 5 patterns that work and the 5 anti-patterns that will break your system.
Tob
Backend Developer
TL;DR: Building AI agents that work in demos is easy. Building ones that survive production is hard. The difference isn't the LLM — it's the architecture around it. This guide covers the 5 patterns that make agents reliable and the 5 anti-patterns that will silently destroy your system.
---
What is an AI Agent? (And Why Most Fail in Production)
When people talk about building AI agents, they usually mean a system where an LLM doesn't just respond to a prompt — it takes actions, uses tools, makes decisions, and loops until a goal is reached. Unlike a chatbot that answers one question at a time, an autonomous AI agent can browse the web, query databases, write and run code, and chain dozens of steps together without hand-holding.
The concept sounds powerful. And it is — when it works.
The problem: most teams prototype an agent in an afternoon, ship it, and then spend the next month firefighting. The agent enters infinite loops. It spends $200 on OpenAI tokens trying to answer a question that should take 3 API calls. It silently returns wrong answers with complete confidence. It crashes with no log to explain what happened.
None of these failures come from the LLM being "not good enough." They come from architecture mistakes that are entirely avoidable. The patterns in this guide are drawn from real production deployments — systems handling thousands of agent runs per day — not just toy examples.
---
The Core Agent Loop — How It Actually Works
Observe → Decide → Act → Repeat
Every AI agent, regardless of framework or LLM, runs the same fundamental loop:
- Observe — gather the current state: user goal, prior results, available tools, memory
- Decide — the LLM picks the next action (call a tool, ask a clarifying question, return a result)
- Act — execute the chosen action (run the tool, write to memory, call an API)
- Repeat — feed the result back as a new observation, loop until done
The loop sounds simple. The complexity lives in what you do around it: how you validate inputs and outputs, how you handle failures, how you prevent runaway loops, and how you give humans visibility into what's happening.
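As a minimal sketch, the whole loop fits in a few lines. Here `llm_decide` and `run_tool` are hypothetical stand-ins (stubbed out below) for your actual model call and tool dispatcher; note that even this skeleton carries an iteration cap and a wall-clock budget:

```python
import time

MAX_ITERATIONS = 15
MAX_SECONDS = 120

def llm_decide(observations: list) -> dict:
    """Stand-in for the real LLM call; returns the next action to take."""
    return {"type": "final_answer", "content": observations[-1]}

def run_tool(action: dict) -> str:
    """Stand-in for the real tool dispatcher."""
    return f"result of {action['type']}"

def agent_loop(goal: str) -> str:
    observations = [f"Goal: {goal}"]          # Observe: initial state
    start = time.monotonic()
    for _ in range(MAX_ITERATIONS):           # hard iteration cap
        if time.monotonic() - start > MAX_SECONDS:
            raise TimeoutError("Exceeded wall-clock budget")
        action = llm_decide(observations)     # Decide: pick the next action
        if action["type"] == "final_answer":
            return action["content"]
        observations.append(run_tool(action)) # Act, then feed the result back
    raise RuntimeError(f"No answer after {MAX_ITERATIONS} iterations")
```

Everything else in this guide is about hardening the pieces this sketch glosses over.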
The Agent Loop — Mermaid Diagram
Notice what's missing from most naive implementations: the error-handling branch, the output-validity check, and the human-in-the-loop approval path. Those three omissions account for roughly 80% of production agent failures.
---
5 Proven Patterns for Reliable AI Agents
These patterns apply regardless of framework. The code examples use Python 3.11 and LangChain 0.2+, but the concepts translate to any stack.
Pattern 1 — Tool Isolation
Every tool your agent calls should be a pure, bounded function with:
- A single, well-defined responsibility
- Explicit input validation
- A predictable output schema
- Its own error handling
The mistake most teams make: one massive execute_task() tool that does database queries, file I/O, and API calls all in one function. When it fails, you have no idea where. When the LLM calls it wrong, the blast radius is huge.
```python
# Python 3.11+ | LangChain 0.2+
from langchain_core.tools import tool
from pydantic import BaseModel, Field
import httpx

class SearchInput(BaseModel):
    query: str = Field(description="Search query, max 200 chars")
    max_results: int = Field(default=5, ge=1, le=20)

@tool(args_schema=SearchInput)
def web_search(query: str, max_results: int = 5) -> dict:
    """Search the web and return structured results. Use for factual lookups only."""
    if len(query) > 200:
        raise ValueError(f"Query too long: {len(query)} chars (max 200)")
    try:
        response = httpx.get(
            "https://api.search-provider.com/search",
            params={"q": query, "limit": max_results},
            timeout=10.0,
        )
        response.raise_for_status()
        return {"results": response.json()["items"], "query": query}
    except httpx.TimeoutException:
        return {"error": "Search timed out after 10s", "results": []}
    except httpx.HTTPStatusError as e:
        return {"error": f"Search API returned {e.response.status_code}", "results": []}
```

Notice: the tool returns a structured dict even on error. It never throws an unhandled exception back to the agent loop. The LLM can handle an {"error": "...", "results": []} response gracefully; an unhandled exception crashes the entire run.
Pattern 2 — Circuit Breaker
An agent that calls a failing external service in a tight loop will rack up costs and latency fast. A circuit breaker stops the bleeding.
The pattern: track failure counts per tool. After N consecutive failures, "open" the circuit — refuse calls to that tool for a cooldown period. After the cooldown, let one call through to test if the service recovered.
```python
# Python 3.11+
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

@dataclass
class CircuitBreaker:
    failure_threshold: int = 3
    cooldown_seconds: int = 60
    _failures: int = field(default=0, init=False)
    _opened_at: Optional[float] = field(default=None, init=False)

    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        elapsed = time.monotonic() - self._opened_at
        if elapsed > self.cooldown_seconds:
            # Half-open: allow one test call
            self._opened_at = None
            self._failures = 0
            return False
        return True

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = time.monotonic()

    def call(self, fn: Callable, *args, **kwargs) -> Any:
        if self.is_open():
            raise RuntimeError(
                f"Circuit open — service unavailable (retry after {self.cooldown_seconds}s)"
            )
        try:
            result = fn(*args, **kwargs)
            self.record_success()
            return result
        except Exception:
            self.record_failure()
            raise

# Usage
db_breaker = CircuitBreaker(failure_threshold=3, cooldown_seconds=30)

def query_database(sql: str) -> list:
    return db_breaker.call(_raw_db_query, sql)
```

In production at one team I know, adding circuit breakers to their agent's external tool calls reduced runaway-cost incidents by 90% in the first month.
Pattern 3 — Structured Output Validation
LLMs are probabilistic. Even with a perfect prompt, an LLM will occasionally return malformed JSON, skip required fields, or hallucinate values that violate your schema. Never let raw LLM output hit your business logic without validation.
Use Pydantic (v2) to define what you expect, and validate every LLM response before acting on it:
```python
# Python 3.11+ | LangChain 0.2+ | Pydantic 2.x
from pydantic import BaseModel, field_validator
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate

class AgentAction(BaseModel):
    tool_name: str
    tool_input: dict
    reasoning: str
    confidence: float

    @field_validator("confidence")
    @classmethod
    def confidence_must_be_valid(cls, v: float) -> float:
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"Confidence must be 0-1, got {v}")
        return v

    @field_validator("tool_name")
    @classmethod
    def tool_must_exist(cls, v: str) -> str:
        allowed = {"web_search", "read_file", "write_file", "query_db"}
        if v not in allowed:
            raise ValueError(f"Unknown tool: {v}. Allowed: {allowed}")
        return v

parser = PydanticOutputParser(pydantic_object=AgentAction)
llm = ChatOpenAI(model="gpt-4o", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an agent. Respond with valid JSON.\n{format_instructions}"),
    ("human", "{task}"),
]).partial(format_instructions=parser.get_format_instructions())

chain = prompt | llm | parser

try:
    action: AgentAction = chain.invoke({"task": "Find the latest Python version"})
except Exception as e:
    # Validation failed — log it, don't blindly retry
    print(f"LLM output invalid: {e}")
    action = AgentAction(
        tool_name="web_search",
        tool_input={"query": "latest Python version"},
        reasoning="Fallback after parse failure",
        confidence=0.5,
    )
```

The fallback action on parse failure matters. Don't crash — degrade gracefully and log the failure for debugging.
Pattern 4 — Memory Separation (Short vs Long-term)
Agent memory comes in two distinct forms, and mixing them is a common source of bugs:
- Short-term (working memory): The current conversation context, tool results from this run, intermediate calculations. Lives in the LLM's context window. Cleared between runs.
- Long-term (persistent memory): User preferences, learned facts, previous task outcomes. Must be stored externally (vector DB, key-value store, relational DB).
Treating both as the same thing causes bloated context windows (and costs — GPT-4o charges per token), stale data in new runs, and context-window-exceeded errors on long tasks.
```python
# Python 3.11+ | LangChain 0.2+
from typing import List, Optional
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

class AgentMemory:
    def __init__(self, session_id: str):
        self.session_id = session_id
        # Short-term: in-process list, cleared per run
        self._working: List = []
        # Long-term: persisted to vector store
        self._long_term = Chroma(
            collection_name=f"agent_{session_id}",
            embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
            persist_directory="./chroma_db",
        )

    def add_working(self, message):
        """Add to short-term working memory (current run only)."""
        self._working.append(message)
        # Keep last 20 messages to avoid context overflow
        if len(self._working) > 20:
            self._working = self._working[-20:]

    def remember(self, fact: str, metadata: Optional[dict] = None):
        """Persist important facts to long-term memory."""
        self._long_term.add_texts(
            texts=[fact],
            metadatas=[{**(metadata or {}), "session": self.session_id}],
        )

    def recall(self, query: str, k: int = 3) -> List[str]:
        """Retrieve relevant long-term memories for current context."""
        docs = self._long_term.similarity_search(query, k=k)
        return [doc.page_content for doc in docs]

    def clear_working(self):
        """Reset working memory between agent runs."""
        self._working = []
```

Pattern 5 — Human-in-the-Loop Checkpoints
Fully autonomous AI agents are rarely appropriate in production. For any action that is irreversible (sending an email, deleting data, making a purchase, deploying code), insert a human approval checkpoint.
The pattern: classify actions by risk level. Low-risk actions (reading data, searching the web) proceed automatically. Medium-risk actions get logged for async review. High-risk actions block until a human explicitly approves.
```python
# Python 3.11+
from enum import Enum
from dataclasses import dataclass

class RiskLevel(Enum):
    LOW = "low"        # auto-proceed
    MEDIUM = "medium"  # log + proceed, flag for review
    HIGH = "high"      # block until approved

@dataclass
class PendingAction:
    action_id: str
    tool_name: str
    tool_input: dict
    risk: RiskLevel
    approved: bool = False

TOOL_RISK_MAP = {
    "web_search": RiskLevel.LOW,
    "read_file": RiskLevel.LOW,
    "query_db": RiskLevel.MEDIUM,
    "send_email": RiskLevel.HIGH,
    "delete_record": RiskLevel.HIGH,
    "deploy_code": RiskLevel.HIGH,
}

# log_for_review, notify_human, and wait_for_approval are app-specific
# hooks — e.g. an audit table insert and a Slack approval message.
def should_proceed(action: PendingAction) -> bool:
    if action.risk == RiskLevel.LOW:
        return True
    if action.risk == RiskLevel.MEDIUM:
        log_for_review(action)
        return True
    # HIGH: block and wait for human approval
    notify_human(action)
    return wait_for_approval(action.action_id, timeout_seconds=300)
```

This pattern alone has prevented more production incidents than any other change teams make when moving from prototype to production.
---
5 Anti-Patterns That Will Break Your Agent
Anti-Pattern 1 — Unbounded Loops
An agent with no iteration limit will loop forever if the LLM keeps deciding "not done yet." This is the most common cause of runaway API costs.
The fix: Always set hard limits — maximum iterations, maximum tokens consumed, and maximum wall-clock time. Check all three, not just one.
```python
MAX_ITERATIONS = 15
MAX_TOKENS = 50_000
MAX_SECONDS = 120

# `state` and AgentTimeoutError come from your agent's run loop
if state.iterations >= MAX_ITERATIONS:
    raise AgentTimeoutError(f"Exceeded {MAX_ITERATIONS} iterations")
```

Anti-Pattern 2 — God Tool (Tool That Does Everything)
A single tool that accepts arbitrary commands and routes them internally is a debugging nightmare. When the agent misuses it, you have no visibility into which sub-operation failed. The LLM also struggles to use it correctly — the more a tool does, the harder it is to describe in a way the LLM understands.
The fix: One tool, one job. Ten small tools are better than one large one.
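To make the contrast concrete, here is a hedged sketch (the tool names and string results are illustrative, not from any framework) of a god tool next to its single-purpose replacements:

```python
# Anti-pattern: a "god tool" that routes free-form commands internally.
# When it fails, you can't tell which sub-operation broke, and the LLM
# has to guess the command grammar.
def execute_task(command: str) -> str:
    verb, _, rest = command.partition(" ")
    if verb == "read":
        return f"contents of {rest}"
    if verb == "query":
        return f"rows for {rest}"
    raise ValueError(f"Unknown command: {command}")

# Better: one tool, one job. Each validates its own input and fails
# with a precise error the agent loop can recover from.
def read_file(path: str) -> str:
    if not path:
        raise ValueError("path must be non-empty")
    return f"contents of {path}"

def query_db(sql: str) -> str:
    if not sql.lower().startswith("select"):
        raise ValueError("only SELECT queries are allowed")
    return f"rows for {sql}"
```

The small tools are also far easier to describe to the LLM: each docstring covers one behavior instead of a command language.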
Anti-Pattern 3 — Trusting LLM Output Blindly
LLMs hallucinate. They return confident, well-formatted JSON with wrong values. They invent API parameter names. They claim success when a tool actually failed. If you pass raw LLM output directly to a database insert or API call, you will have a bad day.
The fix: Validate every LLM output with Pydantic before acting on it (see Pattern 3). Treat LLM output like user input — never trusted by default.
Anti-Pattern 4 — No Observability
An agent run that fails silently is worse than one that fails loudly. Without structured logging of each step, tool call, and decision, debugging a production issue means guessing.
The fix: Emit a structured log event for every agent action:
```python
import time
import structlog

log = structlog.get_logger()

def traced_tool_call(tool_name: str, inputs: dict, run_id: str) -> dict:
    # TOOL_REGISTRY maps tool names to their callables
    start = time.monotonic()
    log.info("tool.call.start", tool=tool_name, inputs=inputs, run_id=run_id)
    try:
        result = TOOL_REGISTRY[tool_name](**inputs)
        elapsed_ms = (time.monotonic() - start) * 1000
        log.info(
            "tool.call.success",
            tool=tool_name,
            run_id=run_id,
            elapsed_ms=round(elapsed_ms, 1),
        )
        return result
    except Exception as e:
        elapsed_ms = (time.monotonic() - start) * 1000
        log.error(
            "tool.call.failure",
            tool=tool_name,
            run_id=run_id,
            error=str(e),
            elapsed_ms=round(elapsed_ms, 1),
        )
        raise
```

Ship this from day one. Retroactively adding observability to a production agent is painful.
Anti-Pattern 5 — Stateless Agents
An agent that forgets everything between runs can't learn from mistakes, can't resume interrupted tasks, and forces users to repeat context every time. This isn't just a UX problem — it means the agent is structurally incapable of handling multi-session workflows.
The fix: Persist state. At minimum, store the task goal, completed steps, and any retrieved facts in a database keyed by session ID. On resume, load this state before starting the agent loop.
---
Production Architecture — A Real-World Example
Here's what a production-grade AI agent architecture looks like when all five patterns are applied:
Production Architecture Diagram
Code Example — Full Agent Run
```python
# Python 3.11+ | LangChain 0.2+ | structlog 24.x
import time
import uuid

import structlog
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

log = structlog.get_logger()

MAX_ITERATIONS = 15
MAX_SECONDS = 120

def run_agent(goal: str, session_id: str | None = None) -> dict:
    run_id = str(uuid.uuid4())
    session_id = session_id or run_id
    start_time = time.monotonic()
    log.info("agent.run.start", run_id=run_id, session_id=session_id, goal=goal)

    memory = AgentMemory(session_id)
    prior_context = memory.recall(goal, k=3)

    tools = [web_search]  # register isolated, validated tools here
    llm = ChatOpenAI(model="gpt-4o", temperature=0, max_tokens=4096)

    prompt = ChatPromptTemplate.from_messages([
        ("system", (
            "You are a reliable research agent. Complete the user's goal using available tools. "
            "Stop when you have a confident answer. Never loop more than necessary.\n\n"
            "Prior context from memory:\n{prior_context}"
        )),
        ("human", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ])

    agent = create_tool_calling_agent(llm, tools, prompt)
    executor = AgentExecutor(
        agent=agent,
        tools=tools,
        max_iterations=MAX_ITERATIONS,
        verbose=False,
        handle_parsing_errors=True,
    )

    try:
        result = executor.invoke({
            "input": goal,
            "prior_context": "\n".join(prior_context) or "None",
        })
        elapsed = time.monotonic() - start_time
        if elapsed > MAX_SECONDS:
            log.warning("agent.run.slow", run_id=run_id, elapsed_s=round(elapsed, 1))

        # Persist key findings to long-term memory
        if result.get("output"):
            memory.remember(f"Goal: {goal}\nAnswer: {result['output'][:500]}")

        log.info("agent.run.success", run_id=run_id, elapsed_s=round(elapsed, 1))
        return {"run_id": run_id, "output": result["output"], "success": True}
    except Exception as e:
        elapsed = time.monotonic() - start_time
        log.error("agent.run.failure", run_id=run_id, error=str(e), elapsed_s=round(elapsed, 1))
        return {"run_id": run_id, "output": None, "success": False, "error": str(e)}
```

---
Trade-offs and When NOT to Use Agents
AI agents are powerful, but they're not the right tool for every problem. Be honest about the costs:
Latency: A single LLM call takes 1-5 seconds. An agent that makes 10 tool calls before returning an answer takes 10-50 seconds. For user-facing features with sub-second latency requirements, agents are wrong. A direct RAG pipeline or a single well-crafted prompt will be faster and cheaper.
Cost: Each agent iteration burns tokens. A 10-step agent run using GPT-4o with 2,000 tokens per step costs roughly $0.60 per run. At 1,000 runs/day, that's $600/day — before tool API costs. Model these costs before committing to an agentic architecture.
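That arithmetic is worth encoding as a quick back-of-the-envelope model. The per-token price below is an illustrative blended figure, not a quote — check your provider's current pricing:

```python
def daily_agent_cost(
    runs_per_day: int,
    steps_per_run: int,
    tokens_per_step: int,
    price_per_1k_tokens: float,  # blended input/output price, illustrative
) -> float:
    """Estimate daily LLM spend for an agentic workload, excluding tool API costs."""
    tokens_per_run = steps_per_run * tokens_per_step
    return runs_per_day * tokens_per_run / 1000 * price_per_1k_tokens

# 1,000 runs/day × 10 steps × 2,000 tokens at $0.03/1k tokens ≈ $600/day
print(round(daily_agent_cost(1000, 10, 2000, 0.03), 2))
```

Run this with your real traffic projections before committing — the numbers scale linearly with every parameter, so a 3-step pipeline at the same volume would cost roughly a third as much.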
Unpredictability: Even with all the patterns above, agents are harder to test deterministically than regular code. Temperature=0 helps but doesn't guarantee identical outputs. If your business logic requires exact, reproducible behavior, agents introduce unnecessary risk.
When agents are the right choice:
- Tasks that genuinely require multi-step reasoning and dynamic tool selection
- Workflows where the exact steps aren't known in advance
- Research or analysis tasks where the agent can explore and adapt
- Long-running background tasks where latency doesn't matter
When to skip agents:
- The workflow has fixed, predictable steps → use a pipeline or workflow engine
- Latency is critical → use a single LLM call with a well-designed prompt
- The task is classification or extraction → fine-tune a smaller model
- You need 100% reproducibility → agents are wrong
---
Frequently Asked Questions
What's the difference between an AI agent and a chatbot?
A chatbot responds to one message at a time and has no ability to take actions in external systems. An AI agent operates in a loop: it receives a goal, decides what actions to take, calls tools (APIs, databases, code execution), observes the results, and repeats until the goal is complete. Agents are stateful, multi-step, and action-oriented. Chatbots are stateless, single-turn, and response-oriented.
How do I prevent infinite loops in AI agents?
Three guards, applied together:
- Iteration limit: hard cap on the number of loop cycles (e.g., max_iterations=15)
- Token budget: track total tokens consumed and stop when over threshold
- Time limit: wall-clock timeout (e.g., 120 seconds) that kills the run regardless of progress
Don't rely on any single guard. An agent can stay under the iteration limit while still consuming an enormous number of tokens per step. All three together form a reliable safety net.
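One way to bundle the three guards into a single check, sketched with illustrative names (RunBudget is not from any framework):

```python
import time
from dataclasses import dataclass

MAX_ITERATIONS = 15
MAX_TOKENS = 50_000
MAX_SECONDS = 120

@dataclass
class RunBudget:
    started_at: float
    iterations: int = 0
    tokens_used: int = 0

    def check(self) -> None:
        """Raise if any guard is exceeded — call once per loop cycle."""
        if self.iterations >= MAX_ITERATIONS:
            raise RuntimeError(f"Exceeded {MAX_ITERATIONS} iterations")
        if self.tokens_used >= MAX_TOKENS:
            raise RuntimeError(f"Exceeded token budget of {MAX_TOKENS}")
        if time.monotonic() - self.started_at >= MAX_SECONDS:
            raise RuntimeError(f"Exceeded {MAX_SECONDS}s wall-clock limit")
```

Increment iterations and tokens_used as the loop runs, and call budget.check() at the top of every cycle so the run dies on the first guard it trips.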
What tools should an AI agent have access to?
Follow the principle of least privilege: give the agent only the tools it needs for its specific domain, nothing more. A customer support agent doesn't need file system access. A research agent doesn't need the ability to send emails.
For each tool, ask: "What's the blast radius if the LLM misuses this tool?" High blast-radius tools (delete, send, deploy) should require human-in-the-loop approval. Low blast-radius tools (read, search, query) can be auto-approved. Keep the total tool count under 10 — beyond that, LLM tool-selection accuracy measurably degrades.
---
Further Reading
- LangChain AgentExecutor documentation — official reference for the agent patterns used in this article
- ReAct: Synergizing Reasoning and Acting in Language Models — the original paper behind most modern agent loops
- OpenAI Function Calling guide — structured tool calling at the API level
- Pydantic v2 docs — essential for structured output validation
- structlog documentation — structured logging for Python, used in the observability pattern above
- The Bitter Lesson (Richard Sutton) — useful context for why simpler, more general architectures tend to win