Claude Agent SDK: Build Custom AI Agents
Build production Claude agents in TypeScript or Python with the official Agent SDK. Tool-use loop, MCP integration, extended thinking, guardrails, and observability — end-to-end tutorial in under 45 minutes.

What the Claude Agent SDK Actually Is
The Claude Agent SDK is Anthropic's official framework for building production agents on top of the Claude API. It's not magic — under the hood it's a well-tuned loop that calls Claude, executes tools, feeds results back in, and repeats until the model stops asking for more work. What you get for free is the stuff that eats two weeks when you roll your own: streaming, tool-use choreography, extended thinking, context compaction, MCP client support, and permission gating. I've run three production agents on the TypeScript SDK since the December 2025 1.0 release — one customer support bot, one on-call triage helper, one internal analytics agent — and the SDK has saved me about 900 lines of loop glue per project.
This tutorial walks you end-to-end: install the TypeScript and Python SDKs, wire up the first agent in ~100 lines, add custom tools, plug in an MCP server, ship guardrails and observability, and decide when the managed Claude Agent service beats a self-hosted loop. The edge cases I've hit — runaway thinking budgets, tool-loop deadlocks, cost-overrun alerts — I send to the newsletter rather than bloat this piece.
Last updated: April 2026 — verified @anthropic-ai/claude-agent-sdk 1.4 (TypeScript) and claude-agent-sdk 0.9 (Python), confirmed pricing against Anthropic's pricing page, tested MCP integration against SDK 1.19.
Claude Agent SDK vs Raw API: What You Get
Definition: The Claude Agent SDK is a first-party TypeScript and Python wrapper around the Messages API that manages the agentic loop — calling Claude, executing tool handlers you register, streaming responses, and injecting Model Context Protocol tools. Replaces 400-900 lines of hand-rolled loop code per project.
If you've built an agent from scratch on the raw Messages API, you know the pattern: while (response.stop_reason === 'tool_use') { ... }, with ceremony around streaming deltas, tool-result formatting, retry logic, and context trimming. The SDK collapses all of that into a single runAgent (TypeScript) or ClaudeAgent (Python) call.
Concretely, here's what the SDK gives you vs writing your own loop with the raw SDK:
| Concern | Raw API | Agent SDK |
|---|---|---|
| Tool-call loop | Hand-rolled while loop, stop-reason checks | Built-in, just register tool handlers |
| Streaming deltas | Parse SSE events, reassemble content blocks | Typed callbacks per event type |
| Extended thinking | Toggle thinking block, manage budget | One flag, auto-budget per turn |
| MCP client | Not available — you'd write one | First-class, attach servers by URL/stdio |
| Context compaction | You write the summarizer | Built-in on context-window pressure |
| Permission prompts | Custom UX layer | Callback hook per tool invocation |
| Cost accounting | Sum tokens manually across turns | Per-turn + cumulative emitted |
The adoption cost is small: one extra package, maintained by Anthropic itself on the same release cadence as the API. If you're already on the raw API, migration is about 50 lines of diff. For a broader look at how this SDK sits among competitors, AI agent frameworks compared puts LangGraph, CrewAI, and the Claude Agent SDK side-by-side.
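To make the table concrete, here is a sketch of the hand-rolled pattern the SDK replaces. The network call is abstracted behind a createMessage function so the loop logic stands alone; the types are simplified stand-ins for the real Messages API shapes, not the SDK's actual interfaces.

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface ModelResponse {
  stop_reason: string;
  content: ContentBlock[];
}

type CreateMessage = (messages: unknown[]) => Promise<ModelResponse>;

// The hand-rolled pattern: call the model, execute any requested tools,
// feed results back as a user turn, repeat until stop_reason changes.
async function runToolLoop(
  createMessage: CreateMessage,
  messages: unknown[],
  handlers: Record<string, (input: unknown) => Promise<unknown>>
): Promise<string> {
  let response = await createMessage(messages);
  while (response.stop_reason === "tool_use") {
    // Echo the assistant turn back, then append one tool_result per call.
    messages.push({ role: "assistant", content: response.content });
    const results: unknown[] = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        const output = await handlers[block.name](block.input);
        results.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: JSON.stringify(output),
        });
      }
    }
    messages.push({ role: "user", content: results });
    response = await createMessage(messages);
  }
  const text = response.content.find((b) => b.type === "text");
  return text && text.type === "text" ? text.text : "";
}
```

And this sketch covers only the first table row: streaming, thinking budgets, compaction, and retries all add their own ceremony on top.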
Step 1: Install and Authenticate
Install the SDK for your language. TypeScript requires Node 20+; Python requires 3.10+. Both packages ship typed interfaces and match the same mental model — if you can read one, you can read the other.
# TypeScript
npm install @anthropic-ai/claude-agent-sdk zod
# Python
pip install claude-agent-sdk pydantic
Get an API key from console.anthropic.com and export it. The SDK reads ANTHROPIC_API_KEY from the environment; never hardcode. I put it in a .env loaded via dotenv in development and fetch it from a secret manager in production.
export ANTHROPIC_API_KEY=sk-ant-...
Watch out: the SDK auto-retries on 429 with jittered backoff, but it does NOT cap spend. Set maxTurns and maxBudgetUsd on every agent — I've seen a runaway loop bill $180 in forty minutes because a regex-broken tool kept returning "retry" until the model gave up.
Step 2: Scaffold a Customer-Support Agent in 100 Lines
Here's the smallest useful agent — a customer support bot that can look up orders, search the knowledge base, and escalate to a human. We start with the TypeScript version because most teams I've shipped with pick it for production; the Python version is functionally identical and shown later.
// src/support-agent.ts
import { Agent, tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";
const lookupOrder = tool({
name: "lookup_order",
description: "Fetch order status by ID. Returns shipping and item details.",
input: z.object({ order_id: z.string() }),
async execute({ order_id }) {
const order = await db.orders.findUnique({ where: { id: order_id } });
if (!order) return { error: "Order not found" };
return {
id: order.id,
status: order.status,
carrier: order.carrier,
tracking: order.trackingNumber,
items: order.items,
};
},
});
const searchKb = tool({
name: "search_kb",
description: "Search the help center for an article matching the query.",
input: z.object({ query: z.string(), top_k: z.number().default(3) }),
async execute({ query, top_k }) {
return await vectorStore.query(query, top_k);
},
});
const escalate = tool({
name: "escalate_to_human",
description: "Create a Zendesk ticket and notify on-call.",
input: z.object({
reason: z.string(),
priority: z.enum(["low", "normal", "high", "urgent"]),
conversation_summary: z.string(),
}),
async execute(input) {
const ticket = await zendesk.tickets.create(input);
return { ticket_id: ticket.id, url: ticket.url };
},
});
const agent = new Agent({
model: "claude-opus-4-7",
system: "You are a customer support agent for Acme Corp. " +
"Answer questions using the knowledge base first. " +
"Only escalate when a refund > $500 or account deletion is requested.",
tools: [lookupOrder, searchKb, escalate],
maxTurns: 20,
maxBudgetUsd: 2.0,
thinking: { budgetTokens: 2000 },
});
const result = await agent.run({
user: "My order A-2381 hasn't shipped, it's been 10 days.",
});
console.log(result.finalMessage);
console.log("Cost:", result.usage.totalCostUsd);
That's about 60 lines including blanks. The Agent class handles the entire tool-call loop — no hand-rolled while loop, no manual stop-reason checks. The TypeScript SDK on GitHub has the full type surface if you want to explore options like onToolStart, onThinking, and onMessageDelta hooks.
The Python equivalent is 1:1:
from claude_agent_sdk import Agent, tool
from pydantic import BaseModel
class OrderInput(BaseModel):
order_id: str
@tool(description="Fetch order status by ID.")
async def lookup_order(input: OrderInput) -> dict:
order = await db.orders.find_unique(where={"id": input.order_id})
return order.dict() if order else {"error": "Order not found"}
agent = Agent(
model="claude-opus-4-7",
system="You are a customer support agent for Acme Corp...",
tools=[lookup_order, search_kb, escalate],
max_turns=20,
max_budget_usd=2.0,
thinking={"budget_tokens": 2000},
)
result = await agent.run(user="My order A-2381 hasn't shipped...")
print(result.final_message)
print("Cost:", result.usage.total_cost_usd)
Reference: claude-agent-sdk-python on GitHub.
Step 3: Extend With Custom Tools (Search, DB, Actions)
Tools are the agent's only way to interact with your systems. The SDK enforces three things: (1) Zod/Pydantic schema on input, (2) an async execute function that returns JSON-serializable output, (3) a clear natural-language description the model reads to decide when to call.
Three categories of tool, in the order you'll typically add them:
- Read tools (lookup_order, search_kb): safe to call any time, no side effects. Allow freely.
- Write tools (create_ticket, update_order): side effects. Gate behind a permission hook (Step 6).
- Destructive tools (delete_user, refund_order): require explicit human approval on each call. Never auto-allow.
Description text matters more than people think. The model picks tools from descriptions, not function names. Write them in the imperative: "Fetch order status by ID" beats "Order fetcher". Include units, limits, and return shape if they're non-obvious. I've seen a model call the wrong tool three times in a row because the description said "Get customer" instead of "Get customer profile by email (not by ID)".
Pro tip: cap tool output at ~5KB. The model doesn't read 40KB of database rows — it hits context pressure, the SDK auto-compacts, and you lose the tool result you actually needed. Paginate or summarize in the tool itself.
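A sketch of that clamp, assuming nothing beyond JSON-serializable rows. clampToolOutput is a hypothetical helper, not an SDK export; the point is to trim inside the tool and tell the model what was dropped.

```typescript
// Hypothetical helper (not part of the SDK): keep tool output under a byte
// budget and report what was omitted, instead of letting compaction eat it.
const MAX_TOOL_OUTPUT_BYTES = 5 * 1024;

function clampToolOutput<T>(rows: T[], maxBytes = MAX_TOOL_OUTPUT_BYTES) {
  const kept: T[] = [];
  let size = 2; // account for the surrounding JSON array brackets
  for (const row of rows) {
    const rowSize = JSON.stringify(row).length + 1; // +1 for the comma
    if (size + rowSize > maxBytes) break;
    kept.push(row);
    size += rowSize;
  }
  return {
    rows: kept,
    truncated: kept.length < rows.length,
    omitted: rows.length - kept.length,
  };
}
```

Returning the omitted count matters: the model can tell the user "showing the first 74 of 1,000 results" or call the tool again with a narrower query, instead of silently reasoning over a partial list.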
Step 4: Multi-Turn With Extended Thinking
Claude Opus 4.7's extended thinking mode is the single biggest quality lever the SDK exposes. With thinking: { budgetTokens: 2000 }, the model produces an internal reasoning block before each tool call — which dramatically improves tool selection on ambiguous queries. Turn it on once and forget it.
For multi-turn conversations (a chat UI, not a one-shot call), the SDK preserves state across run() invocations on the same agent instance:
const agent = new Agent({ /* ...config */ });
// Turn 1
let result = await agent.run({
user: "My order A-2381 hasn't shipped.",
});
// Turn 2 — agent remembers context
result = await agent.run({
user: "Can you refund it?",
});
// Persist across process restarts
const state = agent.exportState();
await kv.set(`agent:${conversationId}`, JSON.stringify(state));
// Later, rehydrate
const restored = await kv.get(`agent:${conversationId}`);
agent.importState(JSON.parse(restored));
The serialized state is JSON-safe and stable across SDK minor versions. For a chat service, I store it in Redis keyed by conversation ID with a 30-day TTL. For a deeper look at conversation-state patterns in retrieval-heavy agents, RAG explained covers the adjacent question of when to put documents in the context vs a vector store.
Step 5: Pricing and Cost Control (2026 Rates)
The SDK meters every call. Here's what you'll actually pay, per 1M tokens as of April 2026:
| Model | Input | Output | Thinking | Cache read |
|---|---|---|---|---|
| Claude Opus 4.7 | $5 | $25 | Billed as output | $0.50 |
| Claude Sonnet 4.6 | $3 | $15 | Billed as output | $0.30 |
| Claude Haiku 4.6 | $0.80 | $4 | Billed as output | $0.08 |
Three levers move your bill the most:
- Prompt caching (90% discount on cache reads) — enable it on the system prompt. System prompts rarely change; every subsequent turn reads from cache at one-tenth the cost.
- Batch API (50% discount) — for async agents (nightly report generation, offline analysis), route through batch. The SDK has a batchMode flag.
- Model tier — Haiku is 6x cheaper than Opus and handles 80% of single-turn agent work. Use Opus only when extended thinking is actually worth the ticket.
Setting maxBudgetUsd is non-negotiable. The SDK emits usage per turn and aborts if cumulative cost exceeds the cap. I default to $0.50 for customer-facing agents and $2.00 for internal tools. For broader cost math across providers, AI coding assistants compared has vendor-level pricing tables.
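To sanity-check a maxBudgetUsd value against the table above, it helps to do the arithmetic explicitly. This estimator just encodes the Opus 4.7 rates from the table; the usage shape is illustrative, not the SDK's exact type.

```typescript
// Opus 4.7 rates from the table: $5/M input, $25/M output, $0.50/M cache read.
const OPUS_4_7 = { inputPerM: 5, outputPerM: 25, cacheReadPerM: 0.5 };

function estimateRunCostUsd(
  usage: { inputTokens: number; outputTokens: number; cacheReadTokens: number },
  rates = OPUS_4_7
): number {
  return (
    (usage.inputTokens / 1e6) * rates.inputPerM +
    (usage.outputTokens / 1e6) * rates.outputPerM +
    (usage.cacheReadTokens / 1e6) * rates.cacheReadPerM
  );
}

// A 20-turn support run where caching does its job: most input is cache reads.
const cost = estimateRunCostUsd({
  inputTokens: 50_000,
  outputTokens: 60_000,
  cacheReadTokens: 350_000,
});
// ≈ $0.25 + $1.50 + $0.175 = $1.93, comfortably under a $2.00 cap
```

Run the same numbers without caching (400k billed as fresh input) and the input line alone jumps from $0.43 to $2.00, which is why caching the system prompt is the first lever to pull.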
Step 6: Integrate MCP Servers
Model Context Protocol is the standard way to give agents reusable, versioned toolsets. The SDK is a first-class MCP client — you attach an MCP server by URL (for HTTP transport) or command (for stdio), and all its tools become available to the agent as if you'd registered them locally.
import { Agent, McpServer } from "@anthropic-ai/claude-agent-sdk";
const agent = new Agent({
model: "claude-opus-4-7",
system: "You are an internal ops agent...",
tools: [...localTools],
mcpServers: [
McpServer.stdio({
command: "npx",
args: ["-y", "mcp-github-issues"],
env: { GITHUB_TOKEN: process.env.GITHUB_TOKEN },
}),
McpServer.http({
url: "https://mcp.example.com/sse",
headers: { Authorization: `Bearer ${process.env.MCP_TOKEN}` },
}),
],
});
The SDK discovers each server's tools, resources, and prompts on startup, and routes calls transparently. If you haven't built an MCP server yet, build your first MCP server in TypeScript walks through the npm-publishable pattern this snippet consumes.
Pro tip: the curated list of MCP servers covers the production-ready Postgres, Slack, Linear, and Sentry servers. Composing 3-4 existing MCP servers inside one agent beats writing a monolith every time.
Step 7: Production Guardrails
Dev-mode agents behave. Production agents get asked to transfer money, delete data, and refund angry customers. Three guardrail layers matter:
- Permission callbacks — gate each write tool on an onToolStart hook. Human-in-the-loop for destructive ops.
- Output validation — sanitize model output before rendering. Refuse PII leaks, URLs to untrusted domains, and shell command suggestions.
- Rate limits — per-user and per-conversation caps. An agent is a rate-limit amplifier: one user message can fan out to 10+ tool calls.
const agent = new Agent({
// ...config
onToolStart: async (tool, input) => {
if (destructiveTools.includes(tool.name)) {
const approved = await requestHumanApproval({
tool: tool.name,
input,
conversationId,
});
if (!approved) {
return { block: true, message: "User declined the action." };
}
}
await auditLog.append({ tool: tool.name, input, ts: Date.now() });
return { block: false };
},
onMessageDelta: (delta) => {
if (containsPii(delta.text)) {
throw new Error("PII detected in output");
}
},
});
The onToolStart hook runs before every tool invocation and can veto by returning { block: true }. For an ops agent, I log every tool call to a Postgres table with conversation ID, user, tool name, input JSON, and output — which both satisfies audit requirements and is the single most useful debugging artifact when an agent does something strange.
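The third layer needs nothing from the SDK at all. Here is a sketch of a fixed-window per-user limiter you would check before handing a message to the agent; the class name and defaults are mine, with the 10-per-minute limit mirroring the checklist suggestion later in this piece.

```typescript
// Illustrative per-user limiter: at most `limit` messages per rolling window.
// Check allow() before agent.run(); reject or queue when it returns false.
class UserRateLimiter {
  private hits = new Map<string, number[]>();

  constructor(private limit = 10, private windowMs = 60_000) {}

  allow(userId: string, now = Date.now()): boolean {
    // Drop timestamps that have aged out of the window.
    const recent = (this.hits.get(userId) ?? []).filter(
      (t) => now - t < this.windowMs
    );
    if (recent.length >= this.limit) {
      this.hits.set(userId, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(userId, recent);
    return true;
  }
}
```

For a multi-instance deployment you would back this with Redis rather than an in-process Map, but the shape is the same: limit user messages, and the 10x tool-call fan-out is limited with them.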
Step 8: Observability and Cost Tracking
The SDK emits OpenTelemetry-compatible spans if you configure an exporter. Every agent run becomes a trace with child spans per turn and per tool call, annotated with token counts, cost, model, and tool-call arguments.
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
new NodeSDK({
traceExporter: new OTLPTraceExporter({ url: "http://otel-collector:4318/v1/traces" }),
}).start();
// Agent runs now auto-instrument
const agent = new Agent({
// ...config
telemetry: { enabled: true, serviceName: "support-agent" },
});
Ship traces to Honeycomb, Datadog, or your own Tempo + Grafana. Without observability, you're blind: a single-digit percent of user conversations produce 80% of the cost, and you can't find them without trace data. I pair this with a small agent_runs Postgres table for long-term cost queries — OpenTelemetry is for debugging, SQL is for finance review.
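On the SQL side that finance question is a GROUP BY, but the shape is easy to show in code. Here runs are plain objects standing in for agent_runs rows; the function name and row shape are illustrative.

```typescript
// Which conversations drive the bill: sum cost per conversation, take top N.
interface AgentRun {
  conversationId: string;
  costUsd: number;
}

function topCostConversations(runs: AgentRun[], n = 5) {
  const byConv = new Map<string, number>();
  for (const r of runs) {
    byConv.set(r.conversationId, (byConv.get(r.conversationId) ?? 0) + r.costUsd);
  }
  return [...byConv.entries()]
    .sort((a, b) => b[1] - a[1]) // most expensive first
    .slice(0, n)
    .map(([conversationId, costUsd]) => ({ conversationId, costUsd }));
}
```

Run this weekly (or the equivalent SQL) and the handful of outlier conversations become the trace IDs you pull up in your tracing tool.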
Step 9: Managed Claude Agent Service vs Self-Hosted
Anthropic also offers a fully-managed Claude Agent service where they run the loop and you just POST messages. When does that beat self-hosting the SDK?
- Pick managed if: you have zero backend infra, your tools are all MCP-based, and you need < 2 weeks to ship. The managed service handles scaling, state, retries, and observability.
- Pick self-hosted SDK if: your tools touch internal VPC resources (private DB, Kafka, S3), you need sub-100ms tool latency, or your compliance team says "the data can't leave our account". Also if cost matters at scale — self-hosted pays just the Claude API, managed adds a per-agent-run platform fee.
I've run both. Managed is excellent for a v1 demo; by month three of production most teams migrate to self-hosted SDK for cost and control.
Step 10: Ship the Agent
Final checklist before you put an agent in front of users:
- maxTurns ≤ 20, maxBudgetUsd set, timeouts on every tool (5s read, 15s write)
- Prompt caching enabled on the system prompt
- All destructive tools gated on onToolStart
- Per-user rate limit (10 msgs/min is a reasonable starting point)
- Audit logging to Postgres with tool calls and inputs
- OpenTelemetry traces shipping to your observability stack
- Runbook: what to do when the agent misbehaves (kill switch, conversation export)
- Evals before deploy — at minimum a 50-example golden-prompt set, run on every release
Ship behind a feature flag on 1% of traffic, watch traces for a day, ramp to 10%, then 100%. Agent behavior is surprisingly workload-sensitive: a prompt that works on synthetic examples can fail under the messy real-world distribution. I've rolled back three agents in the first 48 hours — each fix took under an hour, but only because we caught it with observability at 1% traffic.
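The golden-prompt check from the list above can be as small as this. runAgentOnce stands for whatever wraps your agent for a single turn, and the case shape is mine, not an SDK type; substring checks are a crude but cheap first eval before you invest in model-graded scoring.

```typescript
// Minimal golden-set harness: run every case, check required and forbidden
// substrings in the reply, report what failed.
interface GoldenCase {
  prompt: string;
  mustInclude: string[]; // substrings the reply must contain
  mustNotInclude: string[]; // e.g. PII markers, competitor names
}

async function runGoldenSet(
  cases: GoldenCase[],
  runAgentOnce: (prompt: string) => Promise<string>
): Promise<{ passed: number; failed: string[] }> {
  const failed: string[] = [];
  for (const c of cases) {
    const reply = await runAgentOnce(c.prompt);
    const ok =
      c.mustInclude.every((s) => reply.includes(s)) &&
      c.mustNotInclude.every((s) => !reply.includes(s));
    if (!ok) failed.push(c.prompt);
  }
  return { passed: cases.length - failed.length, failed };
}
```

Wire it into CI so a release only ships when the golden set passes, and grow the set from the rollbacks: every production incident becomes a new case.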
Frequently Asked Questions
What is the Claude Agent SDK?
The Claude Agent SDK is Anthropic's official TypeScript and Python framework for building agents on top of the Claude API. It manages the tool-use loop, streaming, extended thinking, MCP client integration, context compaction, and cost accounting — replacing ~500 lines of hand-rolled code per project with a single Agent class.
Is the Claude Agent SDK free?
The SDK itself is open-source and free. You pay for the underlying Claude API calls at standard rates — $5/$25 per 1M input/output tokens for Opus 4.7, $0.80/$4 for Haiku 4.6. Prompt caching drops input costs by 90% on cache reads and the Batch API cuts another 50% for async workloads.
Claude Agent SDK vs LangChain: which should I use?
Pick the Claude Agent SDK if you're committed to Claude as your model and want first-party features (extended thinking, MCP, prompt caching, Anthropic support). Pick LangChain or LangGraph if you need multi-provider portability (OpenAI, Gemini, Claude in the same agent) or the broader ecosystem of prebuilt chains. The Agent SDK is smaller, faster, and better-typed for Claude-only deployments.
Does Claude Agent SDK support MCP?
Yes — MCP is first-class. Attach MCP servers via McpServer.stdio() or McpServer.http() in the agent config, and all server-exposed tools, resources, and prompts become available to the model automatically. The SDK handles protocol negotiation, transport, and message framing.
What languages does Claude Agent SDK support?
Official SDKs exist for TypeScript (Node 20+) and Python (3.10+) as of April 2026. Both ship identical feature sets — tools, MCP, extended thinking, streaming, telemetry. Community ports exist for Go and Rust but they lag the official SDKs by 1-2 releases.
How do I handle cost overruns with Claude Agent SDK?
Always set maxTurns and maxBudgetUsd on every agent. The SDK tracks cumulative cost per run and aborts when either limit is hit. Combine with per-user rate limits (10 msgs/min) and OpenTelemetry traces to catch runaway conversations before they dominate the bill. Prompt caching on the system prompt is the highest-impact cost win.
Can I use Claude Agent SDK for production?
Yes. The SDK is 1.0+ as of December 2025 and backs production agents at multiple Anthropic customers. Add guardrails (permission callbacks on destructive tools, output validation, audit logging), eval before each deploy, and roll out behind a feature flag on 1% of traffic first. The self-hosted SDK is preferred over the managed Claude Agent service for most production workloads at scale.
The Claude Agent SDK Is the Default Choice on Claude
Six months in, I've stopped writing hand-rolled agent loops. The Claude Agent SDK is a thin, opinionated wrapper that gets out of the way for the 80% case and exposes every hook you need for the other 20%. Install it, wire three tools, set a budget cap, ship behind a flag. The SDK will keep up with model releases (Opus 4.7 was supported at launch, and the next tier will be too), and the MCP integration means the tool ecosystem compounds with every new server published.
If you're already on the raw Messages API, migrating is a one-afternoon job and cuts your surface area by a third. If you're starting fresh, skip the DIY loop entirely — the managed-then-self-hosted trajectory is the right default unless you know exactly why you need otherwise.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.