How AI Coding Agents Actually Work: A Source Code Deep Dive
From LLM reasoning and tool calling to context window management and security models — a comprehensive look at the internals of AI coding agents, based on Amazon Q CLI and Claude Code source code.
Chinese Version (中文版): This article was originally published on a WeChat public account (微信公众号). Read the original Chinese version →
Cursor, Claude Code, Amazon Q, Windsurf… By 2026, AI coding has become a fiercely competitive, crowded market. But have you ever wondered how these tools actually work under the hood?
This article is based on the source code of two open-source projects — Amazon Q Developer CLI (implemented in Rust) and Claude Code (TypeScript + Python) — and breaks down the core architecture of AI coding agents from the ground up.
1. First Things First: AI Coding is Not a Chatbot
Many people assume AI coding tools are just ChatGPT wrapped in an IDE shell. That assumption is fundamentally wrong.
A chatbot can only talk. An AI coding agent can act. The difference? Tool Use (function calling).
When you ask Claude Code to fix a bug, here is what actually happens behind the scenes:
You: "Fix the bug in src/app.js"
LLM thinks: I need to read that file first
LLM output: tool_call → fs_read("src/app.js")
Agent: executes read → returns file content to LLM
LLM thinks: Found it — line 42 has the issue
LLM output: tool_call → fs_write("src/app.js", ...)
Agent: executes write → returns result
LLM: "Done. The problem was..."
This process is called the Agent Loop — the single most important design pattern in every AI coding tool.
Key concept: The LLM never directly operates on your machine. It sends structured JSON requests telling the agent “what I want to do,” and the agent validates permissions before executing on its behalf. This makes every operation auditable, interceptable, and reversible.
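The loop described above can be sketched in a few lines. This is a hypothetical illustration, not either project's actual code: `call_llm` is a stub that mimics the trace above, and the tool registry holds a fake `fs_read`.

```python
# Hypothetical sketch of the agent loop. `call_llm` and the tool
# registry are stand-ins for a real LLM API and real tools.
def call_llm(messages):
    """Stub LLM: asks to read the file on the first turn, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "name": "fs_read",
                "args": {"path": "src/app.js"}}
    return {"type": "text", "content": "Done. The problem was on line 42."}

TOOLS = {"fs_read": lambda path: f"<contents of {path}>"}

def agent_loop(user_request, max_turns=10):
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_turns):
        reply = call_llm(messages)
        if reply["type"] == "text":   # final answer: exit the loop
            return reply["content"]
        # Tool call: execute it (permission check omitted here) and
        # feed the result back into the conversation automatically.
        result = TOOLS[reply["name"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent loop did not converge")

print(agent_loop("Fix the bug in src/app.js"))
```

Note that the LLM itself never touches the filesystem: it only emits a structured request, and the loop decides whether and how to execute it.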
2. The Agent Loop: An Elegant State Machine
Amazon Q CLI implements an explicit finite state machine in Rust to manage the entire agent loop.
The FSM cycles through five core states: Idle → ExecutingRequest → ExecutingHooks → WaitingForApproval → ExecutingTools → (loop back)
Several aspects of this state machine are particularly well-designed:
- The loop is automatic — after the LLM calls a tool, the result is injected back into the conversation and the LLM is called again automatically.
- Tools can execute in parallel — the LLM can return multiple `tool_use` blocks in a single response, and the agent uses Tokio's `FuturesUnordered` to execute them concurrently.
- Every tool call passes through a permission check — no exceptions.
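An explicit state machine like this can be expressed as an enum plus a transition table. The sketch below is a Python analogue of the pattern (the real Amazon Q CLI implementation is in Rust); the state names mirror the article, and the transition edges are illustrative assumptions.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    EXECUTING_REQUEST = auto()
    EXECUTING_HOOKS = auto()
    WAITING_FOR_APPROVAL = auto()
    EXECUTING_TOOLS = auto()

# Allowed transitions (illustrative): each state maps to the states
# it may legally move to. ExecutingTools loops back into the request.
TRANSITIONS = {
    State.IDLE: {State.EXECUTING_REQUEST},
    State.EXECUTING_REQUEST: {State.EXECUTING_HOOKS, State.IDLE},
    State.EXECUTING_HOOKS: {State.WAITING_FOR_APPROVAL},
    State.WAITING_FOR_APPROVAL: {State.EXECUTING_TOOLS, State.IDLE},
    State.EXECUTING_TOOLS: {State.EXECUTING_REQUEST},  # loop back
}

def step(current: State, target: State) -> State:
    """Move to `target` only if the transition table allows it."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

The payoff of making transitions explicit is that an impossible jump (say, executing tools before approval) fails loudly instead of silently corrupting the loop.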
3. The Tool System: The AI’s Hands and Feet
The capability ceiling of an AI coding agent is entirely determined by what tools it has access to.
The Art of Tool Descriptions
A tool’s description is not documentation written for humans — it is a behavioral instruction written for the LLM. Its quality directly determines how well the agent performs.
`tool_use_purpose`: Forcing the AI to Think Before It Acts
Amazon Q CLI has an elegant design detail: every tool call forces the LLM to fill in a purpose field. This seemingly small constraint has a significant effect — it compels the model to articulate why it is invoking a tool before it does so, reducing hallucinated or unnecessary tool calls.
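Concretely, this pattern amounts to making the purpose field a required property in the tool's input schema. The schema below is a hypothetical example in the JSON Schema style used by LLM tool-calling APIs, not the actual Amazon Q CLI definition:

```python
# Hypothetical tool definition: the description is written as a
# behavioral instruction for the LLM, and `purpose` is mandatory.
FS_READ_TOOL = {
    "name": "fs_read",
    "description": (
        "Read a file from the workspace. Use this before editing any "
        "file. Read only the files relevant to the current task."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "File path to read"},
            "purpose": {
                "type": "string",
                "description": "One sentence explaining WHY this read is needed",
            },
        },
        "required": ["path", "purpose"],  # the model must justify the call
    },
}

def validate_call(tool, args):
    """Reject tool calls that omit any required field, e.g. `purpose`."""
    missing = [f for f in tool["input_schema"]["required"] if f not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return True
```

Because the model must emit the justification before the call executes, a call it cannot justify tends not to get made at all.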
4. The Security Model: Four Layers of Defense in Depth
AI coding agents have direct access to your filesystem, terminal, and network. Security is not optional — it is architectural.
Amazon Q CLI implements a four-layer security architecture: Hook → User Confirmation → Path Permissions → Tool Allowlist
All file paths go through `canonicalize` normalization first, which prevents `../`-style path traversal attacks from bypassing permission boundaries.
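The canonicalize-then-check pattern is the key: resolve symlinks and `..` segments first, and only then compare against the allowed root. A minimal Python sketch of the idea (the assumed workspace root `/workspace` is illustrative):

```python
from pathlib import Path

def is_path_allowed(requested: str, allowed_root: str) -> bool:
    """Canonicalize the requested path, then check containment."""
    root = Path(allowed_root).resolve()
    target = Path(allowed_root, requested).resolve()
    # After resolution, "../../etc/passwd" no longer starts with the
    # allowed root, so the containment check rejects it.
    return target == root or root in target.parents

print(is_path_allowed("src/app.js", "/workspace"))        # True
print(is_path_allowed("../../etc/passwd", "/workspace"))  # False
```

Checking the raw string instead (e.g. `requested.startswith(allowed_root)`) is exactly the mistake canonicalization exists to prevent.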
Claude Code takes a different approach, using a Hook system to implement declarative security policies that automatically detect nine categories of common security risks.
5. Context Window Management: The Scarcest Resource
In traditional software, the bottlenecks are CPU, memory, and I/O. In AI coding, the bottleneck is the context window.
Every token matters. The system prompt, conversation history, tool descriptions, file contents, and tool results all compete for space in a fixed-size window. Blow past the limit and the model either loses critical context or the request fails entirely.
Amazon Q CLI employs four strategies to manage this:
- Auto-compaction — Compresses 200K tokens of conversation into a ~2K token summary when the window fills up.
- Message truncation — When reading large files, only the first 10,000 characters are retained.
- History pruning — Keeps the most recent messages while maintaining structural integrity (ensuring tool call/result pairs stay matched).
- Resource file limits — Automatically included resource files (like `.qdeveloper` configs) are capped at 10KB.
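Two of these strategies can be sketched directly. The 10,000-character cap comes from the article; the message shapes and the pruning logic are illustrative assumptions about how tool call/result pairing might be preserved:

```python
MAX_RESULT_CHARS = 10_000  # cap from the article's truncation strategy

def truncate_result(text: str) -> str:
    """Keep only the first 10,000 characters of a large tool result."""
    if len(text) <= MAX_RESULT_CHARS:
        return text
    return text[:MAX_RESULT_CHARS] + "\n[...truncated]"

def prune_history(messages, keep_last: int):
    """Keep the most recent messages, but never let the window start
    on a tool result whose matching tool call was pruned away."""
    window = messages[-keep_last:]
    while window and window[0]["role"] == "tool_result":
        window = window[1:]  # drop orphaned results at the window edge
    return window
```

The subtle part is the structural-integrity rule: a tool result without its originating call is malformed input for most LLM APIs, so pruning must respect the pairing, not just the token count.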
6. MCP Protocol: The De Facto Standard for Tool Extension
Both projects have adopted MCP (Model Context Protocol) as their tool extension protocol. MCP provides a standardized way to add new tools to an agent without modifying its core code — tools are defined as external servers that communicate over a well-defined protocol.
Amazon Q CLI manages multiple MCP servers using an Actor model, where each MCP server runs as an independent actor with its own lifecycle. This provides natural isolation: if one MCP server crashes, it does not bring down the others.
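On the wire, MCP is framed as JSON-RPC 2.0; `tools/list` and `tools/call` are method names from the MCP specification. The sketch below only builds the request payloads; the transport (typically stdio to the server process) is elided:

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requires a unique id per request

def mcp_request(method, params=None):
    """Build a JSON-RPC 2.0 request string for an MCP server."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# Discover the server's tools, then invoke one of them.
print(mcp_request("tools/list"))
print(mcp_request("tools/call",
                  {"name": "fs_read", "arguments": {"path": "src/app.js"}}))
```

Because every server speaks this same framing, the agent can treat a locally bundled tool and a third-party MCP tool identically once discovery completes.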
7. The Plugin Architecture: Claude Code’s Five-Dimensional Extension Model
Claude Code defines five orthogonal extension points, giving plugin authors fine-grained control over different aspects of the agent’s behavior.
The most interesting example is the feature-dev plugin’s seven-stage workflow — it launches multiple sub-agents in parallel to explore different code paths simultaneously. Each sub-agent operates on its own branch of exploration, and the results are synthesized back into a coherent implementation plan. This is a genuine multi-agent pattern, not just sequential tool calls.
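The fan-out/fan-in shape of that workflow is easy to see in miniature. This is a toy sketch, not the plugin's actual code: `explore` stands in for a full sub-agent run, and the synthesis step is reduced to collecting results.

```python
import asyncio

async def explore(area: str) -> str:
    """Stand-in for one sub-agent exploring a code path."""
    await asyncio.sleep(0)  # real work: a full agent loop of its own
    return f"findings for {area}"

async def plan_feature(areas):
    # Fan out: one sub-agent per exploration area, run concurrently.
    results = await asyncio.gather(*(explore(a) for a in areas))
    # Fan in: a synthesis step would merge these into one plan.
    return list(results)

findings = asyncio.run(plan_feature(["routing", "data model", "tests"]))
print(findings)
```

The concurrency is the point: three explorations cost roughly the wall-clock time of the slowest one, not the sum of all three.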
8. Architecture Comparison
| Dimension | Amazon Q CLI | Claude Code |
|---|---|---|
| Language | Rust | TypeScript + Python |
| Architecture | Monolithic agent + Actor concurrency | Core engine + plugin ecosystem |
| Concurrency | Tokio async + Actor model | Node.js event loop + child processes |
| Extension | MCP + Hook scripts | Five-dimensional plugin system |
| State | SQLite persistence | Filesystem + session state |
| Strength | Performance, type safety | Developer velocity, rich ecosystem |
Despite these surface-level differences, the core paradigm is identical: LLM Agent + Tool Use + Streaming + Safety + MCP.
This convergence is not coincidental. These are the design patterns that survive contact with real-world coding tasks. Every serious AI coding tool, regardless of implementation language, ends up solving the same fundamental problems in remarkably similar ways.
9. Seven Design Principles
After studying both codebases in depth, here are the seven principles that define how modern AI coding agents are built:
- The LLM is the brain; tools are the hands and feet. The model reasons and decides; tools execute. Never confuse the two.
- Streaming-first. Every interaction streams tokens and events in real time. Batch request-response is a non-starter for developer experience.
- Security is an architectural concern, not a feature. Permission checks, path validation, and user confirmation are baked into the agent loop itself — not bolted on after the fact.
- The context window is the scarcest resource. Every design decision — from tool result formatting to conversation history management — must be context-budget-aware.
- Tool descriptions are prompts. The text in a tool’s description field is effectively part of the system prompt. Write it with the same care.
- MCP standardizes the tool ecosystem. A shared protocol means tools are portable across agents, just as REST standardized web APIs.
- State machines drive conversation management. Explicit state transitions make the agent loop predictable, debuggable, and safe.
Final Thoughts
AI coding looks like magic. But crack it open, and the essence is surprisingly straightforward: one loop, a set of tools, and a permission system.
The loop calls the LLM, the LLM picks a tool, the tool executes, the result feeds back into the loop. Rinse and repeat until the task is done. Every piece of complexity — streaming, security, context management, plugin systems — is an elaboration on this core cycle.
Understanding these internals does not just satisfy curiosity. It gives you a framework for evaluating which tools to adopt, how to extend them, and where the real technical moats lie (hint: it is less about the loop and more about the tools and the prompts).
Reference projects:
- Amazon Q Developer CLI (Rust)
- Claude Code (TypeScript + Python)