How AI Coding Agents Actually Work: A Source Code Deep Dive
From LLM reasoning and tool calling to context window management and security models — a comprehensive look at the internals of AI coding agents, based on Amazon Q CLI and Claude Code source code.
Chinese Version (中文版): This article was originally published on a WeChat public account (微信公众号). Read the original Chinese version →
Cursor, Claude Code, Amazon Q, Windsurf… By 2026, AI coding has become a fiercely competitive, crowded market. But have you ever wondered how these tools actually work under the hood?
This article is based on the source code of two open-source projects — Amazon Q Developer CLI (implemented in Rust) and Claude Code (TypeScript + Python) — and breaks down the core architecture of AI coding agents from the ground up.
1. First Things First: AI Coding is Not a Chatbot
Many people assume AI coding tools are just ChatGPT wrapped in an IDE shell. That assumption is fundamentally wrong.
A chatbot can only talk. An AI coding agent can act. The difference? Tool Use (function calling).
When you ask Claude Code to fix a bug, here is what actually happens behind the scenes:
You: "Fix the bug in src/app.js"
LLM thinks: I need to read that file first
LLM output: tool_call → fs_read("src/app.js")
Agent: executes read → returns file content to LLM
LLM thinks: Found it — line 42 has the issue
LLM output: tool_call → fs_write("src/app.js", ...)
Agent: executes write → returns result
LLM: "Done. The problem was..."
This process is called the Agent Loop — the single most important design pattern in every AI coding tool.
Key concept: The LLM never directly operates on your machine. It sends structured JSON requests telling the agent “what I want to do,” and the agent validates permissions before executing on its behalf. This makes every operation auditable, interceptable, and reversible.
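The loop described above can be sketched in a few lines. This is a hypothetical illustration, not either project's actual code: `call_llm` is a stub that mimics the trace above, and the tool registry holds a fake `fs_read`.

```python
# Hypothetical sketch of the agent loop. `call_llm` and the tool
# registry are stand-ins for a real LLM API and real tools.
def call_llm(messages):
    """Stub LLM: asks to read the file on the first turn, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "name": "fs_read",
                "args": {"path": "src/app.js"}}
    return {"type": "text", "content": "Done. The problem was on line 42."}

TOOLS = {"fs_read": lambda path: f"<contents of {path}>"}

def agent_loop(user_request, max_turns=10):
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_turns):
        reply = call_llm(messages)
        if reply["type"] == "text":   # final answer: exit the loop
            return reply["content"]
        # Tool call: execute it (permission check omitted here) and
        # feed the result back into the conversation automatically.
        result = TOOLS[reply["name"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent loop did not converge")

print(agent_loop("Fix the bug in src/app.js"))
```

Note that the LLM itself never touches the filesystem: it only emits a structured request, and the loop decides whether and how to execute it.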
2. The Agent Loop: An Elegant State Machine
Amazon Q CLI implements an explicit finite state machine in Rust to manage the entire agent loop.
The FSM cycles through five core states: Idle → ExecutingRequest → ExecutingHooks → WaitingForApproval → ExecutingTools → (loop back)
Several aspects of this state machine are particularly well-designed:
- The loop is automatic — after the LLM calls a tool, the result is injected back into the conversation and the LLM is called again automatically.
- Tools can execute in parallel — the LLM can return multiple `tool_use` blocks in a single response, and the agent uses Tokio's `FuturesUnordered` to execute them concurrently.
- Every tool call passes through a permission check — no exceptions.
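An explicit state machine like this can be expressed as an enum plus a transition table. The sketch below is a Python analogue of the pattern (the real Amazon Q CLI implementation is in Rust); the state names mirror the article, and the transition edges are illustrative assumptions.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    EXECUTING_REQUEST = auto()
    EXECUTING_HOOKS = auto()
    WAITING_FOR_APPROVAL = auto()
    EXECUTING_TOOLS = auto()

# Allowed transitions (illustrative): each state maps to the states
# it may legally move to. ExecutingTools loops back into the request.
TRANSITIONS = {
    State.IDLE: {State.EXECUTING_REQUEST},
    State.EXECUTING_REQUEST: {State.EXECUTING_HOOKS, State.IDLE},
    State.EXECUTING_HOOKS: {State.WAITING_FOR_APPROVAL},
    State.WAITING_FOR_APPROVAL: {State.EXECUTING_TOOLS, State.IDLE},
    State.EXECUTING_TOOLS: {State.EXECUTING_REQUEST},  # loop back
}

def step(current: State, target: State) -> State:
    """Move to `target` only if the transition table allows it."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

The payoff of making transitions explicit is that an impossible jump (say, executing tools before approval) fails loudly instead of silently corrupting the loop.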
3. The Tool System: The AI’s Hands and Feet
The capability ceiling of an AI coding agent is entirely determined by what tools it has access to.
The Art of Tool Descriptions
A tool’s description is not documentation written for humans — it is a behavioral instruction written for the LLM. Its quality directly determines how well the agent performs.
`tool_use_purpose`: Forcing the AI to Think Before It Acts
Amazon Q CLI has an elegant design detail: every tool call forces the LLM to fill in a purpose field. This seemingly small constraint has a significant effect — it compels the model to articulate why it is invoking a tool before it does so, reducing hallucinated or unnecessary tool calls.
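Concretely, this pattern amounts to making the purpose field a required property in the tool's input schema. The schema below is a hypothetical example in the JSON Schema style used by LLM tool-calling APIs, not the actual Amazon Q CLI definition:

```python
# Hypothetical tool definition: the description is written as a
# behavioral instruction for the LLM, and `purpose` is mandatory.
FS_READ_TOOL = {
    "name": "fs_read",
    "description": (
        "Read a file from the workspace. Use this before editing any "
        "file. Read only the files relevant to the current task."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "File path to read"},
            "purpose": {
                "type": "string",
                "description": "One sentence explaining WHY this read is needed",
            },
        },
        "required": ["path", "purpose"],  # the model must justify the call
    },
}

def validate_call(tool, args):
    """Reject tool calls that omit any required field, e.g. `purpose`."""
    missing = [f for f in tool["input_schema"]["required"] if f not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return True
```

Because the model must emit the justification before the call executes, a call it cannot justify tends not to get made at all.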
4. The Security Model: Four Layers of Defense in Depth
AI coding agents have direct access to your filesystem, terminal, and network. Security is not optional — it is architectural.
Amazon Q CLI implements a four-layer security architecture: Hook → User Confirmation → Path Permissions → Tool Allowlist
All file paths go through `canonicalize` normalization first, which prevents `../`-style path traversal attacks from bypassing permission boundaries.
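The canonicalize-then-check pattern is the key: resolve symlinks and `..` segments first, and only then compare against the allowed root. A minimal Python sketch of the idea (the assumed workspace root `/workspace` is illustrative):

```python
from pathlib import Path

def is_path_allowed(requested: str, allowed_root: str) -> bool:
    """Canonicalize the requested path, then check containment."""
    root = Path(allowed_root).resolve()
    target = Path(allowed_root, requested).resolve()
    # After resolution, "../../etc/passwd" no longer starts with the
    # allowed root, so the containment check rejects it.
    return target == root or root in target.parents

print(is_path_allowed("src/app.js", "/workspace"))        # True
print(is_path_allowed("../../etc/passwd", "/workspace"))  # False
```

Checking the raw string instead (e.g. `requested.startswith(allowed_root)`) is exactly the mistake canonicalization exists to prevent.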
Claude Code takes a different approach, using a Hook system to implement declarative security policies that automatically detect nine categories of common security risks.
5. Context Window Management: The Scarcest Resource
In traditional software, the bottlenecks are CPU, memory, and I/O. In AI coding, the bottleneck is the context window.
Every token matters. The system prompt, conversation history, tool descriptions, file contents, and tool results all compete for space in a fixed-size window. Blow past the limit and the model either loses critical context or the request fails entirely.
Amazon Q CLI employs four strategies to manage this:
- Auto-compaction — Compresses 200K tokens of conversation into a ~2K token summary when the window fills up.
- Message truncation — When reading large files, only the first 10,000 characters are retained.
- History pruning — Keeps the most recent messages while maintaining structural integrity (ensuring tool call/result pairs stay matched).
- Resource file limits — Automatically included resource files (like `.qdeveloper` configs) are capped at 10KB.
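Two of these strategies can be sketched directly. The 10,000-character cap comes from the article; the message shapes and the pruning logic are illustrative assumptions about how tool call/result pairing might be preserved:

```python
MAX_RESULT_CHARS = 10_000  # cap from the article's truncation strategy

def truncate_result(text: str) -> str:
    """Keep only the first 10,000 characters of a large tool result."""
    if len(text) <= MAX_RESULT_CHARS:
        return text
    return text[:MAX_RESULT_CHARS] + "\n[...truncated]"

def prune_history(messages, keep_last: int):
    """Keep the most recent messages, but never let the window start
    on a tool result whose matching tool call was pruned away."""
    window = messages[-keep_last:]
    while window and window[0]["role"] == "tool_result":
        window = window[1:]  # drop orphaned results at the window edge
    return window
```

The subtle part is the structural-integrity rule: a tool result without its originating call is malformed input for most LLM APIs, so pruning must respect the pairing, not just the token count.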
6. MCP Protocol: The De Facto Standard for Tool Extension
Both projects have adopted MCP (Model Context Protocol) as their tool extension protocol. MCP provides a standardized way to add new tools to an agent without modifying its core code — tools are defined as external servers that communicate over a well-defined protocol.
Amazon Q CLI manages multiple MCP servers using an Actor model, where each MCP server runs as an independent actor with its own lifecycle. This provides natural isolation: if one MCP server crashes, it does not bring down the others.
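On the wire, MCP is framed as JSON-RPC 2.0; `tools/list` and `tools/call` are method names from the MCP specification. The sketch below only builds the request payloads; the transport (typically stdio to the server process) is elided:

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requires a unique id per request

def mcp_request(method, params=None):
    """Build a JSON-RPC 2.0 request string for an MCP server."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# Discover the server's tools, then invoke one of them.
print(mcp_request("tools/list"))
print(mcp_request("tools/call",
                  {"name": "fs_read", "arguments": {"path": "src/app.js"}}))
```

Because every server speaks this same framing, the agent can treat a locally bundled tool and a third-party MCP tool identically once discovery completes.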
7. The Plugin Architecture: Claude Code’s Five-Dimensional Extension Model
Claude Code defines five orthogonal extension points, giving plugin authors fine-grained control over different aspects of the agent’s behavior.
The most interesting example is the feature-dev plugin’s seven-stage workflow — it launches multiple sub-agents in parallel to explore different code paths simultaneously. Each sub-agent operates on its own branch of exploration, and the results are synthesized back into a coherent implementation plan. This is a genuine multi-agent pattern, not just sequential tool calls.
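The fan-out/fan-in shape of that workflow is easy to see in miniature. This is a toy sketch, not the plugin's actual code: `explore` stands in for a full sub-agent run, and the synthesis step is reduced to collecting results.

```python
import asyncio

async def explore(area: str) -> str:
    """Stand-in for one sub-agent exploring a code path."""
    await asyncio.sleep(0)  # real work: a full agent loop of its own
    return f"findings for {area}"

async def plan_feature(areas):
    # Fan out: one sub-agent per exploration area, run concurrently.
    results = await asyncio.gather(*(explore(a) for a in areas))
    # Fan in: a synthesis step would merge these into one plan.
    return list(results)

findings = asyncio.run(plan_feature(["routing", "data model", "tests"]))
print(findings)
```

The concurrency is the point: three explorations cost roughly the wall-clock time of the slowest one, not the sum of all three.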
8. Architecture Comparison
| Dimension | Amazon Q CLI | Claude Code |
|---|---|---|
| Language | Rust | TypeScript + Python |
| Architecture | Monolithic agent + Actor concurrency | Core engine + plugin ecosystem |
| Concurrency | Tokio async + Actor model | Node.js event loop + child processes |
| Extension | MCP + Hook scripts | Five-dimensional plugin system |
| State | SQLite persistence | Filesystem + session state |
| Strength | Performance, type safety | Developer velocity, rich ecosystem |
Despite these surface-level differences, the core paradigm is identical: LLM Agent + Tool Use + Streaming + Safety + MCP.
This convergence is not coincidental. These are the design patterns that survive contact with real-world coding tasks. Every serious AI coding tool, regardless of implementation language, ends up solving the same fundamental problems in remarkably similar ways.
9. Seven Design Principles
After studying both codebases in depth, here are the seven principles that define how modern AI coding agents are built:
- The LLM is the brain; tools are the hands and feet. The model reasons and decides; tools execute. Never confuse the two.
- Streaming-first. Every interaction streams tokens and events in real time. Batch request-response is a non-starter for developer experience.
- Security is an architectural concern, not a feature. Permission checks, path validation, and user confirmation are baked into the agent loop itself — not bolted on after the fact.
- The context window is the scarcest resource. Every design decision — from tool result formatting to conversation history management — must be context-budget-aware.
- Tool descriptions are prompts. The text in a tool’s description field is effectively part of the system prompt. Write it with the same care.
- MCP standardizes the tool ecosystem. A shared protocol means tools are portable across agents, just as REST standardized web APIs.
- State machines drive conversation management. Explicit state transitions make the agent loop predictable, debuggable, and safe.
Final Thoughts
AI coding looks like magic. But crack it open, and the essence is surprisingly straightforward: one loop, a set of tools, and a permission system.
The loop calls the LLM, the LLM picks a tool, the tool executes, the result feeds back into the loop. Rinse and repeat until the task is done. Every piece of complexity — streaming, security, context management, plugin systems — is an elaboration on this core cycle.
Understanding these internals does not just satisfy curiosity. It gives you a framework for evaluating which tools to adopt, how to extend them, and where the real technical moats lie (hint: it is less about the loop and more about the tools and the prompts).
Reference projects:
- Amazon Q Developer CLI (Rust)
- Claude Code (TypeScript + Python)