Agentic Architecture & Orchestration
The heaviest domain at 27%. It covers everything from the mechanics of a single agent loop to coordinating fleets of specialized subagents. The exam tests whether you know the exact stopping condition for a loop, whether you understand that subagents have zero inherited context, and whether you reach for programmatic enforcement or prompt instructions when correctness is non-negotiable.
Every scenario in the exam touches Domain 1 — whether it's the Customer Support agent enforcing identity verification, the Research system coordinating subagents, or the CI/CD pipeline decomposing large reviews into focused passes.
Design and implement agentic loops for autonomous task execution
The Core Concept
An agentic loop sends a message to Claude, receives a response, checks the stop_reason field, executes any requested tools, appends the results to the conversation history, and repeats. The loop continues until Claude signals it is finished by returning stop_reason == "end_turn".
The critical insight: Claude decides when it is done, not the developer's iteration counter or text-parsing logic. The stop_reason field is the only reliable signal. Everything else is a workaround that will fail in production.
The loop terminates when stop_reason == "end_turn" and continues when stop_reason == "tool_use". Never parse text content to determine termination. Never rely solely on iteration caps. These are the two most tested facts in 1.1.
Loop Lifecycle
1. Send Request
Send the current conversation history (including all previous tool results) to Claude. The model reasons over the full history to decide its next action.
2. Inspect stop_reason
"tool_use" → Claude wants to call a tool. "end_turn" → Claude is finished. These are the only two values that matter for loop control.
3. Execute Tools
For each tool call in the response, execute the tool and collect the result. Tool calls are in response.content blocks with type == "tool_use".
4. Append Results
Append the assistant's response AND the tool results to conversation history. Both are required — omitting the assistant turn corrupts the conversation structure.
Correct Implementation
```python
def run_agent(client, tools, initial_message):
    messages = [{"role": "user", "content": initial_message}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            tools=tools,
            messages=messages,
        )

        # ✓ ONLY correct termination signal
        if response.stop_reason == "end_turn":
            break

        # ✓ Continue loop on tool_use
        if response.stop_reason == "tool_use":
            # Append assistant response to history
            messages.append({
                "role": "assistant",
                "content": response.content,
            })

            # Execute each tool call and collect results
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })

            # ✓ Append tool results for next iteration
            messages.append({"role": "user", "content": tool_results})

    return response
```
Anti-Patterns the Exam Tests
A safety cap is acceptable only as a backstop: a capped loop with the stop_reason check inside is correct. A loop that stops at 10 iterations without ever checking stop_reason is not.
Exam Traps for Task 1.1
| The Trap | Why It Fails | Correct Pattern |
|---|---|---|
| Parse "I'm done" or "Task complete" in response text to terminate | Text content is non-deterministic — Claude may phrase completion differently, or include those words mid-task | Check stop_reason == "end_turn" exclusively |
| Check if response has no tool_use blocks as a completion signal | Structure-based checks miss edge cases — a text-only reasoning turn also has no tool_use blocks, so this logic ends the loop while the task is still in progress | Rely on stop_reason, not content structure |
| Omit the assistant turn when appending tool results | The API requires alternating user/assistant turns. Jumping straight to tool results breaks the conversation structure | Append assistant response first, then append tool results as a user turn |
| Use pre-configured decision trees instead of model-driven tool calling | Removes the model's ability to reason about context — inflexible and brittle for novel inputs | Let Claude decide which tool to call based on context; use programmatic gates only for ordering constraints |
🔨 Implementation Task
Build and Stress-Test a Production Agentic Loop
Implement a loop and deliberately trigger each failure mode to confirm your termination logic is correct.
- Implement the agentic loop using stop_reason as the sole termination signal
- Add a safety cap of 25 iterations with an explicit warning log — verify it never fires on normal tasks
- Test: have Claude call 3 tools in sequence — confirm all results are appended and reasoning is continuous
- Break it intentionally: omit the assistant turn append — observe the API error and understand why
- Add text-based termination as a second branch — prove it fires incorrectly on a response that contains "done" mid-reasoning
Exam Simulation — Task 1.1
Q1 — A developer uses the absence of tool_use blocks in a response as the loop's termination condition. Why does the loop sometimes end prematurely?
Claude sometimes emits text-only responses (no tool_use blocks) as reasoning turns — turns where it thinks through the next step before calling a tool. These responses have stop_reason == "end_turn", which is the same signal as genuine task completion. Using the absence of tool_use blocks as a termination condition incorrectly fires on these reasoning turns, ending the loop prematurely while the task is still in progress. A is wrong: tool description quality doesn't cause early termination. C is wrong: an iteration cap is a band-aid and not the right mental model for loop control. D is wrong: missing tool results cause repetition, not early termination.

Q2 — A developer terminates the loop with if "research complete" in response.content[0].text: break. What is wrong with this approach, and what is the correct implementation?
Text content is non-deterministic; the stop_reason field is a structured signal set programmatically by the API — the only reliable termination signal. A and B fix edge cases without addressing the fundamental design flaw. D is an interesting idea but still non-deterministic — Claude may paraphrase, or the phrase may appear mid-reasoning.

Orchestrate multi-agent systems with coordinator-subagent patterns
The Core Concept
In a hub-and-spoke architecture, a coordinator agent receives the original request, decomposes it into subtasks, delegates each to a specialized subagent, collects results, and synthesizes the final response. No subagent communicates directly with another — all routing passes through the coordinator.
Hub-and-Spoke Architecture
Coordinator Role
Analyzes query complexity, decomposes into subtasks, selects which subagents to invoke, aggregates results, evaluates coverage, and re-delegates if gaps exist.
Subagent Role
Specialized for one task type. Receives a complete, self-contained prompt from the coordinator. Executes and returns structured results. No awareness of other subagents.
All Routing Through Coordinator
Prevents spaghetti communication patterns. Enables consistent error handling, logging, and retry logic in one place rather than scattered across agents.
Iterative Refinement
Coordinator evaluates synthesis output for coverage gaps, re-delegates with targeted queries, and re-invokes synthesis — repeating until coverage is sufficient.
Coordinator Design Principles
The most common coordinator failure is overly narrow task decomposition — breaking "impact of AI on creative industries" into only visual arts subtasks, because that's what the coordinator knows best. The result: every subagent completes successfully, but the final output has systematic blind spots.
- Design coordinator prompts specifying research goals and quality criteria — not step-by-step procedural instructions, to preserve subagent adaptability
- Partition scope across subagents to minimize duplication (distinct subtopics or source types per agent)
- Implement iterative refinement: evaluate synthesis output → identify gaps → re-delegate with targeted queries → re-synthesize
- Route all inter-subagent information through the coordinator — never allow direct subagent-to-subagent communication
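The principles above can be condensed into a minimal refinement loop. This is an illustrative sketch only — `run_subagent`, `evaluate_coverage`, and `synthesize` are hypothetical stand-ins for real Task-tool delegation and model calls, not SDK APIs.

```python
def run_subagent(subtopic):
    # Stand-in: a real implementation would spawn a Task-tool subagent
    # with a complete, self-contained prompt for this subtopic.
    return {"subtopic": subtopic, "findings": f"findings on {subtopic}"}

def evaluate_coverage(results, required_topics):
    """Coordinator-side gap check: which required topics have no findings?"""
    covered = {r["subtopic"] for r in results}
    return [t for t in required_topics if t not in covered]

def synthesize(results):
    # Stand-in for a synthesis-agent call
    return " | ".join(r["findings"] for r in results)

def coordinate(required_topics, initial_decomposition, max_rounds=3):
    # Initial decomposition may be too narrow — the refinement loop catches that
    results = [run_subagent(t) for t in initial_decomposition]
    for _ in range(max_rounds):
        gaps = evaluate_coverage(results, required_topics)
        if not gaps:
            break
        # Re-delegate targeted queries for the gaps only — not a full re-run
        results += [run_subagent(t) for t in gaps]
    return synthesize(results)
```

Run with a deliberately narrow initial decomposition — `coordinate(["visual arts", "music"], ["visual arts"])` — and the loop re-delegates for the missing "music" subtopic before synthesizing.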
Exam Traps for Task 1.2
| The Trap | Why It Fails | Correct Pattern |
|---|---|---|
| Subagents automatically inherit coordinator context | They do not. Each subagent invocation is a fresh context. Assuming inheritance leads to silent failures where subagents lack required information | Explicitly pass all needed context in the subagent's prompt |
| Blame synthesis agent when final output has coverage gaps | If each subagent completed successfully, the gap is in what they were assigned — coordinator decomposition is the root cause | Inspect coordinator logs first. Narrow decomposition is the most common cause of systematic gaps |
| Allow subagents to call each other directly for efficiency | Bypasses coordinator's observability and error handling — creates spaghetti flows that are impossible to debug | All communication routes through coordinator; coordinator handles retries and routing decisions |
🔨 Implementation Task
Build a 3-Agent Research Coordinator
Implement a coordinator + web search agent + synthesis agent. Deliberately create and then fix a decomposition failure.
- Implement the coordinator with hub-and-spoke routing — all subagent communication through coordinator only
- Run on "impact of remote work on urban planning" — log the decomposition. Identify if any major category is missing
- Implement the iterative refinement loop: coordinator evaluates synthesis output and re-delegates if coverage is below threshold
- Test context isolation: verify the synthesis agent has no access to the web search agent's raw conversation — only the coordinator-passed results
- Deliberately break decomposition by giving coordinator a narrow system prompt — observe the coverage gap and fix it
Exam Simulation — Task 1.2
Q — A coordinator manages WebSearchAgent, DocumentAnalysisAgent, and DataValidationAgent. For each request, WebSearch and DocumentAnalysis can run simultaneously (independent inputs), while DataValidation must run after both complete. Currently all three run sequentially: total latency = 45s + 60s + 30s = 135s. What architecture achieves the minimum possible latency?
Spawn WebSearch and DocumentAnalysis in parallel (two Task calls in one coordinator turn), then run DataValidation after both return: max(45s, 60s) + 30s = 90s.

Configure subagent invocation, context passing, and spawning
The Core Concept
Subagents are not automatically created — they are spawned using the Task tool. The coordinator must have "Task" in its allowedTools list, and each subagent is defined via AgentDefinition with its own system prompt, description, and tool restrictions.
The Task Tool
Requirement: allowedTools
The coordinator's allowedTools must include "Task". Without this, the coordinator cannot spawn subagents regardless of prompt instructions.
AgentDefinition
Defines each subagent type with: description, system prompt, and tool restrictions. The system prompt scopes the subagent's behavior. Tool restrictions enforce role separation.
Complete Context in Prompt
Every piece of information the subagent needs must be in the Task call's prompt. Source URLs, document names, prior agent outputs — all must be explicitly included.
Fork-Based Session Management
Fork sessions create independent branches from a shared analysis baseline — enabling divergent explorations without contaminating the main session context.
Context Passing Pattern
When passing context between agents, use structured data formats that separate content from metadata. Raw text blobs lose attribution — source URLs, document names, and page numbers disappear during synthesis.
```python
import json

# ✓ Pass structured context explicitly — not raw text
synthesis_prompt = f"""
You are a synthesis agent. Combine the following research findings
into a comprehensive report on: {research_topic}

Web Search Results:
{json.dumps(web_results, indent=2)}

Document Analysis:
{json.dumps(doc_analysis, indent=2)}

Each result includes: source_url, excerpt, relevance_score, date.
Preserve source attribution in your synthesis.

Quality criteria: Cover all major categories. Flag any gaps.
"""

# ✗ DO NOT pass context like this:
synthesis_prompt = f"Synthesize this: {str(all_results)}"
# ↑ Loses structure, attribution, and metadata
```
Parallel Spawning
To run subagents in parallel, emit multiple Task tool calls in a single coordinator response. Spawning them across separate turns forces sequential execution and negates the latency benefit.
- Include complete findings from prior agents directly in the subagent's prompt — never assume it can access them from history
- Use structured data formats (JSON with source URLs, document names, page numbers) to preserve attribution when passing context
- Spawn parallel subagents by emitting multiple Task calls in a single coordinator response — not across separate turns
- Design coordinator prompts with goals and quality criteria — not step-by-step instructions — to preserve subagent adaptability
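The parallel-spawning rule can be made concrete with a sketch of what a single coordinator turn looks like. The block structure mirrors the Messages API tool_use/tool_result shape; `execute_task`, the agent names, and the thread-pool dispatch are illustrative assumptions, not SDK behavior.

```python
from concurrent.futures import ThreadPoolExecutor

# One assistant turn containing MULTIPLE Task tool_use blocks → parallel spawn.
# Spreading these across separate turns would force sequential execution.
assistant_content = [
    {"type": "tool_use", "id": "t1", "name": "Task",
     "input": {"agent": "WebSearchAgent", "prompt": "Search: remote work + zoning"}},
    {"type": "tool_use", "id": "t2", "name": "Task",
     "input": {"agent": "DocumentAnalysisAgent", "prompt": "Analyze: city planning PDFs"}},
]

def execute_task(block):
    # Stand-in for actually running the subagent described by this Task call
    return {"type": "tool_result", "tool_use_id": block["id"],
            "content": f"done: {block['input']['agent']}"}

# All tool_use blocks arrive together, so they can be dispatched concurrently;
# results go back as one user turn, with tool_use_id preserving the pairing.
with ThreadPoolExecutor() as pool:
    tool_results = list(pool.map(
        execute_task,
        [b for b in assistant_content if b["type"] == "tool_use"],
    ))
```

Total latency here is bounded by the slowest Task, not the sum — the payoff measured in the implementation task below.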
Exam Traps for Task 1.3
| The Trap | Why It Fails | Correct Pattern |
|---|---|---|
| Spawn subagents across multiple turns for parallelism | Each turn is sequential — the coordinator waits for each Task result before proceeding to the next turn | Emit all parallel Task calls in a single coordinator response |
| Pass raw string concatenation of results between agents | Loses structure and attribution — source URLs, dates, and metadata disappear; synthesis agent cannot distinguish findings | Pass structured JSON with explicit fields for content, source, date, relevance |
| Give coordinator step-by-step procedural instructions | Overly procedural prompts make subagents rigid — they can't adapt when intermediate results reveal new requirements | Specify research goals and quality criteria; let subagents determine their approach |
🔨 Implementation Task
Implement Parallel Subagent Spawning with Structured Context
Build parallel spawning and structured context passing, then measure the latency difference vs sequential.
- Configure coordinator with allowedTools: ["Task", ...] and define 3 subagent types via AgentDefinition
- Implement sequential spawning — measure total latency for 3 subagents
- Implement parallel spawning (multiple Task calls in one response) — measure total latency and confirm it is close to the slowest subagent's latency, not the sum of all three
- Pass context using a structured JSON schema including: content, source_url, date, relevance_score
- Verify isolation: confirm synthesis agent has zero access to web search agent's raw conversation history
Exam Simulation — Task 1.3
Implement multi-step workflows with enforcement and handoff patterns
The Core Concept
Prompt-based guidance tells Claude what it should do. Programmatic enforcement tells the system what it can do. For critical business logic — identity verification before financial operations, compliance checks before data access — only programmatic gates provide the deterministic guarantees that production requires.
Programmatic Prerequisite Gates
A prerequisite gate intercepts a tool call and checks whether required prior steps have been completed. If not, it blocks the call and returns a structured error explaining what must happen first.
```python
def execute_tool(tool_name, tool_input, session_state):
    # ✓ Programmatic gate — runs before every tool execution
    if tool_name in ["process_refund", "lookup_order"]:
        if not session_state.get("verified_customer_id"):
            return {
                "error": "Prerequisite not met",
                "required": "get_customer must be called first",
                "reason": "Identity verification required before order operations",
            }

    if tool_name == "get_customer":
        result = get_customer_impl(tool_input)
        # ✓ Gate is cleared once prerequisite completes
        session_state["verified_customer_id"] = result["customer_id"]
        return result

# ✗ Prompt-only approach (12% failure rate):
# "Always call get_customer before any order operations"
```
Structured Handoff Protocols
When an agent must escalate to a human, the handoff package must be complete — the human agent receiving it has no access to the conversation history. Every decision, finding, and recommendation must be compiled into a self-contained summary.
- Decompose multi-concern requests into distinct items, investigate each in parallel using shared context, then synthesize a unified resolution before handoff
- Compile structured handoff summaries with: customer ID, issue root cause, refund amount or action taken, recommended next action
- Include what was attempted and what was not — the human agent needs to know where to pick up
- Never escalate with "I couldn't help" — always include the investigation results that led to escalation
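The handoff bullets above can be enforced in code: refuse to escalate until the package is complete. A minimal sketch — the field names and example values are illustrative, not a prescribed schema.

```python
# Fields the receiving human needs, since they have NO conversation history.
REQUIRED_FIELDS = ["customer_id", "root_cause", "action_taken",
                   "attempted", "not_attempted", "recommended_next_action"]

def build_handoff(**fields):
    """Validate the handoff package is self-contained before escalation."""
    missing = [f for f in REQUIRED_FIELDS if f not in fields]
    if missing:
        # Block the escalation until the summary is complete
        raise ValueError(f"Incomplete handoff, missing: {missing}")
    return fields

package = build_handoff(
    customer_id="C-1042",
    root_cause="Duplicate charge on order #884",
    action_taken="Refund of $620 requires manager approval (> $500)",
    attempted=["get_customer", "lookup_order", "calculate_refund"],
    not_attempted=["process_refund"],
    recommended_next_action="Approve refund and confirm with customer",
)
```

A bare "I need human assistance" escalation would fail this check by construction.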
Exam Traps for Task 1.4
| The Trap | Why It Fails | Correct Pattern |
|---|---|---|
| Enhance system prompt to make verification "mandatory" | Prompt instructions have a non-zero failure rate for compliance. "Mandatory" in a prompt is advisory, not enforced | Programmatic gate that physically blocks the tool call until prerequisite state is set |
| Add few-shot examples showing correct tool order | Few-shot examples improve probability but don't provide deterministic guarantees for financial operations | Gates for financial/identity operations; few-shot for classification and routing where probabilistic is acceptable |
| Escalate with just "I need human assistance" | Human agent has no context — they must start over. Wastes the investigation work already done | Structured handoff including customer ID, root cause, what was found, and recommended action |
🔨 Implementation Task
Build a Prerequisite Gate for a Financial Workflow
Implement the get_customer → lookup_order → process_refund gate chain and prove it enforces order deterministically.
- Implement session state tracking with verified_customer_id flag
- Build the prerequisite gate: block lookup_order and process_refund until get_customer sets the flag
- Test bypass attempt: craft a prompt where the user volunteers their order ID — confirm the gate still fires
- Implement structured handoff: when a refund exceeds $500, compile a complete handoff package and escalate
- Compare with prompt-only: remove the gate, add a system prompt instruction — run 10 tests and count how many bypass verification
Exam Simulation — Task 1.4
Q1 — In production, the agent skips get_customer entirely and calls lookup_order using only the customer's stated name, occasionally leading to misidentified accounts and incorrect refunds. What change would most effectively address this reliability issue?
A programmatic prerequisite gate: block lookup_order until get_customer has run and set the verified customer ID in session state. Prompt-level instructions cannot guarantee the ordering.

Q2 — A 5-step workflow runs identify_customer → lookup_order → assess_eligibility → calculate_compensation → process_refund. Post-deployment analysis shows that in 8% of cases, process_refund is called with a null customer_id because a lookup_order failure was silently absorbed 3 steps earlier. What is the most effective architectural change to prevent null propagation through the workflow?
Fail fast with step-level output validation, halting the workflow the moment lookup_order returns an invalid state. A is wrong: validation at process_refund only catches the null 3 steps later, after the workflow has done unnecessary work and potentially taken other actions. B is wrong: the 8% failure rate proves prompt instructions aren't sufficient — programmatic enforcement is required. C is wrong: retry addresses transient failures, not the null-propagation architecture problem. If lookup_order returns null after retries, the same null still propagates downstream.

Q3 — The order total is fetched by lookup_order at step 2, and the refund amount is determined by calculate_compensation at step 4. Currently, process_refund occasionally executes refunds exceeding the order total due to compensation calculation errors. What is the most reliable enforcement mechanism?
A programmatic gate on process_refund that compares the requested amount against the order total and blocks any excess before money moves. Prompt fixes to calculate_compensation are probabilistic — the same conditions causing occasional overages will cause them again. C is wrong: post-call audits are a recovery mechanism, not a prevention mechanism — money has already moved before the audit runs. D is wrong: logging the order total improves auditability but adds no enforcement — the model can still pass an incorrect amount to process_refund regardless of what's in session state.

Apply Agent SDK hooks for tool call interception and data normalization
The Core Concept
Hooks intercept the tool call lifecycle at two points: before a tool is called (tool call interception) and after it returns (PostToolUse). This gives you a centralized place to enforce compliance rules and normalize data formats before Claude reasons about tool results.
Hook Patterns
PostToolUse Hook
Intercepts tool results before the model sees them. Use for: normalizing heterogeneous data formats (Unix timestamps → ISO 8601, numeric status codes → human-readable strings) from different MCP tools.
Tool Call Interception Hook
Intercepts outgoing tool calls before execution. Use for: blocking policy-violating actions (refunds exceeding $500), redirecting to alternative workflows, logging compliance events.
Deterministic Guarantee
Hook logic runs in application code — not through the LLM. This means 100% enforcement. A hook that blocks a call will never fail due to model reasoning.
Data Normalization
Multiple MCP tools return different timestamp formats, status codes, and field names. PostToolUse hooks normalize everything to a consistent schema before Claude processes it.
Hook Implementations
```python
from datetime import datetime

def post_tool_use_hook(tool_name, tool_result):
    """Normalize heterogeneous data before model processes it."""
    if "created_at" in tool_result:
        ts = tool_result["created_at"]
        # Unix timestamp → ISO 8601
        if isinstance(ts, (int, float)):
            tool_result["created_at"] = datetime.utcfromtimestamp(ts).isoformat()

    # Numeric status codes → human-readable
    status_map = {1: "active", 2: "pending", 3: "cancelled"}
    if "status" in tool_result and isinstance(tool_result["status"], int):
        tool_result["status"] = status_map.get(tool_result["status"], "unknown")

    return tool_result
```
```python
def pre_call_hook(tool_name, tool_input, session_state):
    """Block policy-violating actions before execution."""
    if tool_name == "process_refund":
        amount = tool_input.get("amount", 0)
        if amount > 500:
            # Block and redirect to escalation workflow
            return {
                "blocked": True,
                "reason": "Refund exceeds $500 policy maximum",
                "action": "escalate_to_manager",
                "amount_requested": amount,
            }
    return None  # None = allow the call to proceed
```
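Hooks like these are wired around tool execution by a dispatcher. The sketch below is illustrative only — `execute_with_hooks` and the stand-in lambdas are hypothetical glue code, not the Agent SDK's actual hook-registration API. The hooks are passed as parameters so the dispatcher itself stays generic.

```python
def execute_with_hooks(tool_name, tool_input, session_state,
                       run_tool, pre_hook, post_hook):
    # Pre-call hook: deterministic policy enforcement before anything runs
    blocked = pre_hook(tool_name, tool_input, session_state)
    if blocked is not None:
        return blocked  # structured refusal goes back as the tool result

    raw_result = run_tool(tool_name, tool_input)

    # PostToolUse hook: normalize before the model ever sees the result
    return post_hook(tool_name, raw_result)

# Demo with stand-in hooks and tool: a $750 refund is blocked
# before run_tool executes — the tool body never runs.
result = execute_with_hooks(
    "process_refund", {"amount": 750}, {},
    run_tool=lambda name, inp: {"ok": True},
    pre_hook=lambda n, i, s: {"blocked": True} if i.get("amount", 0) > 500 else None,
    post_hook=lambda n, r: r,
)
```

The enforcement is deterministic because `run_tool` is only reachable when the pre-hook returns None.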
Exam Traps for Task 1.5
| The Trap | Why It Fails | Correct Pattern |
|---|---|---|
| Use system prompt to enforce the $500 refund policy | Prompts are probabilistic — a sufficiently unusual input or edge case will bypass the instruction | Pre-call hook that reads the amount and blocks before execution, 100% of the time |
| Normalize data formats inside each tool implementation | Scatters normalization logic across tools — inconsistent, hard to audit, breaks when new tools are added | Centralized PostToolUse hook normalizes all tool outputs to a consistent schema |
| Use hooks for all compliance, even style/formatting guidelines | Overkill — hooks are for deterministic requirements; prompts handle stylistic preferences effectively | Hooks for financial, identity, and regulatory compliance. Prompts for formatting, tone, and style. |
🔨 Implementation Task
Build a Hook Layer for a Customer Support Agent
Implement both a PostToolUse normalization hook and a pre-call compliance hook.
- Build PostToolUse hook that normalizes: Unix timestamps → ISO 8601, numeric status codes → strings, currency in cents → formatted dollars
- Build pre-call hook blocking process_refund when amount > $500 and redirecting to escalate_to_manager
- Test: send a refund request for $750 — confirm the hook blocks it before the tool executes, not after
- Test normalization: mock a tool that returns Unix timestamp and numeric status — confirm Claude receives ISO and string versions
- Compare: remove the hook and add a system prompt for the $500 rule — run 20 tests with varying inputs and count bypasses
Exam Simulation — Task 1.5
Design task decomposition strategies for complex workflows
The Core Concept
Task decomposition is the act of breaking a complex goal into a sequence of smaller, executable steps. The right pattern depends on how much is known about the task upfront: prompt chaining for predictable workflows where the steps are known in advance, dynamic decomposition for open-ended investigations where each step reveals what to do next.
Decomposition Patterns
Prompt Chaining
Fixed sequential pipeline. Each step's output is the next step's input. Use when: the workflow has predictable stages (analyze file → summarize → compare → report). Steps known in advance.
Dynamic Adaptive Decomposition
Generates subtasks based on what's discovered. Use when: the task is open-ended and intermediate findings change what to explore next (e.g., "add comprehensive tests to a legacy codebase").
Per-File + Integration Pass
Split large multi-file reviews: analyze each file individually for local issues, then run a separate cross-file integration pass. Avoids attention dilution in single-pass reviews of 10+ files.
Map-First, Then Plan
For open-ended tasks: first map the full scope (all files, all dependencies), identify high-impact areas, then generate a prioritized plan that can adapt as dependencies are discovered.
Exam Traps for Task 1.6
| The Trap | Why It Fails | Correct Pattern |
|---|---|---|
| Use dynamic decomposition for a predictable multi-step review | Dynamic decomposition adds overhead and unpredictability when the steps are already known and fixed | Prompt chaining for predictable workflows; dynamic only for open-ended investigation tasks |
| Switch to a larger context model to review 14 files in one pass | Context window size doesn't solve attention dilution — models still process middle content less reliably | Split into per-file passes + integration pass; attention is consistent within each focused pass |
| For "add tests to a legacy codebase," start implementing immediately | Without mapping the codebase first, tests will duplicate existing coverage and miss high-impact areas | Map structure → identify high-impact areas → create prioritized plan → implement adaptively |
🔨 Implementation Task
Implement Both Decomposition Patterns and Compare
Build both patterns on the same problem set and demonstrate when each is appropriate.
- Implement a prompt chain for a 5-step code review: parse → analyze → summarize per file → cross-file compare → report
- Implement dynamic decomposition for "identify all test gaps in this codebase" — observe how the plan adapts to discoveries
- Run a single-pass review on 8 files. Document the inconsistencies in depth and any contradictions
- Re-run with per-file passes + integration pass. Compare output quality and consistency
- Classify 5 new task descriptions as "prompt chain" or "dynamic" and justify each classification
Exam Simulation — Task 1.6
Manage session state, resumption, and forking
The Core Concept
Sessions preserve conversation history and tool results across work sessions. But session resumption is not always the right choice — when the files being analyzed have changed since the last session, the cached tool results are stale and the model will reason incorrectly from them.
Session Resumption
--resume <session-name>
Continues a specific prior named conversation. The full history — including all tool calls and results — is restored. Use when: investigation is paused mid-task and no analyzed files have changed.
Summary Injection
Start a new session but open with a structured summary of prior findings. Use when: files have been modified since the last session — prior tool results no longer reflect reality.
Targeted Re-Analysis
When resuming after file changes, inform the agent specifically which files changed — don't require full re-exploration of unchanged areas. Focus re-analysis on what's different.
Stale Tool Results
A resumed session where files have changed since the last run contains tool results that contradict the current state of the code. The model will reason incorrectly from stale data.
Session Forking
fork_session creates an independent branch from the current session's state. Both branches share the same history up to the fork point, then diverge independently. Neither branch's changes affect the other.
- Use --resume <session-name> for named session continuation when prior context is mostly valid
- Use fork_session to compare two approaches (e.g., testing strategies, refactoring patterns) from a shared analysis baseline
- Choose summary injection over resumption when prior tool results are stale — inject a structured summary of findings as the first message
- When resuming after changes, explicitly tell the agent which specific files changed — enable targeted re-analysis rather than full re-exploration
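Summary injection is just prompt construction, so it is easy to sketch. The field names (`file`, `finding`) and the message wording below are illustrative assumptions, not a prescribed format.

```python
def build_resume_message(findings, changed_files):
    """First message of a FRESH session, replacing a stale --resume."""
    summary = "\n".join(f"- {f['file']}: {f['finding']}" for f in findings)
    changed = ", ".join(changed_files)
    return (
        "Prior session findings (verified before these changes):\n"
        f"{summary}\n\n"
        f"Files modified since that analysis: {changed}.\n"
        "Re-analyze ONLY the modified files; treat the other findings as current."
    )

first_message = build_resume_message(
    findings=[{"file": "auth.py", "finding": "no rate limiting on login"},
              {"file": "db.py", "finding": "connection pool unbounded"}],
    changed_files=["auth.py"],
)
```

The fresh session carries forward validated conclusions without carrying forward stale tool results, and the explicit changed-file list enables targeted re-analysis.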
Exam Traps for Task 1.7
| The Trap | Why It Fails | Correct Pattern |
|---|---|---|
| Resume a session after the codebase has been refactored | Tool results from the previous session reflect the old codebase — model reasons incorrectly from stale data | Start fresh with a summary of prior findings; specify which files changed for targeted re-analysis |
| Fork a session to run two strategies, then merge results back | Fork branches are independent — there's no merge operation. Results from forked branches must be collected by the original coordinator session | Fork for independent exploration; have the coordinator collect and compare results from both branches |
| Re-explore the entire codebase after a targeted file change | Wastes time and context on files that haven't changed; prior analysis of unchanged files is still valid | Inform the resumed session specifically which files changed — only re-analyze those |
🔨 Implementation Task
Implement Session Resumption and Forking with Stale Detection
Build session management that correctly handles stale results and enables divergent exploration.
- Implement a named session workflow: analyze a codebase, pause, resume with --resume and verify context is intact
- Simulate a stale session: modify two files after a session, attempt resumption — observe where the model reasons incorrectly from old data
- Fix it: implement fresh start with structured summary injection specifying which files changed
- Implement fork_session: from a shared analysis baseline, explore "add unit tests" vs "add integration tests" in parallel branches
- Compare branch results in the original coordinator session and synthesize the better approach
Exam Simulation — Task 1.7
The correct choice is fork_session — exactly the tool for this scenario: both branches inherit the full analysis context, and neither branch's exploration contaminates the other. A is wrong: re-running the entire codebase analysis twice wastes effort — fork lets both branches share the work already done. C is wrong: sequential exploration in a single session means strategy A's findings are in context during strategy B's exploration — they contaminate each other. D is wrong: --resume continues a session; it does not copy or fork it.