Tool Design & MCP Integration — Complete Lesson
Domain 2 · 18% of exam · 5 Task Statements
How you define tools is how the agent thinks. This domain covers the full stack of tool design — from writing descriptions that produce reliable routing, to implementing structured errors that enable intelligent recovery, to configuring MCP servers and knowing when to reach for built-in tools. Every task statement here directly maps to scenarios the exam will test.

Task Statement 2.1

Design effective tool interfaces with clear descriptions and boundaries

Tool descriptions are the decision boundary that determines whether Claude picks the right tool. Minimal descriptions don't just underperform — they actively cause routing failures in production.

The Core Concept

When Claude encounters multiple tools, it has no access to their source code or runtime behavior — only the description text you provide. That description is re-evaluated on every single tool call, making it the most leveraged piece of configuration in your agent design.

The problem compounds with overlapping capabilities. When two tools could plausibly answer the same request, the model falls back to heuristics — and those heuristics are not predictable across invocations.

The Exam Principle: Tool descriptions are the primary mechanism LLMs use for tool selection. Minimal descriptions lead to unreliable selection among similar tools. This is the foundational rule for every tool design question in Domain 2.

Knowledge the Exam Tests

🎯

Descriptions as Selection Mechanism

Minimal descriptions → unreliable selection. The model has nothing else to differentiate between similar tools.

📋

Required Description Components

Input formats, example queries, edge cases, boundary explanations, and explicit when-to-use-vs-alternatives.

⚠️

Ambiguity Causes Misrouting

analyze_content vs analyze_document with near-identical descriptions → unpredictable routing failures.

💬

System Prompt Interference

Keyword-sensitive instructions in the system prompt create unintended tool associations that override well-written descriptions.

Skills the Exam Tests
  • Writing descriptions that differentiate purpose, expected inputs, outputs, and when-to-use vs alternatives
  • Renaming tools to eliminate functional overlap (e.g., analyze_content → extract_web_results)
  • Splitting generic tools into purpose-specific tools with defined input/output contracts (analyze_document → extract_data_points + summarize_content + verify_claim_against_source)
  • Reviewing system prompts for keyword-sensitive instructions that might override well-written descriptions

Anatomy of a Production Tool Description

A complete tool description has five required components. Missing any one of them degrades routing reliability.

python — Complete Tool Definition PRODUCTION GRADE
{
  "name": "search_customer_orders",
  "description": """
  Search for a customer's order history by customer ID or email.

  Use this tool when:
  - User asks about their orders, deliveries, or purchases
  - You need order IDs before calling process_refund
  - User references a specific order number

  Do NOT use this tool for:
  - Checking inventory (use check_inventory instead)
  - Looking up product descriptions (use get_product_details)

  Input formats accepted:
  - customer_id: "cust_12345" or integer 12345
  - email: full address, case-insensitive

  Returns: List of orders with order_id, status, total, items[], created_at

  Edge cases:
  - Returns empty list if customer has no orders (not an error)
  - If both customer_id and email provided, customer_id takes precedence
  """,
  "input_schema": {
    "type": "object",
    "properties": {
      "customer_id": { "type": "string", "description": "e.g. 'cust_12345'" },
      "email":       { "type": "string", "description": "Customer email address" }
    }
  }
}
💡
The "Do NOT use this tool for" section is one of the highest-leverage additions. It explicitly routes the model away from ambiguous cases without requiring inference.
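The five required components can be checked mechanically. Below is a minimal, hypothetical lint helper (not part of any Anthropic SDK) that flags descriptions missing one of the sections used in the example above; the marker strings are assumptions matching that example's wording.

```python
# Hypothetical description linter. The marker strings below are
# assumptions matching the section headers in the example description.
REQUIRED_MARKERS = {
    "use-when":      "Use this tool when",
    "do-not-use":    "Do NOT use this tool",
    "input-formats": "Input formats",
    "returns":       "Returns:",
    "edge-cases":    "Edge cases",
}

def missing_components(description: str) -> list[str]:
    """Return the names of required sections absent from a description."""
    return [name for name, marker in REQUIRED_MARKERS.items()
            if marker not in description]
```

Running it against the weak description "Analyzes content and returns results" reports all five components missing, while the production-grade description above passes cleanly.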

Weak vs Strong Descriptions

The exam frequently presents minimal descriptions and asks what's wrong, or asks which description causes misrouting. Internalize this contrast.

✗ WEAK — Causes Routing Failures
"name": "analyze_content"
"description": "Analyzes content and returns results"

✓ STRONG — Deterministic Routing
"name": "extract_web_results"
"description": "Parses structured data from web search result JSON. Use ONLY for raw search API output. NOT for PDFs, uploads, or documents — use analyze_document for those."

The Split Pattern

Generic tools that accept a mode parameter are a routing antipattern. The model must guess the correct mode from within a single tool call — the same disambiguation problem that descriptions are supposed to solve, now embedded deeper inside.

✗ BEFORE — Generic with Mode
analyze_document(doc_id, mode="extract"|"summarize"|"verify")

30% of extraction requests trigger summarization mode.

✓ AFTER — Purpose-Specific Tools
extract_data_points(doc_id)
summarize_content(doc_id)
verify_claim_against_source(claim, doc_id)

Each tool = one purpose.
Balance with 2.3: Task 2.3 teaches that too many tools (18 instead of 4–5) degrades selection. The rule is: split by distinct purpose only. Each resulting tool should be describable in a single sentence without mentioning another tool.
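The split pattern can be sketched as three single-purpose tool definitions. This is a minimal sketch using the names from this lesson; the descriptions and schema details are illustrative assumptions, not a prescribed format.

```python
# Sketch of the split: each tool has one purpose, a one-sentence
# description, and no mode parameter for the model to guess.
SPLIT_TOOLS = [
    {
        "name": "extract_data_points",
        "description": "Extract structured data points (figures, dates, entities) from one document.",
        "input_schema": {"type": "object",
                         "properties": {"doc_id": {"type": "string"}},
                         "required": ["doc_id"]},
    },
    {
        "name": "summarize_content",
        "description": "Produce a prose summary of one document.",
        "input_schema": {"type": "object",
                         "properties": {"doc_id": {"type": "string"}},
                         "required": ["doc_id"]},
    },
    {
        "name": "verify_claim_against_source",
        "description": "Check a single claim against the text of one source document.",
        "input_schema": {"type": "object",
                         "properties": {"claim": {"type": "string"},
                                        "doc_id": {"type": "string"}},
                         "required": ["claim", "doc_id"]},
    },
]
```

Note that each description stands alone in a single sentence, satisfying the "describable without mentioning another tool" test.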

System Prompt Interference

This is the least obvious knowledge point in 2.1 and therefore high-probability on the exam. Keyword-sensitive instructions in the system prompt can fire on partial matches and override tool selection logic.

Keyword collision — system prompt + tool names PROBLEMATIC
// System prompt instruction:
"When users ask about orders, prioritize the order tool."

// Tools available:
search_orders      — "Search customer order history"
process_refund     — "Process refunds for order items"
lookup_invoice     — "Retrieve invoice for order billing"

// User: "I need my order invoice for tax purposes"
// Expected: lookup_invoice
// Actual:   search_orders (keyword "order" fires the instruction)

The fix: rewrite the system prompt to use specific, unambiguous criteria. "When users ask about order status or tracking, use search_orders." Specificity in the system prompt is as important as specificity in descriptions.

Exam Traps for Task 2.1

Trap: Add prompt instructions to clarify which tool to use
Why it fails: Prompts are evaluated alongside descriptions; keyword instructions often create new collisions
Correct pattern: Rewrite descriptions to be unambiguous. Rename if needed.

Trap: Consolidate overlapping tools into one generic tool
Why it fails: One tool doing multiple things via hidden modes moves disambiguation inside the tool — same root problem
Correct pattern: Split into purpose-specific tools, each with a single clear contract

Trap: Keep description short — "less noise for the model"
Why it fails: No — detailed descriptions reduce ambiguity. The model needs signal, not silence
Correct pattern: Include input formats, outputs, edge cases, and negative examples

Trap: Keep the vague name, improve only the description
Why it fails: Tool names carry semantic signal too. A vague name undermines even a perfect description
Correct pattern: Rename to match the scope of the description (analyze_content → extract_web_results)

🔨 Implementation Task

T1

Build a Tool Suite with Routing Disambiguation

Build a 4-tool customer support suite and achieve deterministic routing across all ambiguous user requests.

  • Create 4 tools: search_orders, process_refund, check_shipping_status, escalate_to_human
  • Write each description with: use-when, do-not-use-when, input format, output structure, edge cases
  • Write a system prompt with a keyword-sensitive instruction that causes a collision — then fix it
  • Write 3 ambiguous test prompts (e.g., "where is my stuff") and verify routing is deterministic
  • Find a tool doing two things, split it, and prove routing accuracy improves

Exam Simulation — Task 2.1

Question 1 — Task 2.1 Customer Support Agent
A customer support agent has two tools: analyze_content (description: "Analyzes content") and analyze_document (description: "Analyzes documents and content"). During testing, requests for customer order PDFs are routed to analyze_content 40% of the time. What is the most effective fix?
  • A. Add a system prompt instruction: "Always use analyze_document for PDF files"
  • B. Rename analyze_content to extract_web_results and rewrite its description to specify it handles web API responses only, explicitly noting it should not be used for uploaded files or PDFs
  • C. Merge both tools into a single analyze tool with a mode parameter
  • D. Increase the temperature of the model to reduce deterministic routing bias
Correct: B
B is correct. Both descriptions overlap — the model can't distinguish them. The fix is to rename and rewrite to draw a clear boundary at the description level. A is wrong: System prompt instructions don't reliably override description logic and often create new keyword collisions. C is wrong: A mode parameter moves the disambiguation problem inside the tool — the model still has to guess. D is wrong: Temperature controls output variability, not tool selection logic.
Question 2 — Task 2.1 Multi-Agent Research System
A research agent has a single analyze_document tool with a mode parameter accepting "extract_data", "summarize", and "verify_claim". The team finds 30% of extraction requests trigger summarization. Without changing underlying logic, what restructuring best resolves this?
  • A. Add more detailed parameter descriptions to the mode field explaining when each value should be used
  • B. Add a system prompt that maps user intent keywords to the correct mode values
  • C. Split into three separate tools — extract_data_points, summarize_content, and verify_claim_against_source — each with a single-purpose description
  • D. Use tool_choice: {"type": "tool", "name": "analyze_document"} to force the tool call and let the model infer the correct mode
Correct: C
C is correct. Splitting eliminates modes entirely — each tool has one purpose and one description that can't be confused. A is wrong: Better mode descriptions help marginally but don't solve the structural problem. B is wrong: Keyword mapping is brittle and adds new collision risks. D is wrong: Forcing the tool call solves which tool to call, not which mode — the ambiguity persists.
Question 3 — Task 2.1 Customer Support Agent
A support agent system prompt says: "When users ask about orders, prioritize the order tool." Three tools exist: search_orders, process_refund (mentions "order refunds"), and lookup_invoice (mentions "order invoices"). User asks "I need my order invoice for tax purposes" — agent routes to search_orders. What is the primary cause?
  • A. The model is ignoring the system prompt instructions
  • B. search_orders has a higher priority weight than lookup_invoice
  • C. The system prompt's keyword "order" creates an unintended association with search_orders, overriding the more specific tool match
  • D. The lookup_invoice tool description needs to be placed before search_orders in the tool list
Correct: C
C is correct. The system prompt instruction fires on the keyword "order" in the user message and routes to search_orders — the model is following the instruction correctly, but the instruction is too broad. Fix: rewrite the system prompt to use specific criteria. A is wrong: The model is obeying the prompt — that's the problem. B is wrong: Tools have no explicit priority weights. D is wrong: List ordering has marginal effect; the keyword collision is dominant.
Task Statement 2.2

Implement structured error responses for MCP tools

Generic error responses ("Operation failed") are invisible walls — the agent can't reason about them. Structured errors are the difference between a dead end and an intelligent recovery decision.

The Core Concept

When an MCP tool fails, the agent must decide: retry with same inputs? retry with modified inputs? try an alternative? escalate to the coordinator? inform the user? Every one of those decisions requires knowing what kind of failure happened and whether retrying makes sense.

A generic status string like "error": "Operation failed" forces the agent to guess — or worse, to blindly retry a non-retryable failure in an infinite loop. Structured error responses hand the coordinator the data it needs to route appropriately.

The Exam Principle: Uniform error responses prevent the agent from making appropriate recovery decisions. Structured errors must include category, retryability, and a human-readable message. This applies to all MCP tool failures.

Error Categories

The exam tests knowledge of these four distinct error types and their recovery implications. The primary failure modes are confusing transient errors with validation errors and failing to mark business errors as non-retryable.

⏱️

Transient

Timeouts, service unavailability, network interruptions. Retryable. The subagent should attempt local recovery before escalating.

🔍

Validation

Invalid input, malformed parameters, missing required fields. Not retryable without modifying input. Surface to coordinator for correction.

📋

Business

Policy violations — refund exceeds threshold, restricted action. Not retryable. Requires a customer-friendly explanation and potentially human escalation.

🔒

Permission

Insufficient access rights, auth token expired. Conditionally retryable after re-authentication. Always propagate to coordinator with context.

The Required Error Schema

The exam tests both the field names and what each field enables. Know all four fields and why each one exists.

json — MCP structured error response REQUIRED FORMAT
{
  "isError": true,                          // MCP isError flag — signals failure to agent
  "errorCategory": "transient",             // transient | validation | permission | business
  "isRetryable": true,                     // drives coordinator retry decision
  "message": "Order service timed out. Retry is safe.",
  "attemptedQuery": "order_id: ORD-8821",  // what was tried (for coordinator context)
  "partialResults": null                    // any partial data recovered before failure
}
✗ Business Error — Wrong
{
  "isError": true,
  "message": "Operation failed"
}

Agent cannot determine if retrying makes sense. Will loop or stall.

✓ Business Error — Correct
{
  "isError": true,
  "errorCategory": "business",
  "isRetryable": false,
  "message": "Refund exceeds $500 policy limit. Requires manager approval."
}
🚨
Critical distinction: An empty result set from a successful query (e.g., "customer has no orders") is NOT an error — it is a valid response. Returning isError: true for an empty result is a common mistake that causes the agent to retry a perfectly valid response.
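The empty-result rule can be made concrete in a few lines. Below is a minimal sketch with assumed in-memory data and an assumed customer-ID format: an empty result set returns a success payload, and only real failures set isError.

```python
# Assumed in-memory store for illustration only.
ORDERS = {"cust_1": [{"order_id": "ORD-1", "status": "shipped"}]}

def search_orders(customer_id: str) -> dict:
    # Assumed ID convention for this sketch: "cust_<n>".
    if not customer_id.startswith("cust_"):
        # Real failure: structured validation error, not retryable as-is.
        return {"isError": True, "errorCategory": "validation",
                "isRetryable": False,
                "message": f"Invalid customer_id format: {customer_id!r}"}
    orders = ORDERS.get(customer_id, [])
    # No orders is a valid answer: success with an empty list, NOT an error.
    return {"isError": False, "orders": orders, "total": len(orders)}
```

A customer with no history gets `{"isError": False, "orders": [], "total": 0}`, so the coordinator never retries a perfectly valid response.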

Recovery Strategy

The exam tests the layered recovery pattern: subagents handle what they can locally, and only propagate what they can't resolve — along with everything the coordinator needs to make a good decision.

  • For transient errors: subagent implements local retry with exponential backoff. Only propagates to coordinator after local retries exhausted — includes partial results and what was attempted
  • For validation errors: subagent returns immediately with structured error and the specific invalid parameter — coordinator must correct input before re-delegating
  • For business errors: subagent returns with isRetryable: false and a customer-friendly explanation — coordinator decides whether to escalate to human
  • For permission errors: subagent returns with context about what access was needed — coordinator handles re-authentication and re-delegation
💡
Partial results matter. If a subagent completes 3 of 5 document analyses before timing out, those 3 results are valuable. Always include partialResults in the error payload so the coordinator can use what was retrieved rather than starting over.
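The layered recovery pattern for transient errors can be sketched as follows. This is an illustrative implementation under assumptions: `fetch_one` is a hypothetical per-item operation, and the backoff delays are placeholders. The key behavior is that partial results survive into the error payload.

```python
import time

def run_with_local_recovery(items, fetch_one, max_retries=3, base_delay=0.01):
    """Subagent loop: local retry with exponential backoff on transient
    failures; propagate a structured error with partial results only
    after local retries are exhausted."""
    partial = []
    for item in items:
        for attempt in range(max_retries):
            try:
                partial.append(fetch_one(item))
                break
            except TimeoutError:
                if attempt == max_retries - 1:
                    # Exhausted local retries: hand coordinator everything
                    # it needs, including work completed so far.
                    return {"isError": True,
                            "errorCategory": "transient",
                            "isRetryable": True,
                            "message": f"Timed out on {item!r} after {max_retries} attempts.",
                            "partialResults": partial}
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    return {"isError": False, "results": partial}
```

If the fourth of five document fetches keeps timing out, the coordinator receives the first three results alongside the structured error rather than starting over.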

Exam Traps for Task 2.2

Trap: Return generic "Operation failed" status
Why it fails: Hides failure type and retryability — agent can't make recovery decisions
Correct pattern: Return errorCategory + isRetryable + descriptive message

Trap: Return isError: true for an empty result set
Why it fails: Empty results from a valid query are a success, not an error — triggers unnecessary retries
Correct pattern: Return success with empty array; use isError only for actual failures

Trap: Subagent terminates entire workflow on timeout
Why it fails: Kills all work done so far; coordinator may have recovery strategies available
Correct pattern: Local retry first; propagate structured error + partial results if unresolved

Trap: Mark business rule violations as retryable
Why it fails: Causes infinite retry loops — policy violations won't resolve themselves on retry
Correct pattern: isRetryable: false + customer-friendly message for all business errors

🔨 Implementation Task

T2

Build a Structured Error Handler for an Order Tool

Implement a process_refund MCP tool that returns correctly structured errors for all four error categories.

  • Implement transient error handling: timeout after 5s returns retryable error with attempted query and any partial results
  • Implement validation error: missing or invalid order_id returns non-retryable error with specific invalid field named
  • Implement business error: refund amount > $500 returns non-retryable error with customer-friendly policy explanation
  • Implement permission error: expired token returns structured error guiding coordinator to re-authenticate
  • Implement empty result handling: customer with no orders returns success with empty array, NOT an error

Exam Simulation — Task 2.2

Question 1 — Task 2.2 Multi-Agent Research System
The web search subagent times out while researching a complex topic. You need to design how this failure flows back to the coordinator. Which error propagation approach best enables intelligent recovery?
  • A. Return structured error context to the coordinator including the failure type, the attempted query, any partial results, and potential alternative approaches
  • B. Implement automatic retry with exponential backoff within the subagent, returning a generic "search unavailable" status only after all retries are exhausted
  • C. Catch the timeout within the subagent and return an empty result set marked as successful
  • D. Propagate the timeout exception directly to a top-level handler that terminates the entire research workflow
Correct: A
A is correct. Structured error context gives the coordinator what it needs to choose among: retry with modified query, try an alternative approach, or proceed with partial results. B is wrong: After retries, a generic status hides all context — the coordinator knows nothing about what was tried. C is wrong: Masking failure as success prevents any recovery and may produce silently incomplete research. D is wrong: Terminating the full workflow on a single subagent failure wastes all work completed so far.
Question 2 — Task 2.2 Customer Support Agent
A process_refund tool is called for a $750 refund but the policy maximum is $500. The tool currently returns {"error": "Operation failed"}. The agent retries the call four times before stalling. What is the correct fix?
  • A. Add retry limit logic — stop after 3 attempts regardless of error type
  • B. Return {"isError": true, "errorCategory": "business", "isRetryable": false, "message": "Refund amount exceeds $500 policy maximum. Manager approval required."}
  • C. Return {"isError": true, "errorCategory": "validation", "isRetryable": true} to signal input needs correction
  • D. Add a system prompt instruction: "Do not retry refund calls more than twice"
Correct: B
B is correct. A business rule violation is non-retryable — isRetryable: false tells the agent to stop immediately and take a different action (human escalation). The customer-friendly message enables the agent to communicate the reason clearly. A is wrong: A retry cap is a band-aid; it doesn't tell the agent why it failed or what to do next. C is wrong: This is a business error, not a validation error — and marking it retryable causes unnecessary retries. D is wrong: System prompt instructions are probabilistic; structured error metadata is deterministic.
Question 3 — Task 2.2 Customer Support Agent
A search_orders tool returns {"isError": true, "message": "No orders found"} when a customer has no order history. The coordinator agent keeps retrying the call assuming it's a transient failure. What is the root cause and fix?
  • A. The coordinator's retry logic is incorrect — it should check for "No orders found" in the message string to detect empty results
  • B. The tool incorrectly returns isError: true for an empty result — an empty result set is a valid successful response and should return success with an empty array
  • C. Add isRetryable: false to the error response so the coordinator knows to stop retrying
  • D. Add a system prompt instruction telling the coordinator not to retry when no orders are found
Correct: B
B is correct. This is a structural design error — empty results are a valid outcome, not a failure. The tool should return {"orders": [], "total": 0} with a 200-equivalent success status. Marking it as an error corrupts the signal the coordinator uses to make decisions. A is wrong: Parsing error message strings to detect empty results is fragile and bypasses the structured error system. C is wrong: Adding isRetryable: false treats the symptom but not the cause — it's still being reported as an error when it isn't one. D is wrong: System prompt instruction is probabilistic and doesn't fix the corrupted data signal.
Task Statement 2.3

Distribute tools appropriately across agents and configure tool_choice

Giving an agent the wrong tools — too many, or the wrong kind — degrades its performance just as much as poorly written descriptions. Role-scoped tool access and correct tool_choice configuration are the two levers here.

The Core Concept

Tool selection reliability degrades as tool count increases. With 4–5 focused tools, a well-described agent routes correctly and consistently. With 18 tools, the model's decision space becomes cluttered — especially when several tools are broadly described or outside the agent's specialization.

The second problem is specialization boundary violations: a synthesis agent that also has web search tools will occasionally use them when it should be synthesizing — because the tools are available and the user query contains search-like phrasing.

The Exam Principle: Give each agent only the tools needed for its role. 18 tools degrades selection reliability. 4–5 role-specific tools with clear descriptions produces reliable selection. Exceptions are narrow: a scoped cross-role tool (like verify_fact) for a high-frequency need is acceptable.

Tool Count Impact

📊

Too Many Tools (18)

Decision complexity overwhelms description clarity. Models begin selecting based on superficial name matches rather than description logic.

🎯

Role-Scoped (4–5)

Each tool is clearly the best choice for its use case. The model routes correctly because ambiguity is structurally eliminated.

🔄

Cross-Specialization Risk

A synthesis agent with web search tools will attempt web searches for fact-checking instead of flagging uncertainty to the coordinator.

✂️

Constrained Alternatives

Replace generic tools with scoped versions: fetch_url → load_document that validates document URLs only, preventing misuse.

tool_choice Configuration

Three options with very different behaviors. The exam will test whether you can select the right one for a given scenario.

"auto"
Behavior: Model may call a tool OR return plain text — its choice
Use when: Normal conversational agent; the model decides if tools are needed

"any"
Behavior: Model must call some tool — cannot return conversational text
Use when: Structured output extraction where text responses are invalid; guarantee a tool is invoked

{"type": "tool", "name": "X"}
Behavior: Model must call exactly the named tool
Use when: Force a prerequisite step before enrichment tools (e.g., always run extract_metadata first); subsequent steps handled in follow-up turns
💡
Forced selection + follow-up turns: When you use tool_choice: {"type":"tool","name":"extract_metadata"}, that call returns. You then send a follow-up turn to process the result and call enrichment tools. The forced selection only controls the first call.
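The two-turn forced-selection flow can be sketched as raw request payloads, without making an API call. The field names follow the tool_choice values described in this lesson; the model string and tool names are placeholders for illustration.

```python
def build_request(messages, tools, tool_choice):
    """Assemble a Messages-API-style request body (sketch only;
    nothing is sent anywhere)."""
    return {"model": "claude-sonnet-4-5",   # placeholder model name
            "max_tokens": 1024,
            "tools": tools,
            "tool_choice": tool_choice,
            "messages": messages}

# Turn 1: force the prerequisite tool. Only this first call is controlled.
turn1 = build_request(
    messages=[{"role": "user", "content": "Enrich this record."}],
    tools=[{"name": "extract_metadata"}, {"name": "enrich_record"}],
    tool_choice={"type": "tool", "name": "extract_metadata"},
)

# Turn 2 (after appending the tool result to messages): release control
# so the model can choose the enrichment tool on its own.
turn2_choice = {"type": "auto"}
```

The forced `tool_choice` governs only turn 1; the follow-up turn reverts to "auto" so the model can act on the extracted metadata.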

Scoped Tool Access Pattern

The solution to the 85%/15% verification problem — where a synthesis agent needs simple fact-checks 85% of the time — is a scoped cross-role tool rather than full web search access.

✗ Over-Provisioned
Synthesis agent tools:
- web_search
- fetch_url
- search_database
- synthesize_findings
- format_report

Agent attempts web searches
instead of synthesizing.
✓ Role-Scoped
Synthesis agent tools:
- synthesize_findings
- format_report
- verify_fact (scoped lookup)

85% of verifications handled
directly. 15% escalated to
coordinator → web search agent.
  • Restrict each subagent's tool set to those relevant to its role, preventing cross-specialization misuse
  • Replace generic tools with constrained alternatives (e.g., fetch_url → load_document that validates document URLs)
  • Provide scoped cross-role tools for high-frequency needs while routing complex cases through the coordinator
  • Use tool_choice: "any" to guarantee the model calls a tool rather than returning conversational text
  • Use forced tool_choice to ensure prerequisite tools are called first; process subsequent steps in follow-up turns
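The role-scoping rules above can be enforced structurally rather than by instruction. Below is a minimal sketch of a role-scoped tool registry, using tool and agent names from this lesson; the escalation string is an assumed convention for illustration.

```python
# Role-scoped tool registry: each agent sees only its own tools.
AGENT_TOOLS = {
    "web_search_agent": {"web_search", "fetch_url"},
    "synthesis_agent":  {"synthesize_findings", "format_report", "verify_fact"},
}

def resolve_tool(agent: str, tool: str) -> str:
    """Return "invoke" for an in-scope tool; out-of-scope requests
    are escalated to the coordinator instead of silently allowed."""
    if tool in AGENT_TOOLS[agent]:
        return "invoke"
    return "escalate_to_coordinator"
```

Because web_search simply is not in the synthesis agent's set, the cross-specialization failure mode is eliminated by architecture, not by prompt wording.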

Exam Traps for Task 2.3

Trap: Give synthesis agent all web search tools for "flexibility"
Why it fails: Over-provisioned agents misuse tools outside their specialization — synthesis agent will web search instead of synthesizing
Correct pattern: Scoped verify_fact tool for the 85% common case; complex verifications route through coordinator

Trap: Use tool_choice: "auto" when structured output is required
Why it fails: "auto" allows the model to return plain text instead of calling the extraction tool
Correct pattern: Use tool_choice: "any" to guarantee a tool call; or force the specific extraction tool

Trap: Give all agents the same complete tool set for consistency
Why it fails: Consistency at the cost of scoping — every agent's decision space is bloated and cross-specialization errors multiply
Correct pattern: Each agent gets only the tools for its role; coordinator routes tasks requiring different tools to appropriate agents

🔨 Implementation Task

T3

Design a Role-Scoped Multi-Agent Tool Architecture

Design the tool distribution for a 3-agent research system: coordinator, web search agent, synthesis agent.

  • List the tools each of the 3 agents receives — verify no agent has more than 5 tools and none have tools outside their specialization
  • Identify the high-frequency cross-role need in the synthesis agent and design a scoped tool for it
  • Write a scenario where tool_choice: "any" is the correct choice and explain why "auto" would fail
  • Write a scenario where forced tool selection is required — implement the prerequisite chain across two turns
  • Replace one generic tool (fetch_url) with a constrained alternative and explain what misuse it prevents

Exam Simulation — Task 2.3

Question 1 — Task 2.3 Multi-Agent Research System
The synthesis agent frequently needs to verify specific claims while combining findings. Currently, every verification requires 2–3 round trips through the coordinator (+40% latency). Evaluation shows 85% of verifications are simple fact-checks; 15% require deeper web investigation. What is the most effective approach?
  • A. Give the synthesis agent a scoped verify_fact tool for simple lookups, while complex verifications continue delegating through the coordinator to the web search agent
  • B. Have the synthesis agent accumulate all verification needs and return them as a batch to the coordinator at the end of its pass
  • C. Give the synthesis agent access to all web search tools so it can handle any verification need directly without coordinator round-trips
  • D. Have the web search agent proactively cache extra context during initial research, anticipating what the synthesis agent might verify
Correct: A
A is correct. Principle of least privilege: give synthesis agent only what it needs for the 85% common case. Complex verifications remain correctly routed through the coordinator. B is wrong: Batching all verifications to the end creates blocking dependencies — synthesis steps may depend on earlier verified facts. C is wrong: Over-provisioning the synthesis agent with all web search tools violates role separation and causes misuse. D is wrong: Speculative caching cannot reliably predict what will need verification.
Question 2 — Task 2.3 Structured Data Extraction
An extraction agent is configured with tool_choice: "auto" and has one tool: extract_invoice_data. During testing, 15% of calls return a text response saying "I'll analyze this invoice" instead of calling the tool. What is the correct fix?
  • A. Add a system prompt instruction: "You must always call the extract_invoice_data tool"
  • B. Change tool_choice to "any" to guarantee the model calls a tool on every invocation
  • C. Change tool_choice to "none" to prevent the model from defaulting to conversational text responses
  • D. Add more tools to the agent so the model has more choices and is less likely to default to text responses
Correct: B
B is correct. Setting tool_choice: "any" guarantees the model calls one of the available tools on every invocation — it cannot return plain text. With a single tool, this is the cleanest guarantee. A is wrong: tool_choice: "auto" is the current configuration that's already failing — it allows the model to choose text when it judges it appropriate. C is wrong: "none" forces the model to return only text with no tool calls — the exact opposite of what's needed. D is wrong: Adding more tools increases choice but doesn't change the fundamental behavior of "auto" mode.
Question 3 — Task 2.3 Multi-Agent Research System
A research coordinator has 8 tools: web_search, fetch_document, extract_quotes, validate_claim, cross_reference, summarize_section, calculate_statistics, and format_output. During testing, the coordinator occasionally calls format_output mid-analysis (generating partial formatted results before research completes) and calls calculate_statistics before all data is available. What is the most architecturally sound fix?
  • A. Add to the system prompt: "Tools must be called in this order: research tools first, then analysis tools, then format_output last"
  • B. Remove format_output and calculate_statistics from the coordinator and call them as a post-processing step after the coordinator finishes
  • C. Use tool_choice: "auto" and add detailed documentation to each tool describing when in the workflow it should be called
  • D. Distribute tools across agents by workflow stage: the coordinator gets only research and validation tools; a separate analysis agent gets calculate_statistics and cross_reference; a final presentation agent gets format_output — invoked by the coordinator only after analysis completes
Correct: D
D is correct. Distributing tools across agents by workflow stage enforces correct ordering through architecture: a coordinator without format_output physically cannot call it prematurely. The stage-gating is structural, not instructional. A is wrong: The premature calls show prompt instructions aren't reliably followed for tool ordering — the same failure pattern continues. B is partially right (removing format_output from the coordinator is correct) but doesn't solve the calculate_statistics problem and leaves the analysis stage fragmented. C makes no change to the architecture — tool_choice: "auto" is already the default, and better documentation is the same approach that already failed.
Task Statement 2.4

Integrate MCP servers into Claude Code and agent workflows

MCP connects your agents to external systems. This task statement covers where configuration lives, how credentials are managed securely, and why your MCP tool descriptions may need enhancement to compete with Claude's built-in tools.

The Core Concept

MCP (Model Context Protocol) is the mechanism for exposing external services — GitHub, databases, Jira, custom APIs — as tools available to Claude. The exam tests two layers: where configuration belongs (project vs user scope) and how to make MCP tools reliable in production (descriptions, resources, credential management).

Key Distinction: .mcp.json is version-controlled and shared across the team. ~/.claude.json is personal and never shared. Understanding which configuration to use for which purpose is the #1 tested knowledge point in 2.4.

Server Scoping Rules

📁

Project-Level: .mcp.json

Shared team tooling. Version controlled. Available to all developers who clone the repo. Use for: GitHub, Jira, databases, any team-standard external tools.

👤

User-Level: ~/.claude.json

Personal or experimental servers. Never shared via version control. Use for: personal integrations, tools under development, anything not yet ready for team-wide use.

🔗

Discovery at Connection Time

Tools from all configured MCP servers are discovered simultaneously at connection time and all become available to the agent at once — no per-request server selection.

🌐

Community vs Custom

Use existing community MCP servers for standard integrations (Jira, GitHub). Reserve custom MCP server development for team-specific workflows without community alternatives.

Configuration & Credential Management

Never hardcode credentials in .mcp.json. Use environment variable expansion. The ${VAR} syntax is expanded at runtime from the shell environment — the config file itself contains no secrets and is safe to commit.

.mcp.json — project-scoped configuration (CORRECT PATTERN)
{
  "mcpServers": {
    "github": {
      "type": "sse",
      "url": "https://github.mcp.example.com/sse",
      "env": {
        "GITHUB_TOKEN": "${GITHUB_TOKEN}"    // ✓ expanded from shell env
      }
    },
    "jira": {
      "type": "sse",
      "url": "https://jira.mcp.example.com/sse",
      "env": {
        "JIRA_API_KEY": "${JIRA_API_KEY}"   // ✓ never commit actual keys
      }
    }
  }
}
.mcp.json — what NOT to do (NEVER DO THIS)
{
  "mcpServers": {
    "github": {
      "env": {
        "GITHUB_TOKEN": "ghp_abc123xyz789actual_token_value"  // ✗ secret in git
      }
    }
  }
}

MCP Resources — The Often-Missed Knowledge Point

MCP resources are a mechanism for exposing content catalogs to agents — not as callable tools, but as structured data the agent can read at startup. This reduces exploratory tool calls because the agent already knows what data is available before deciding what to look up.

📚

What Resources Expose

Issue summaries, documentation hierarchies, database schemas, file catalogs — anything that helps the agent understand the landscape before making tool calls.

Why It Matters

Without resources, agents spend multiple tool calls exploring what exists. With resources, the agent reads the catalog once and makes targeted calls directly.
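Concretely, a resource catalog is the list a server returns when the agent asks what is available. The sketch below follows the standard MCP resource shape (uri, name, description, mimeType); the URIs and entries themselves are invented for illustration:

```json
{
  "resources": [
    {
      "uri": "docs://hierarchy",
      "name": "Documentation hierarchy",
      "description": "Index of every docs section, with titles and paths",
      "mimeType": "application/json"
    },
    {
      "uri": "db://schema/orders",
      "name": "Orders table schema",
      "description": "Columns, types, and foreign keys for the orders table",
      "mimeType": "application/json"
    }
  ]
}
```

Reading this catalog once tells the agent which docs sections and schemas exist, so it can make one targeted fetch instead of several exploratory tool calls.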

💡
Description enhancement for MCP tools: If your MCP tools are being ignored in favor of Claude's built-in tools (like Grep), it's because the MCP descriptions don't explain the tool's capabilities clearly enough. Enhance descriptions to explain what the MCP tool does that built-in tools cannot — making the advantage explicit prevents the agent from defaulting to familiar built-ins.
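As a sketch of what "enhancement" means in practice, here is a hypothetical search_codebase tool upgraded from the weak description "Searches the codebase" (the tool, its capabilities, and the field values are illustrative; the shape follows the common name/description/input_schema tool definition):

```json
{
  "name": "search_codebase",
  "description": "Semantic code search that returns, for every match, test coverage data, a dependency graph, and usage frequency: data the built-in Grep tool cannot provide. Use this tool when you need impact analysis or usage statistics; use Grep only for plain text matches. Input: a code identifier or natural-language query, plus an optional path filter (e.g. 'src/**').",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "description": "Identifier or natural-language query" },
      "path": { "type": "string", "description": "Optional glob filter, e.g. src/**" }
    },
    "required": ["query"]
  }
}
```

Note the three ingredients from Task 2.1: what the tool uniquely returns, when to use it versus the alternative, and the input formats it accepts.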

Exam Traps for Task 2.4

  • Trap: Put team-shared MCP servers in ~/.claude.json.
    Why it fails: User-level config is personal and not shared via version control — teammates won't have the server.
    Correct pattern: Shared team tooling goes in project-level .mcp.json, committed to the repo.
  • Trap: Hardcode API tokens in .mcp.json.
    Why it fails: Secrets land in version control — a major security violation.
    Correct pattern: Use ${ENV_VAR} expansion. The variable is set in the shell environment, not in the file.
  • Trap: Build a custom MCP server for Jira integration.
    Why it fails: Jira has a community MCP server — building a custom one wastes effort and creates maintenance burden.
    Correct pattern: Use existing community MCP servers for standard integrations; go custom only for team-specific workflows.
  • Trap: Leave MCP tool descriptions minimal (the same mistake as 2.1).
    Why it fails: Claude will prefer its familiar built-in tools (Grep, Bash) over MCP tools with weak descriptions.
    Correct pattern: Enhance MCP descriptions to explicitly explain what the MCP tool provides that built-ins cannot.

🔨 Implementation Task

T4

Configure a Multi-Server MCP Setup for a Development Team

Configure a project with GitHub and a custom documentation MCP server — with correct scoping and secure credential management.

  • Write a .mcp.json with GitHub (community server) and a team-custom documentation server — using env var expansion for both tokens
  • Write a ~/.claude.json entry for a personal experimental server you're testing — explain why it goes here and not in .mcp.json
  • Write enhanced descriptions for both MCP tools that explicitly explain what they provide beyond Claude's built-in Grep and Bash tools
  • Design one MCP resource that exposes a documentation hierarchy — write what the resource catalog would contain and explain how it reduces exploratory tool calls
  • Identify which of your tool descriptions from step 3 could collide with Claude's built-in tools and fix the collision

Exam Simulation — Task 2.4

Question 1 — Task 2.4 Developer Productivity with Claude
A lead developer configured a shared GitHub MCP server for the team. A new developer clones the project and reports the GitHub MCP tools are completely absent — none of the GitHub tools appear in their Claude Code session. The lead developer used their own workstation's configuration to set this up. What is the most likely cause of this problem?
  • A: The server was configured in ~/.claude.json on the lead developer's machine only — user-level config is not shared via version control
  • B: The server should be in .mcp.json at the project root, committed to the repository — this makes it automatically available to all developers on clone
  • C: The server should be defined inside CLAUDE.md so it's loaded with other project configuration
  • D: Each developer must manually install the MCP server — there is no shared configuration mechanism in Claude Code
Correct: A
A is correct. The lead developer configured the server in ~/.claude.json (user-level config on their local machine), which is not committed to version control and doesn't exist on any other developer's machine. User-level configuration is personal and non-shareable. B is wrong: .mcp.json at the project root IS the correct solution — but the question asks for the cause of the problem, not the fix. C is wrong: CLAUDE.md does not configure MCP servers — it provides instructions and conventions for Claude Code, not server definitions. D is wrong: Shared MCP configuration is a real feature of Claude Code via project-level .mcp.json committed to the repository.
Question 2 — Task 2.4 Developer Productivity with Claude
Your agent consistently uses Claude's built-in Grep tool to search the codebase instead of a configured MCP search_codebase server that provides richer results including test coverage data, dependency graphs, and usage frequency. The MCP tool description currently reads: "Searches the codebase." What is the best fix?
  • A: Add a system prompt instruction: "Always prefer the MCP search_codebase tool over built-in Grep"
  • B: Remove the built-in Grep tool from the agent's available tools
  • C: Enhance the MCP tool description to explicitly explain the richer capabilities it provides (coverage data, dependency graphs, usage frequency) that Grep cannot return — and specify the input formats it accepts
  • D: Move the MCP server from .mcp.json to ~/.claude.json to give it higher priority
Correct: C
C is correct. This is the same root principle as 2.1 — descriptions are the selection mechanism. "Searches the codebase" is indistinguishable from what Grep does. Enhancing the description to make MCP's unique capabilities explicit gives the model a clear reason to choose it. A is wrong: System prompt instructions are probabilistic and create keyword risks. B is wrong: Removing Grep may break other workflows that legitimately need it. D is wrong: Configuration scope does not affect tool selection priority.
Question 3 — Task 2.4 Developer Productivity with Claude
Your team's Claude Code workflow uses 3 MCP servers: GitHub (PR management), Linear (ticket tracking), and Datadog (deployment monitoring). All three are configured — GitHub and Linear in the project's .mcp.json, Datadog in the user's ~/.claude.json. After onboarding a new developer, they report GitHub and Linear work correctly but Datadog tools are not available. What is the most likely cause?
  • A: User-level ~/.claude.json configuration is overridden by project-level .mcp.json when both exist — move Datadog to .mcp.json
  • B: User-level MCP servers are scoped to the home directory — Datadog tools are unavailable when working in project subdirectories
  • C: The Datadog server is configured in ~/.claude.json on the original developer's machine — this file is not shared via version control and doesn't exist on the new developer's machine
  • D: The Datadog MCP server requires authentication credentials that expired — re-authenticate to restore access
Correct: C
C is correct. ~/.claude.json is a user-level configuration file stored on the individual developer's machine and not committed to version control. When a new developer clones the repo, they get the project's .mcp.json (GitHub, Linear) but not the original developer's personal ~/.claude.json (Datadog). A is wrong: User-level and project-level configs are loaded additively — one doesn't override the other. Both GitHub/Linear and Datadog should be available if both config files exist. B is wrong: User-level MCP servers are available in all directories, not just the home directory. D is plausible but doesn't fit the evidence — authentication expiry would affect the original developer too. The pattern of GitHub/Linear working but Datadog absent points to the configuration not existing on the new machine, not to credential expiry.
Task Statement 2.5

Select and apply built-in tools (Read, Write, Edit, Bash, Grep, Glob) effectively

Six tools. Each has a precise use case. Knowing when to use Grep vs Glob, and when Edit fails and you must fall back to Read + Write, is what the exam tests here.

The Core Concept

Claude Code's built-in tools are the primary interface for file and codebase operations. The exam tests precise selection — not just "which tool reads files" but "given this specific task, which combination of tools is correct, and what do you do when your first choice fails?"

The two most commonly confused pairs: Grep vs Glob (content search vs path matching) and Edit vs Read+Write (targeted modification vs full file replacement fallback).

Built-in Tool Reference

Grep (content search)

Searches inside files for text patterns. Use for: finding all callers of a function, locating error messages, finding import statements, searching for variable names across the codebase.

Input: pattern + optional path glob
Glob (path matching)

Matches file paths by name or extension patterns. Use for: finding all test files, locating all config files, getting a list of all TypeScript files in a directory.

Example: **/*.test.tsx, src/**/*.config.js
Read (full file load)

Loads the complete content of a file into context. Use for: understanding a complete module, loading a config file, or as the first step in a Read → Write fallback when Edit fails.

Loads entire file — be selective about large files
Write (full file replace)

Writes complete file content. Use for: creating new files, or as the second step in a Read → Write fallback when Edit cannot find a unique anchor.

Replaces entire file — use Edit for targeted changes
Edit (targeted modification)

Modifies a file by matching unique anchor text and replacing it. Use for: targeted bug fixes, adding a conditional, changing a specific line. Fails if the anchor text appears more than once.

Fallback: Read → modify → Write when anchor not unique
Bash (shell execution)

Executes shell commands. Use for: running tests, executing scripts, git operations, package installs, anything requiring system-level execution.

Use thoughtfully — executes with shell permissions

Critical Usage Patterns

Grep vs Glob Decision Rule: Ask "am I searching for text inside files or am I searching for files by name?" If text content → Grep. If file paths/names → Glob. These are the most commonly confused tools on the exam.
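The decision rule can be made concrete with a small sketch. This is plain Python standing in for the two tools (the throwaway file tree and its contents are invented): pathlib's glob answers the "files by name" question, and a regex over file contents answers the "text inside files" question.

```python
import pathlib
import re
import tempfile

# Build a tiny throwaway project tree (invented for illustration).
root = pathlib.Path(tempfile.mkdtemp())
(root / "src").mkdir()
(root / "src" / "App.test.tsx").write_text("import React from 'react';\n")
(root / "src" / "util.ts").write_text("export const x = 1;\n")

# Glob-style question: "which files are named *.test.tsx?"
# Matches paths by pattern; never looks inside files.
test_files = sorted(p.name for p in root.glob("**/*.test.tsx"))
print(test_files)  # ['App.test.tsx']

# Grep-style question: "which files import React?"
# Searches contents; the file's name is irrelevant.
importers = sorted(
    p.name
    for p in root.glob("**/*")
    if p.is_file() and re.search(r"import.*React", p.read_text())
)
print(importers)  # ['App.test.tsx']
```

Both questions happen to return the same file here, but they are answered by entirely different mechanisms — which is exactly the distinction the exam tests.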

The Edit → Read + Write Fallback:

✗ Edit fails (non-unique anchor)
The file contains 5 identical lines reading return null;. Edit cannot determine which one to modify and throws the error: "Anchor text not unique".

✓ Read → Modify → Write fallback
1. Read the full file contents
2. Identify the correct occurrence by context (line number / surroundings)
3. Write the complete modified file

Reliable. Always works.
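The fallback can be sketched in plain Python (the file, its contents, and the "function b" context check are invented; the point is the Read → locate-by-context → Write shape):

```python
import pathlib
import tempfile

# A file where the anchor "return null;" appears more than once:
# exactly the case where a targeted Edit fails.
path = pathlib.Path(tempfile.mkdtemp()) / "util.js"
path.write_text(
    "function a() {\n  return null;\n}\n"
    "function b() {\n  return null;\n}\n"
)

# Step 1: Read the full file.
lines = path.read_text().splitlines()

# Step 2: identify the correct occurrence by its surrounding context
# (here: the one directly inside function b), not by the anchor alone.
target = next(
    i for i, line in enumerate(lines)
    if line.strip() == "return null;" and lines[i - 1].startswith("function b")
)
lines[target] = "  return [];"

# Step 3: Write the complete modified file back.
path.write_text("\n".join(lines) + "\n")
```

Because the whole file is rewritten, there is no ambiguity about which occurrence changed — the disambiguation happened in step 2, using context the anchor text alone could not carry.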

Incremental Codebase Understanding:

  • Start with Grep to find entry points — search for main function names, entry module exports, top-level imports
  • Use Read selectively to follow imports and trace flows from those entry points — not reading all files upfront
  • For function usage tracing: first use Grep to identify all exported names, then search for each name across the codebase
  • Never use Read on every file in a large codebase — build understanding incrementally from the most relevant starting points
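The export-tracing step above can be sketched with two regex passes — a stand-in for two Grep calls (the toy module sources and the export pattern are invented; real code would need a broader pattern for default exports, classes, and re-exports):

```python
import re

# Toy module sources standing in for files on disk.
files = {
    "src/api.ts": "export function fetchUser() {}\nexport const BASE = '/api';\n",
    "src/page.ts": "import { fetchUser } from './api';\nfetchUser();\n",
    "src/other.ts": "const x = 1;\n",
}

# Pass 1 (Grep): collect the exported names from the module of interest.
exports = re.findall(r"export (?:function|const) (\w+)", files["src/api.ts"])
print(exports)  # ['fetchUser', 'BASE']

# Pass 2 (Grep per name): find every other file referencing each export.
usage = {
    name: [f for f, src in files.items() if f != "src/api.ts" and name in src]
    for name in exports
}
print(usage)  # {'fetchUser': ['src/page.ts'], 'BASE': []}
```

The resulting usage map is the dependency picture you want — built from two cheap search passes instead of reading every file into context.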

Exam Traps for Task 2.5

  • Trap: Use Grep to find all *.test.tsx files.
    Why it fails: Grep searches file contents, not file names — wrong tool for path matching.
    Correct pattern: Use Glob with pattern **/*.test.tsx.
  • Trap: Use Glob to find all files that import React.
    Why it fails: Glob matches file paths, not file contents — it can't search inside files.
    Correct pattern: Use Grep with pattern import.*React.
  • Trap: Retry Edit after an "anchor not unique" error.
    Why it fails: The anchor won't become unique on retry — it's a structural problem.
    Correct pattern: Fall back to Read → modify → Write immediately.
  • Trap: Read all files in a large codebase upfront for context.
    Why it fails: Fills the context window with irrelevant content; context degradation follows.
    Correct pattern: Grep to find entry points → Read selectively to follow specific imports.

🔨 Implementation Task

T5

Navigate and Modify a Real Codebase Using Built-in Tools Only

Using only Claude Code's built-in tools, complete these codebase operations on a sample project.

  • Find all test files in the project — use the correct tool (Glob) and write the glob pattern
  • Find all files that import a specific utility function — use the correct tool (Grep) and write the search pattern
  • Make a targeted edit to a unique line — use Edit. Then intentionally trigger the non-unique anchor failure and implement the Read + Write fallback
  • Trace the full usage of an exported function: find all exported names → grep for each across the codebase → document the dependency graph
  • Build understanding of an unfamiliar module incrementally: start with one Grep, identify the 3 most relevant files, Read only those 3

Exam Simulation — Task 2.5

Question 1 — Task 2.5 Developer Productivity with Claude
You need to apply a consistent testing convention to all test files in a project. The test files are spread across every directory, always named with the .test.tsx suffix. Which built-in tool and approach correctly finds all of these files?
  • A: Grep with pattern .test.tsx across the entire codebase
  • B: Glob with pattern **/*.test.tsx to match all test files regardless of directory location
  • C: Read the root directory listing and recursively Read each subdirectory to find test files manually
  • D: Bash with find . -name "*.test.tsx" — equivalent to Glob but more explicit
Correct: B
B is correct. Glob is the correct tool for file path pattern matching — finding files by name or extension across directory hierarchies. **/*.test.tsx matches all test files regardless of location. A is wrong: Grep searches file contents for patterns, not file names — using it to find files named .test.tsx is the wrong tool. C is wrong: Manually recursing directories is inefficient and fills context unnecessarily. D may work but Bash is a lower-level fallback — when a dedicated tool (Glob) exists for the task, use it. The exam expects you to know the purpose-built tool for each operation.
Question 2 — Task 2.5 Developer Productivity with Claude
You use the Edit tool to fix a bug in a utility file. The tool returns an error: "Anchor text is not unique — found 4 matches." What is the correct next step?
  • A: Retry the Edit call with a longer anchor text that includes more surrounding context
  • B: Use Read to load the full file, identify the correct occurrence by surrounding context, then use Write to save the modified file
  • C: Use Grep to find the line number of the correct occurrence, then pass the line number to Edit as additional context
  • D: Use Bash with sed to perform the replacement with a line number reference
Correct: B
B is correct. When Edit fails due to non-unique anchor text, the documented fallback is Read to load the full contents, then Write to save the complete modified file. This is the explicit skill from the exam guide. A is wrong: extending the anchor may happen to work, but it amounts to retrying Edit against a structural problem — the reliable, documented fallback is Read + Write. C is wrong: Edit does not accept line number parameters. D may work, but Bash/sed is not the documented fallback for this specific failure mode — the exam expects the Read + Write pattern.
Question 3 — Task 2.5 Developer Productivity with Claude
A developer needs to update 47 TypeScript files to use a new API client interface. The changes are mechanical but not identical — each file uses the old interface slightly differently. They run Claude Code with the full task and return 30 minutes later to find Claude made 12 of the 47 changes correctly before context was exhausted. What is the most effective approach for completing the remaining 35 files?
  • A: Restart Claude Code with a larger context window model and resubmit the same task from the beginning
  • B: Use the Edit tool in a loop: for each remaining file, read the file, compute the change, write the result — 35 sequential operations
  • C: Use Glob to find the 35 remaining files, then process them in batches of 8–10 per Claude Code session with a precise task description specifying the exact interface change — each batch session is self-contained and doesn't require prior session history
  • D: Write a Bash script using sed to apply the transformation mechanically without Claude, since the changes are described as mechanical
Correct: C
C is correct. Bounded batches ensure each session stays within context limits, and a precise self-contained task description makes each batch independently executable without needing prior session history. Glob first gives an exact file list so no files are skipped or doubled. A is wrong: A larger context window hits the same limit — with 47 files of TypeScript, even a 200k-token model may exhaust context, and the session still has to start from scratch. B is technically viable but pure Edit/Write loops can't handle the "not identical" variation — the stem explicitly says each file uses the interface differently, which requires Claude's judgment, not mechanical string replacement. D is wrong for the same reason: sed is a pattern-replacement tool. It cannot handle semantic variation in how different files use the old interface.