Session 8: Tool Integration, Function Calling & MCP
An LLM without tools is a brain in a jar—it can think, but it can't DO anything. It can wax poetic about the weather in Paris or speculate about your calendar, but it's all hallucination. The moment you give it a get_weather tool, everything changes. Suddenly it can fetch real data, create real bookings, query real databases. That's the power move in AI engineering: tools are how you turn a fancy autocomplete into something that actually matters.
Here's the thing nobody tells you upfront: tool integration is where most agent systems either shine or crumble. Get the schema wrong and the model picks the wrong tool half the time. Skip security and you've got a runaway agent deleting production data. Ignore error handling and your users see "I'm sorry, something went wrong" when the weather API hiccups. Senior AI Engineer interviews drill into this because it's the difference between a demo that wows in a Slack thread and a system that ships to thousands of users.
This session covers function calling (OpenAI and Anthropic), the Model Context Protocol (MCP), schema design, orchestration patterns, security, and error handling. By the end, you'll understand not just how tools work but why certain decisions make or break production systems. And you'll have the Maersk context to back it up.
1. What Is Tool/Function Calling?
Interview Insight: Interviewers want to confirm you understand the fundamental model: the LLM requests, your code executes. They're probing whether you've actually built this loop or just read about it.
Tool calling (also called function calling) is how LLMs request execution of external functions. The model doesn't run code—it outputs structured JSON with a function name and arguments. Your application parses that, runs the corresponding function, and sends the result back. The model then weaves that data into its response.
Think of it like a restaurant: the model is the waiter who takes your order and writes it down. The kitchen (your codebase) actually cooks. The waiter never touches the stove—they just relay requests and bring back the food. That separation is deliberate: the model suggests; you decide what gets executed. No arbitrary code, no untrusted execution. You control the tools, the schemas, and the safety.
Technical flow: User asks "What's the weather in Paris?" → Model receives the prompt + tool list → Model outputs {"name": "get_weather", "arguments": {"location": "Paris"}} → Your code calls get_weather("Paris") → Result goes back to the model → Model produces "The weather in Paris is 22°C."
Why This Matters in Production: Every tool call is a trust boundary. The model's output is untrusted until validated. You must sanitize inputs, validate outputs, and never assume the model will "do the right thing" with sensitive operations.
Aha Moment: The model never sees your source code. It only sees tool names and descriptions. Bad descriptions = wrong tool selection. This is why schema design is half the battle.
2. OpenAI Function Calling
Interview Insight: They want to know you've shipped with OpenAI's API—tools parameter, tool_choice, handling tool_calls, returning function_call_output. The details matter.
OpenAI's function calling uses a tools parameter—a list of tool definitions. Each has type: "function", name, description, and parameters (JSON Schema). The description is in the prompt; it's how the model knows when to call the tool. Vague descriptions = wrong calls.
tool_choice: "auto" (default) lets the model decide. "required" forces at least one tool call. "none" disables tools. You can also force a specific function: {"type": "function", "function": {"name": "get_weather"}}. As of 2025, allowed_tools restricts the model to a subset while keeping the full list for prompt caching.
Response structure: With Chat Completions, a tool-calling turn comes back with finish_reason: "tool_calls" and a tool_calls array; each entry has an id, a function.name, and function.arguments (a JSON string). Execute the function, append a {"role": "tool"} message with the matching tool_call_id and your result, then make a second API call with the updated conversation. (The newer Responses API uses function_call items and function_call_output messages instead, matched by call_id.)
Parallel tool calls: The model can request multiple tools in one response. Execute them all (concurrently with asyncio.gather), return each with its call_id. Set parallel_tool_calls: false to limit to one per turn. Note: parallel calls aren't supported with built-in tools (web search, MCP).
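As a sketch of the execution side (the tool names, the TOOL_REGISTRY mapping, and the output shape here are illustrative assumptions, not a specific SDK's API), parallel calls can be dispatched with asyncio.gather and each result paired back with its call_id:

```python
import asyncio
import json

# Hypothetical async tool implementation standing in for a real API call.
async def get_weather(location: str) -> dict:
    await asyncio.sleep(0.01)
    return {"location": location, "temperature": 22}

TOOL_REGISTRY = {"get_weather": get_weather}

async def execute_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Run all requested tool calls concurrently; pair each result with its call_id."""
    async def run_one(call: dict) -> dict:
        fn = TOOL_REGISTRY[call["name"]]
        result = await fn(**json.loads(call["arguments"]))  # arguments arrive as a JSON string
        return {"type": "function_call_output",
                "call_id": call["call_id"],
                "output": json.dumps(result)}
    # gather preserves input order, so results line up with the requests
    return await asyncio.gather(*(run_one(c) for c in tool_calls))

calls = [
    {"call_id": "c1", "name": "get_weather", "arguments": '{"location": "Paris"}'},
    {"call_id": "c2", "name": "get_weather", "arguments": '{"location": "Tokyo"}'},
]
results = asyncio.run(execute_tool_calls(calls))
```

Each output keeps its originating call_id, which is what lets the model match results to requests on the next turn.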
Strict mode: strict: true enforces exact schema conformance. All properties required (use null for optional), additionalProperties: false. Recommended for production.
Why This Matters in Production: Mismatched call_ids or malformed tool results break the loop. You need robust parsing and error handling so one bad tool doesn't kill the entire agent flow.
Aha Moment: OpenAI's arguments come as a JSON string. You must parse it. Anthropic's Claude gives you parsed objects—different defaults, different gotchas.
3. Anthropic Tool Use
Interview Insight: They're checking if you understand provider differences. Claude uses tool_use blocks and tool_result in user message content—not a separate role.
Anthropic's API is conceptually similar but structurally different. Tools have name, description, and input_schema (not parameters). You get stop_reason: "tool_use" when the model wants to call tools. The content array has text blocks and tool_use blocks. Each tool_use has id, name, and input—parsed arguments, not a string.
To return results: add a new user message with tool_result content blocks. Each has tool_use_id (matching the id from tool_use) and content (your result string). All tool_result blocks must match all tool_use blocks—missing IDs cause API errors.
Client vs. server tools: Anthropic distinguishes client tools (you implement) from server tools (Anthropic runs—e.g., web search). Server tools use versioned types like web_search_20250305 and run server-side; you don't handle execution.
Why This Matters in Production: Claude's content array can mix text and tool_use in one message. Your loop logic must handle that—extract tool_use blocks, execute, build tool_result blocks, append, and continue until stop_reason is "end_turn".
Aha Moment: Use input_examples for complex schemas. Anthropic recommends them; they dramatically improve parameter filling for nested or ambiguous structures.
4. Tool Schema Design
Interview Insight: This is where senior engineers stand out. "I wrote clear descriptions" sounds trivial until you've debugged a model calling search when it meant find for the tenth time.
The schema is the model's interface to your tools. No source code, no docstrings—just name, description, and parameter schema. If the model misunderstands, it's usually the schema's fault.
JSON Schema: Use type, properties, required, description, enum, additionalProperties, nested objects. Full expressiveness. Constrain inputs to reduce invalid calls.
Descriptions are everything: "Gets data" is useless. "Retrieves current weather for a given city. Use when the user asks about temperature, conditions, or forecasts." is actionable. Each parameter needs: what it is, what format, any constraints. "City and country, e.g. San Francisco, CA" beats "The location."
Required vs. optional: Mark only truly necessary params as required. Optional = flexibility but also more omission/mis-specification. Use enum for constrained choices.
Anti-patterns: Vague descriptions, too many parameters, overlapping tools (search vs find), redundant tools. Fewer tools with clear names beat many ambiguous ones. OpenAI recommends fewer than 20; with 50+, use dynamic loading or routing.
Why This Matters in Production: At Maersk, your booking agent needs tools like create_booking, search_sailings, and get_quote. Overlap those and the model confuses create vs. search. Distinct names and explicit "use when" / "do not use when" clauses prevent that.
Aha Moment: Dynamic tool loading: pass only the 5–10 tools relevant to the current task. A calendar agent doesn't need weather. A summarization agent doesn't need delete. Router → subset → model.
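A minimal sketch of dynamic loading, assuming a hypothetical in-memory registry where each tool is tagged with a category (a real platform would likely use a proper tool registry service):

```python
# Hypothetical registry: tools tagged by category so a router can pass
# only the relevant subset to the model instead of the full list.
TOOLS = [
    {"name": "get_weather",  "category": "weather",  "description": "Current weather for a city."},
    {"name": "create_event", "category": "calendar", "description": "Create a calendar event."},
    {"name": "delete_event", "category": "calendar", "description": "Delete a calendar event."},
    {"name": "search_docs",  "category": "search",   "description": "Search internal documents."},
]

def select_tools(task_category: str, max_tools: int = 10) -> list[dict]:
    """Router step: filter the registry down to tools relevant to this task."""
    subset = [t for t in TOOLS if t["category"] == task_category]
    return subset[:max_tools]

# A calendar agent sees only calendar tools; weather and search stay out of the prompt.
calendar_tools = select_tools("calendar")
```

The classification step itself could be a keyword rule, a lightweight classifier, or a first LLM call; the point is that the model only ever sees the filtered subset.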
5. MCP (Model Context Protocol)
Interview Insight: MCP is hot. They want to know: What problem does it solve? How does the architecture work? How is it different from LangChain tools?
MCP is like USB-C for AI—one standard plug for everything. Before MCP, every tool integration was custom: weather API, database, email, file system—each with its own API shape, auth, error handling. MCP says: write one server per tool group, any MCP client can discover and use it. Write once, plug in everywhere.
Architecture: The MCP Host (Cursor, Claude Desktop, VS Code) coordinates. An MCP Client connects to a specific server, discovers tools via list_tools(), routes requests, and returns results. An MCP Server exposes tools, resources, and prompts. A Host runs multiple Clients; each Client holds a 1:1 connection to one Server.
What servers expose: Tools (callable functions with schemas), Resources (read-only data—files, DB queries), Prompts (reusable templates). Tools = model-controlled. Resources = application-controlled. Prompts = user-controlled.
Transport: Stdio for local (subprocess, stdin/stdout, newline-delimited JSON-RPC). Streamable HTTP for remote (single endpoint, POST + optional GET for streaming). SSE deprecated in favor of Streamable HTTP.
Security: MCP enables powerful capabilities—arbitrary data, code execution. Spec mandates user consent, data privacy, tool safety. Hosts must obtain explicit consent before invoking tools. Treat tool descriptions from untrusted servers as untrusted.
flowchart TB
subgraph mcpHost["MCP Host e.g. Cursor"]
mcpClient1[MCP Client 1]
mcpClient2[MCP Client 2]
end
subgraph mcpServers[MCP Servers]
dbServer[Server: Database]
weatherServer[Server: Weather]
end
dbServer -->|"Tools: query, insert"| mcpClient1
weatherServer -->|"Tools: get_weather"| mcpClient2
mcpClient1 --> mcpHost
mcpClient2 --> mcpHost
Why This Matters in Production: On an enterprise platform like Maersk's Agent Platform, you might run multiple MCP servers—booking APIs, scheduling, RAG sources. Stdio for local dev; HTTP for scalable deployment. One protocol, many backends.
Aha Moment: MCP isn't LangChain. LangChain tools are framework-specific. MCP is protocol-agnostic. An MCP server in Python works with Cursor, Claude Desktop, or your custom app. No lock-in.
6. Tool Orchestration Patterns
Interview Insight: They want to see you think in systems—sequential vs. parallel, when to route, when to retry, when to fall back.
Sequential: Search → fetch document → summarize. Each step depends on the previous. Simple, debuggable. Latency adds up.
Parallel: Weather for Paris, London, Tokyo in one turn. Independent tools, fire them all. Both OpenAI and Anthropic support this. Use asyncio.gather.
Conditional: If weather query → get_weather. If stocks → get_stock_price. Routing logic or curated subset.
Iterative: Search returns too many hits → model narrows query → search again. Loop with iteration limits.
Fallback: Primary API times out → try backup or cached result. Try/except + fallback registration.
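A sketch of the fallback-registration idea, with hypothetical tool functions standing in for a real API and a cache:

```python
# Hypothetical fallback registry: map a tool name to a backup implementation
# (a secondary API, or a cached last-known value).
FALLBACKS = {}

def register_fallback(tool_name, backup_fn):
    FALLBACKS[tool_name] = backup_fn

def call_with_fallback(tool_name, primary_fn, *args, **kwargs):
    """Try the primary tool; on any failure, run the registered backup."""
    try:
        return primary_fn(*args, **kwargs)
    except Exception:
        backup = FALLBACKS.get(tool_name)
        if backup is None:
            raise  # no fallback registered: let the error-handling layer deal with it
        return backup(*args, **kwargs)

def live_weather(city):    # primary: pretend the API is down
    raise TimeoutError("weather API timed out")

def cached_weather(city):  # backup: last known value, marked as stale
    return {"city": city, "temperature": 21, "stale": True}

register_fallback("get_weather", cached_weather)
result = call_with_fallback("get_weather", live_weather, "Paris")
```

Marking the fallback result as stale lets the model degrade gracefully ("here's the last data I have") instead of presenting cached data as live.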
sequenceDiagram
participant User
participant App
participant Model
participant Tool
User->>App: "What's the weather in Paris?"
App->>Model: Prompt + tools list
Model->>App: tool_call get_weather "Paris"
App->>Tool: Execute get_weather Paris
Tool->>App: temp 22 celsius
App->>Model: tool_result
Model->>App: "The weather in Paris is 22C."
App->>User: Final response
Why This Matters in Production: Your email booking agent: extract from email (tool 1) → RAG lookup for sailing options (tool 2) → create booking (tool 3) → human-in-the-loop confirmation. Sequential with a branch. Get the order wrong and you create a booking before you have the right sailing.
Aha Moment: Parallel doesn't mean independent. Sometimes tools have implicit dependencies. Model usually sequences correctly, but if it doesn't, you need ordering logic.
7. Secure Tool Execution
Interview Insight: Security questions separate seniors from juniors. Sandboxing, validation, least privilege, confirmation for destructive ops.
Sandboxing: Run tool code in containers or VMs. A buggy or malicious tool shouldn't compromise the host.
Input validation: Validate before execution. Types, ranges, formats. Sanitize for injection (SQL, shell). Pydantic or similar. Reject invalid before it hits the implementation.
Output validation: Filter PII before passing to the LLM. Validate format. Truncate huge payloads. Error structures → consistent format for the model to interpret.
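One possible shape for output sanitization before a tool result reaches the model (the regex and size budget are illustrative assumptions—real PII filtering needs a proper detection library, not one pattern):

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # crude email matcher, for illustration only
MAX_CHARS = 4000  # assumed per-result budget to protect the context window

def sanitize_tool_output(raw: str) -> str:
    """Redact obvious PII and truncate oversized payloads before they reach the model."""
    redacted = EMAIL_RE.sub("[REDACTED_EMAIL]", raw)
    if len(redacted) > MAX_CHARS:
        redacted = redacted[:MAX_CHARS] + "\n[truncated]"
    return redacted

out = sanitize_tool_output("Contact: ops@example.com, sailing ETA 2025-06-01")
```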
Destructive operations: Two-step flow. Tool returns "This will delete X. Confirm?" Separate confirmation tool or user approval before execution.
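The two-step flow can be sketched like this, with hypothetical booking tools and an in-memory token store (a real system would persist pending confirmations and expire them):

```python
import uuid

# Pending destructive actions, keyed by a one-time confirmation token.
PENDING = {}

def delete_booking(booking_id: str) -> dict:
    """Step 1: return a preview and a token. Nothing is deleted yet."""
    token = str(uuid.uuid4())
    PENDING[token] = booking_id
    return {"preview": f"This will permanently delete booking {booking_id}. Confirm?",
            "confirmation_token": token}

def confirm_delete(confirmation_token: str) -> dict:
    """Step 2: only a valid, unused token triggers the actual deletion."""
    booking_id = PENDING.pop(confirmation_token, None)
    if booking_id is None:
        return {"error": "Unknown or expired confirmation token."}
    # ... perform the real deletion here ...
    return {"status": "deleted", "booking_id": booking_id}

step1 = delete_booking("BK-123")                      # preview only
step2 = confirm_delete(step1["confirmation_token"])   # actual deletion
```

Because the token is consumed on use, the model (or user) cannot replay the confirmation, and a confirmation without a matching preview fails closed.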
Rate limiting: Per user, per session, per tool. Runaway agents can make thousands of calls.
Least privilege: Summarization agent doesn't need delete_database. Dynamic tool loading, permission systems.
Why This Matters in Production: On an enterprise platform with guardrails and policies, tool access is gated. Booking creation might require approval workflows. Deleting records = extra confirmation. You're not just securing the model—you're enforcing business rules.
Aha Moment: Tool descriptions from external MCP servers are untrusted. A malicious server could describe "get_weather" but execute "rm -rf". Users must understand what they're authorizing.
8. Error Handling in Tool Calls
Interview Insight: "What happens when the tool fails?" They want retries, timeouts, fallbacks, and what you tell the model.
Retries + exponential backoff: Transient failures often succeed on retry. 1s, 2s, 4s. Max retry count.
Timeouts: Hard timeouts so a stuck tool doesn't block forever. Document expected latency in tool descriptions.
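A hard timeout can be enforced by running the tool in a worker thread and bounding the wait. This is a sketch: the abandoned worker still runs to completion, so real systems also need cancellation or process isolation for truly stuck tools.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

def call_with_timeout(fn, timeout_s: float, *args, **kwargs):
    """Run a tool in a worker thread and enforce a hard deadline on the result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn, *args, **kwargs)
        try:
            return future.result(timeout=timeout_s)
        except FuturesTimeout:
            # Structured error the model can act on.
            return {"error": f"Tool timed out after {timeout_s}s",
                    "suggestion": "Retry, or fall back to cached data."}

def slow_tool():
    time.sleep(0.3)  # stands in for a stuck API call
    return {"temperature": 22}

result = call_with_timeout(slow_tool, timeout_s=0.05)
```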
Fallback: Backup API, cached result, graceful degradation. "I couldn't fetch live data; here's what I know."
Error messages to the LLM: Descriptive. "API rate limit exceeded; try again in 60 seconds" beats "Error 429." Model can retry, suggest waiting, or try a different approach.
Circuit breaker: N consecutive failures → stop calling temporarily. Cooldown. Prevents cascading when downstream is down.
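A minimal circuit-breaker sketch (the thresholds and cooldown values are illustrative; production implementations usually add half-open probing and shared state across workers):

```python
import time

class CircuitBreaker:
    """Open after N consecutive failures; reject calls until a cooldown passes."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the circuit opened

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_s:
                # Fail fast without touching the downstream service.
                return {"error": "Circuit open; service temporarily disabled."}
            self.opened_at = None  # cooldown over: allow a probe call
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return {"error": str(exc)}
        self.failures = 0  # success resets the failure count
        return result

breaker = CircuitBreaker(max_failures=2, cooldown_s=60)

def flaky():
    raise ConnectionError("downstream is down")

breaker.call(flaky)            # failure 1
breaker.call(flaky)            # failure 2 -> circuit opens
blocked = breaker.call(flaky)  # rejected without calling the tool
```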
flowchart TD
toolCall[Tool Call Request] --> execute{Execute Tool}
execute -->|Success| returnResult[Return Result to Model]
execute -->|Failure| retryCheck{Retry?}
retryCheck -->|Yes attempts left| backoff[Exponential Backoff]
backoff --> execute
retryCheck -->|No| fallbackCheck{Fallback Available?}
fallbackCheck -->|Yes| tryFallback[Try Fallback Tool]
tryFallback --> execute
fallbackCheck -->|No| returnError[Return Error to Model]
returnError --> modelAdjusts[Model Adjusts Strategy]
Why This Matters in Production: Booking APIs flake. RAG services time out. You need the agent to say "I couldn't complete the booking—please try again or contact support" instead of crashing. And you need observability (MLflow, logs) to know when failure rates spike.
Aha Moment: Return errors in a format the model can act on. Structure matters. "Error: timeout. Suggestion: retry in 30s or use manual booking link."
9. Tool Selection Strategies
Interview Insight: "The model keeps picking the wrong tool." Classic scaling problem. How do you fix it?
With 50+ tools, models get confused. Descriptions blur. Parameter schemas add cognitive load.
Solutions: (1) Dynamic tool selection—only pass relevant tools. (2) Better descriptions—distinct, explicit, "when to use" / "when not to use." (3) Consolidate overlapping tools—one search_docs not search and find. (4) Hierarchical routing—classifier picks category, then pass subset. (5) Few-shot examples in system prompt. (6) Fine-tuning for function calling (OpenAI supports this). (7) Monitor and iterate—log tool→query pairs, refine descriptions from misselection patterns.
Why This Matters in Production: On a platform with many agents and many tools, you need a tool registry that can filter by agent type, permission, and context. Not every agent sees every tool.
Aha Moment: OpenAI recommends <20 tools per request. If you have 50, you're doing it wrong—or you need a router in front.
Code Examples
OpenAI Function Calling: Complete Flow
from openai import OpenAI
import json
client = OpenAI()
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Retrieves current weather for the given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City and country, e.g. Paris, France"},
                    "units": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature units."}
                },
                "required": ["location", "units"],
                "additionalProperties": False
            },
            "strict": True
        }
    }
]
def get_weather(location: str, units: str) -> dict:
    # In production, call a real weather API
    return {"temperature": 22, "conditions": "sunny", "unit": units}
messages = [{"role": "user", "content": "What's the weather in Paris in Celsius?"}]
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools, tool_choice="auto")
assistant_message = response.choices[0].message
if assistant_message.tool_calls:
    messages.append(assistant_message)
    for tool_call in assistant_message.tool_calls:
        if tool_call.function.name == "get_weather":
            args = json.loads(tool_call.function.arguments)
            result = get_weather(args["location"], args["units"])
            messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(result)})
    final_response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    print(final_response.choices[0].message.content)
else:
    print(assistant_message.content)
Anthropic Tool Use: Complete Flow
import anthropic
client = anthropic.Anthropic()
tools = [{
    "name": "get_weather",
    "description": "Get the current weather in a given location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City and state, e.g. San Francisco, CA"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature unit"}
        },
        "required": ["location"]
    }
}]
def get_weather(location: str, unit: str = "celsius") -> str:
    return f"22°{unit[0].upper()}, sunny"
messages = [{"role": "user", "content": "What's the weather in San Francisco?"}]
response = client.messages.create(
    model="claude-sonnet-4-20250514", max_tokens=1024, tools=tools, messages=messages
)
while response.stop_reason == "tool_use":
    tool_use_blocks = [b for b in response.content if b.type == "tool_use"]
    tool_results = []
    for block in tool_use_blocks:
        if block.name == "get_weather":
            result = get_weather(block.input["location"], block.input.get("unit", "celsius"))
            tool_results.append({"type": "tool_result", "tool_use_id": block.id, "content": result})
    # Append the assistant turn (including its tool_use blocks), then the tool
    # results as a user message; every tool_use id must get a matching tool_result.
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": tool_results})
    response = client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=1024, tools=tools, messages=messages
    )
print(response.content[0].text)
Building an MCP Server with FastMCP
from fastmcp import FastMCP
mcp = FastMCP("Weather Server")
@mcp.tool()
def get_weather(location: str, unit: str = "celsius") -> str:
    """Get the current weather for a location.

    Args:
        location: City and country, e.g. Paris, France
        unit: Temperature unit: celsius or fahrenheit
    """
    return f"Weather in {location}: 22°{unit[0].upper()}, sunny"

@mcp.tool()
def get_forecast(location: str, days: int = 3) -> str:
    """Get a multi-day forecast for a location."""
    return f"Forecast for {location}: Sunny for the next {days} days"

if __name__ == "__main__":
    mcp.run()
MCP Client: Discover and Call Tools
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
import asyncio
async def main():
    server_params = StdioServerParameters(command="python", args=["weather_server.py"])
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(f"Tool: {tool.name} - {tool.description}")
            result = await session.call_tool("get_weather", {"location": "Paris, France", "unit": "celsius"})
            print(result.content)

asyncio.run(main())
Tool Error Handling with Retries
import time
from typing import Callable, Any
def with_retry(fn: Callable, max_retries: int = 3, base_delay: float = 1.0, backoff: float = 2.0) -> Callable:
    def wrapper(*args, **kwargs) -> Any:
        last_error = None
        for attempt in range(max_retries):
            try:
                return fn(*args, **kwargs)
            except Exception as e:
                last_error = e
                if attempt < max_retries - 1:
                    time.sleep(base_delay * (backoff ** attempt))
        raise last_error
    return wrapper
Input Validation with Pydantic
from pydantic import BaseModel, Field
class GetWeatherInput(BaseModel):
    location: str = Field(..., min_length=1, max_length=200)
    unit: str = Field(default="celsius", pattern="^(celsius|fahrenheit)$")

def get_weather_validated(location: str, unit: str) -> dict:
    try:
        validated = GetWeatherInput(location=location, unit=unit)
    except Exception as e:
        return {"error": str(e), "suggestion": "Use format: city, country"}
    return {"temperature": 22, "unit": validated.unit}
Architecture: Tool Call Lifecycle
flowchart LR
userMsg[User Message] --> modelIn[Model Input]
modelIn --> modelOut[Model Output]
modelOut --> hasTools{Tool Calls?}
hasTools -->|Yes| execute[Execute Tools]
execute --> append[Append Results]
append --> modelIn
hasTools -->|No| final[Final Response]
Conversational Interview Q&A
Q1: "How do you handle a tool call that fails or times out in production?"
Weak answer: "We catch the exception and return an error message to the user."
Strong answer: "Layered approach. Hard timeouts—10–30 seconds per tool—so a stuck call doesn't block the agent. Retries with exponential backoff for transient failures: 2–3 retries, 1s/2s/4s. I return structured error messages to the LLM—'Weather API timed out after 10s'—so the model can inform the user or try an alternative. For critical tools like booking creation, we have fallbacks: cached options or 'please try manual booking.' Circuit breakers after N consecutive failures so we don't hammer a down service. At Maersk, our email booking agent logs failures to MLflow—when the booking API starts failing, we see it immediately and can alert."
Q2: "Explain MCP architecture. How is it different from LangChain tools?"
Weak answer: "MCP is a protocol for tools. LangChain has tools too. They're similar."
Strong answer: "MCP has three layers: Host (Cursor, Claude Desktop), Client (connects to a server), Server (exposes tools, resources, prompts). Clients discover via list_tools(), execute via call_tool(). JSON-RPC over stdio or Streamable HTTP. LangChain tools are framework-specific—they work in the LangChain ecosystem. MCP is protocol-agnostic. Write a server once, any host that speaks MCP can use it. No lock-in. MCP also standardizes resources and prompts, which LangChain doesn't. On our Agent Platform at Maersk, we're evaluating MCP for standardizing tool access across agents—one server per capability area, pluggable into different runtimes."
Q3: "How do you validate tool outputs before passing them back to the LLM?"
Weak answer: "We check if it's valid JSON."
Strong answer: "Schema validation with Pydantic—structure and types. PII filtering: scan for emails, phone numbers, SSNs, redact before sending to the model. Size limits—truncate or summarize long outputs. Format normalization so the model gets consumable data. Error structures formatted consistently: 'Error: [message]. Suggestion: [action].' In compliance-heavy environments we log what was sent for audit. Our booking agent returns customer and sailing data—we filter any PII that shouldn't be in the prompt and cap response length so we stay within context limits."
Q4: "An agent has 50 tools. The model keeps picking the wrong one. How do you fix it?"
Weak answer: "Improve the descriptions."
Strong answer: "Dynamic tool loading first—only pass relevant tools. Use a lightweight classifier: is this a booking query? Calendar? Search? Pass 5–10 tools max. Second, make descriptions distinct: 'Use when the user wants to create a new booking' vs 'Use when the user wants to search existing bookings.' Consolidate overlapping tools—one search, not search and find. Hierarchical routing: first call classifies task, second gets only that category's tools. Few-shot examples in the system prompt. Monitor tool→query pairs, iterate on descriptions from misselections. At Maersk we have create_booking, search_sailings, get_quote—very different names, explicit 'when to use' so the model doesn't confuse them."
Q5: "How do you secure tool execution? What if a tool has destructive side effects?"
Weak answer: "We validate inputs and don't give delete access."
Strong answer: "Least privilege—agents get only tools they need. Input validation: schema, ranges, sanitization for injection. For destructive ops—DB deletes, file overwrites—two-step confirmation: tool returns preview, 'This will delete 50 records. Confirm?' Separate confirmation mechanism before execution. Sandbox high-risk tools in containers. Rate limit per user/session. Audit and log all invocations. Treat tool descriptions from external MCP servers as untrusted. On our platform we gate booking creation behind human-in-the-loop; destructive operations aren't even exposed to most agents."
Q6: "Walk through building an MCP server. What design decisions matter?"
Weak answer: "Use FastMCP, add tools with decorators, run it."
Strong answer: "Transport first: stdio for local (Cursor, CLI) or Streamable HTTP for remote (scalable, multi-client). Tool granularity: one per operation vs. one with a type parameter—finer gives more control, coarser reduces count. Schema design: JSON Schema, clear descriptions, enums for choices. Error handling: structured errors clients can pass to the LLM. Resources vs. tools: read-only data as resources (client-fetched), actions as tools (model-initiated). Auth for HTTP: API keys, OAuth. State: stateless vs. session state. At Maersk we'd build one server for booking tools, one for RAG—stdio for dev, HTTP for prod. Test with MCP inspector before production."
Q7: "How do you handle parallel tool calls? What challenges arise?"
Weak answer: "We run them in parallel with asyncio."
Strong answer: "asyncio.gather for concurrent execution. Collect all results, return each with the correct call_id or tool_use_id. Challenges: ordering—results must match request IDs; partial failure—return what succeeded, error for failures so the model can adapt; per-tool timeouts so one hang doesn't block others; implicit dependencies—model usually sequences, but sometimes we need ordering logic; rate limits—parallel can hit them faster, may need throttling. At Maersk our booking flow is mostly sequential by design, but for things like fetching sailing options and customer profile we could parallelize. Note: OpenAI says parallel_tool_calls isn't supported with built-in/MCP tools in some configs."
From Your Experience (Maersk-Tailored Prompts)
1. Your platform provides "tools access" to agents. Walk through the architecture.
Describe: how tools are defined (schema format), how they're registered, how the agent runtime discovers and invokes them, how results flow back. Mention ToolRegistry or base abstractions. How do you support both OpenAI and Anthropic tool formats? How does tool access integrate with your guardrails and policies?
2. How does the email booking automation agent handle tool failures?
Your agent extracts info from emails, uses RAG, creates bookings, has human-in-the-loop. When create_booking fails (API down, validation error), what happens? Retries? Fallback to manual? Error message to user? Circuit breaker or alerting? How do you log and observe these in MLflow?
3. Would you use MCP for the Agent Platform, or a custom protocol? Why?
You have centralized LLM models, guardrails, tool access, observability. How would MCP fit? Stdio for local tools, HTTP for deployment? One server per capability vs. one monolithic server? What would a custom protocol need that MCP doesn't provide?
Quick Fire Round
Q: What does the model actually output when it wants to call a tool?
A: Structured JSON: function name + arguments. Not executable code.
Q: What's the difference between OpenAI's arguments and Anthropic's input?
A: OpenAI: JSON string. Anthropic: parsed object. You must parse OpenAI's.
Q: When should you use tool_choice: "required"?
A: When you want to force the model to use at least one tool—e.g., you need data before answering.
Q: What's MCP's main advantage over custom integrations?
A: Write once, plug anywhere. Protocol-agnostic. Any MCP client can use any MCP server.
Q: Stdio vs. Streamable HTTP for MCP?
A: Stdio: local, subprocess, lowest latency. HTTP: remote, scalable, auth, multi-client.
Q: How many tools should you pass per request?
A: Fewer than 20 per OpenAI's guidance—ideally ~10–15. With 50+, use dynamic loading or routing.
Q: What's the circuit breaker pattern for tools?
A: After N consecutive failures, stop calling for a cooldown. Re-enable later. Prevents cascading failures.
Q: Why are tool descriptions critical?
A: Model has no access to source code. Descriptions are the only way it knows when and how to use a tool.
Q: What's strict mode in OpenAI function calling?
A: strict: true enforces exact schema match. Required fields, additionalProperties: false. Recommended for production.
Q: How do you handle parallel tool calls when one fails?
A: Return results for successes, structured error for the failed one. Model can adapt. Don't block all on one failure.
Q: What are MCP resources vs. tools?
A: Resources: read-only, client fetches. Tools: model-initiated, callable functions. Different control flows.
Q: What's the two-step confirmation for destructive operations?
A: Tool returns preview ("This will delete X. Confirm?"). Separate confirmation mechanism before actual execution.
Q: What's allowed_tools in OpenAI (2025)?
A: Restrict model to a subset of tools while keeping full list for prompt caching.
Key Takeaways (Cheat Sheet)
| Topic | Key Point |
|---|---|
| Tool calling | Model requests; your code executes. Request–response protocol. |
| Schema design | Clear descriptions, fewer tools, distinct names. Descriptions = model's only interface. |
| OpenAI vs Anthropic | OpenAI: tool_calls, function_call_output, arguments as string. Anthropic: tool_use, tool_result in user content, input parsed. |
| MCP | Protocol-agnostic. Host–Client–Server. Tools, resources, prompts. Stdio or Streamable HTTP. |
| Security | Input/output validation, sandboxing, least privilege, confirmation for destructive ops, rate limiting. |
| Error handling | Retries + backoff, timeouts, fallbacks, descriptive errors to LLM, circuit breaker. |
| Tool selection | <20 tools per request. Dynamic loading, routing, better descriptions for scale. |
| Parallel calls | Execute concurrently, return each with correct ID. Handle partial failure, timeouts, rate limits. |
Further Reading
- OpenAI Function Calling — platform.openai.com/docs/guides/function-calling — The official guide. Straightforward; bookmark for schema and tool_choice details.
- Anthropic Tool Use — docs.anthropic.com/en/docs/build-with-claude/tool-use — Claude's take. Good for input_schema and tool_result structure.
- MCP Specification — modelcontextprotocol.io/specification — The source of truth. Architecture, transports, security.
- FastMCP — gofastmcp.com — Decorator-based Python MCP server. Fast way to build and test.
- Building Effective Agents (Anthropic) — Best practices for agent design, including when and how to expose tools.
- Agentic AI Security (2025) — Research on securing agents and tool execution—sandboxing, validation, consent.