Function calling (also called tool use) lets LLMs invoke your application's code in a structured, reliable way. Instead of parsing natural language output with fragile string matching, the model returns a structured JSON object with a function name and arguments that you execute directly. This is the foundation of every reliable AI agent, and mastering it is essential for production LLM applications.
This guide covers function calling with both OpenAI (GPT-4o) and Anthropic (Claude), including single and parallel tool calls, the tool use loop, structured output extraction, error handling, and best practices for defining tools that models call reliably.
The function calling flow has three steps. First, you send the user message along with a list of tool definitions (JSON Schema descriptions of your functions) to the LLM. Second, if the model decides to call a tool, it returns a special tool_calls response (instead of a text response) containing the function name and arguments as JSON. Third, you execute the function, send the result back to the model, and it generates the final text response using the tool output as context.
Crucially, the model never executes code — it only returns a structured description of what to call and with what arguments. Your application code executes the actual function and controls what the model can do. This makes function calling safe: you have full control over every action taken.
import json
from openai import OpenAI

client = OpenAI()

# Step 1: Define your tools as JSON Schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city. Returns temperature and conditions.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'London' or 'New York'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit. Default celsius."
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# Step 2: First call (the model may request tool use)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"  # Model decides whether to use tools
)
message = response.choices[0].message
print(f"Finish reason: {response.choices[0].finish_reason}")  # "tool_calls"
print(f"Tool called: {message.tool_calls[0].function.name}")  # "get_weather"
print(f"Args: {message.tool_calls[0].function.arguments}")  # {"city": "Tokyo"}
With tool_choice="auto", the model decides whether to call a tool. Use tool_choice="required" to force the model to always call a tool, tool_choice="none" to prevent it, or tool_choice={"type": "function", "function": {"name": "get_weather"}} to force a specific function.
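The options above can be collected in one small helper. This is only a sketch: `build_tool_choice` is a hypothetical name, not part of the OpenAI SDK, and it simply returns the value you would pass as `tool_choice`.

```python
from typing import Optional, Union

def build_tool_choice(mode: str = "auto", function_name: Optional[str] = None) -> Union[str, dict]:
    """Return a value for the `tool_choice` parameter (hypothetical helper)."""
    if function_name is not None:
        # Force one specific function by name
        return {"type": "function", "function": {"name": function_name}}
    if mode not in {"auto", "required", "none"}:
        raise ValueError(f"Unsupported tool_choice mode: {mode!r}")
    return mode

print(build_tool_choice())                             # model decides
print(build_tool_choice("required"))                   # must call some tool
print(build_tool_choice(function_name="get_weather"))  # must call get_weather
```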
A complete tool use implementation executes in a loop: send message → check if tool calls requested → execute tools → send results → repeat until the model returns a final text response. This loop can run multiple iterations if the model needs to call several tools sequentially.
import json
from openai import OpenAI

client = OpenAI()

# Actual implementations of your tools
def get_weather(city: str, unit: str = "celsius") -> dict:
    # In production, call a real weather API
    return {"city": city, "temperature": 22, "unit": unit, "condition": "sunny"}

def search_database(query: str, limit: int = 5) -> list:
    return [{"id": i, "result": f"Result {i} for: {query}"} for i in range(limit)]

TOOL_REGISTRY = {
    "get_weather": get_weather,
    "search_database": search_database,
}

def run_agent(user_message: str, tools: list, max_iterations: int = 5) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools, tool_choice="auto"
        )
        message = response.choices[0].message
        # No tool calls: the final answer is ready
        if not message.tool_calls:
            return message.content
        # Append the assistant message that carries the tool_calls, then execute each one
        messages.append(message)
        for tool_call in message.tool_calls:
            fn_name = tool_call.function.name
            fn_args = json.loads(tool_call.function.arguments)
            fn = TOOL_REGISTRY.get(fn_name)
            # Report unknown tools back to the model instead of raising KeyError
            result = fn(**fn_args) if fn else {"error": f"Unknown tool: {fn_name}"}
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            })
    return "Max iterations reached."

# `tools` is the list of tool definitions from the first example
answer = run_agent("What's the weather in Paris?", tools)
print(answer)
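In a loop like this, one malformed call (an unknown tool name, arguments that are not valid JSON, or a tool that raises) can end the whole run. The sketch below shows one possible way to isolate that step. `execute_tool_call` is a hypothetical helper, and the stubbed `get_weather` and one-entry registry exist only to keep the example self-contained.

```python
import json

def get_weather(city: str, unit: str = "celsius") -> dict:
    # Stub standing in for a real tool implementation
    return {"city": city, "temperature": 22, "unit": unit, "condition": "sunny"}

TOOL_REGISTRY = {"get_weather": get_weather}

def execute_tool_call(name: str, raw_arguments: str) -> str:
    """Run one tool call and always return a JSON string for the tool message."""
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"Arguments were not valid JSON: {exc.msg}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "Arguments must be a JSON object."})
    try:
        return json.dumps(fn(**args))
    except TypeError as exc:
        # Typically raised for missing or unexpected keyword arguments
        return json.dumps({"error": f"Bad arguments for {name}: {exc}"})
    except Exception as exc:
        # Keep the loop alive and report the failure to the model
        return json.dumps({"error": f"{name} failed: {exc}"})

print(execute_tool_call("get_weather", '{"city": "Paris"}'))
print(execute_tool_call("get_weather", '{"town": "Paris"}'))
print(execute_tool_call("delete_everything", "{}"))
```

Inside the loop, the returned string would be used directly as the `content` of the `tool` message, so the model sees the error and can correct its next call.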
GPT-4o and Claude 3+ both support parallel tool calls — where the model requests multiple tools simultaneously in a single response. This is a significant latency optimization: instead of weather→stocks→news taking 3 sequential round trips, all three are requested at once and you execute them in parallel.
import asyncio
import json
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def get_weather_async(city: str) -> dict:
    await asyncio.sleep(0.1)  # Simulate API call
    return {"city": city, "temp": 22, "condition": "sunny"}

async def get_stock_price_async(ticker: str) -> dict:
    await asyncio.sleep(0.1)
    return {"ticker": ticker, "price": 185.42, "change": "+1.2%"}

async def run_parallel_tools(user_message: str) -> str:
    tools = [
        {"type": "function", "function": {
            "name": "get_weather", "description": "Get weather for a city",
            "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
        }},
        {"type": "function", "function": {
            "name": "get_stock_price", "description": "Get stock price for a ticker",
            "parameters": {"type": "object", "properties": {"ticker": {"type": "string"}}, "required": ["ticker"]}
        }},
    ]
    messages = [{"role": "user", "content": user_message}]
    response = await client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    assistant_msg = response.choices[0].message
    # No tool calls requested: the first response is already the answer
    if not assistant_msg.tool_calls:
        return assistant_msg.content
    messages.append(assistant_msg)

    # Execute all tool calls IN PARALLEL with asyncio.gather
    async def execute_tool(tc):
        args = json.loads(tc.function.arguments)
        if tc.function.name == "get_weather":
            result = await get_weather_async(**args)
        else:
            result = await get_stock_price_async(**args)
        return {"role": "tool", "tool_call_id": tc.id, "content": json.dumps(result)}

    tool_results = await asyncio.gather(*[execute_tool(tc) for tc in assistant_msg.tool_calls])
    messages.extend(tool_results)
    final = await client.chat.completions.create(model="gpt-4o", messages=messages)
    return final.choices[0].message.content

result = asyncio.run(run_parallel_tools("What's the weather in Tokyo and Apple's stock price?"))
print(result)
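When several tools run concurrently, one failure should not discard the results of the others. The following sketch shows one way to handle that with a registry and `asyncio.gather(..., return_exceptions=True)`. The names `ASYNC_TOOLS`, `execute_all`, and `fake_call` are illustrative, and the `SimpleNamespace` objects only mimic the attributes of a tool call that the code reads (`id`, `function.name`, `function.arguments`), so the example runs without an API key.

```python
import asyncio
import json
from types import SimpleNamespace

async def get_weather_async(city: str) -> dict:
    await asyncio.sleep(0.05)  # Simulate API latency
    return {"city": city, "temp": 22}

async def get_stock_price_async(ticker: str) -> dict:
    await asyncio.sleep(0.05)
    if ticker != "AAPL":
        raise ValueError(f"Unknown ticker: {ticker}")
    return {"ticker": ticker, "price": 185.42}

ASYNC_TOOLS = {"get_weather": get_weather_async, "get_stock_price": get_stock_price_async}

async def execute_all(tool_calls: list) -> list:
    """Run every requested tool concurrently and build one tool message per call."""
    async def run_one(tc):
        fn = ASYNC_TOOLS.get(tc.function.name)
        if fn is None:
            raise ValueError(f"Unknown tool: {tc.function.name}")
        return await fn(**json.loads(tc.function.arguments))

    # return_exceptions=True returns failures as values, so successful results are kept
    outcomes = await asyncio.gather(*(run_one(tc) for tc in tool_calls), return_exceptions=True)
    messages = []
    for tc, outcome in zip(tool_calls, outcomes):
        payload = {"error": str(outcome)} if isinstance(outcome, Exception) else outcome
        messages.append({"role": "tool", "tool_call_id": tc.id, "content": json.dumps(payload)})
    return messages

def fake_call(call_id: str, name: str, arguments: dict) -> SimpleNamespace:
    # Stand-in for an SDK tool call object, for demonstration only
    return SimpleNamespace(id=call_id, function=SimpleNamespace(name=name, arguments=json.dumps(arguments)))

calls = [
    fake_call("call_1", "get_weather", {"city": "Tokyo"}),
    fake_call("call_2", "get_stock_price", {"ticker": "ZZZZ"}),
]
tool_messages = asyncio.run(execute_all(calls))
for m in tool_messages:
    print(m["tool_call_id"], m["content"])
```

Because `asyncio.gather` returns results in the order of its inputs, each result can be paired with its `tool_call_id` by position.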
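Anthropic's Claude uses a very similar tool use API with a few naming differences: tool definitions are flat objects that use input_schema instead of parameters, the model returns tool_use content blocks instead of tool_calls, the stop_reason is "tool_use" instead of a finish_reason of "tool_calls", and results go back as tool_result blocks inside a user message instead of tool role messages. Claude's tool use is especially reliable for complex multi-step reasoning.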
import anthropic
import json

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a given city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The city name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
]

messages = [{"role": "user", "content": "What's the weather in Berlin?"}]
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    tools=tools,
    messages=messages
)

# Check for tool use
if response.stop_reason == "tool_use":
    # Claude may request several tools at once; each tool_use block needs a tool_result
    tool_results = []
    for block in response.content:
        if block.type != "tool_use":
            continue
        fn_input = block.input  # Already a dict, so no json.loads is needed
        # Execute the tool (stubbed here)
        result = {"city": fn_input["city"], "temp": 18, "condition": "cloudy"}
        tool_results.append({"type": "tool_result", "tool_use_id": block.id,
                             "content": json.dumps(result)})
    # Send results back to Claude
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": tool_results})
    final = client.messages.create(model="claude-opus-4-5", max_tokens=1024,
                                   tools=tools, messages=messages)
    print(final.content[0].text)
else:
    # No tool requested: the first response already contains the answer
    print(response.content[0].text)
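Since the two definition formats carry the same JSON Schema under different keys, a single tool list can serve both providers. A minimal sketch, assuming tools are authored in the OpenAI layout shown earlier; `openai_tool_to_anthropic` is a hypothetical helper name.

```python
def openai_tool_to_anthropic(tool: dict) -> dict:
    """Convert one OpenAI-style tool definition to the Anthropic layout."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # Same JSON Schema, different key: `parameters` becomes `input_schema`
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }

openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

anthropic_tool = openai_tool_to_anthropic(openai_tool)
print(anthropic_tool)
```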
Function calling is one of the most reliable ways to extract structured data from unstructured text. By defining a function with the exact schema you want, you steer the model toward JSON that matches it, and with OpenAI's Structured Outputs the match is enforced rather than merely likely. Either approach is far more reliable than asking the model to "respond in JSON format".
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

# Use OpenAI's Structured Outputs feature (guaranteed schema adherence)
class JobPosting(BaseModel):
    company: str
    role: str
    salary_min: int | None
    salary_max: int | None
    required_skills: list[str]
    remote: bool
    location: str | None

job_text = """
Senior ML Engineer at DataCorp (San Francisco or Remote)
Salary: $160k-$220k
Must have: Python, PyTorch, MLflow, 5+ years ML experience
Nice to have: Kubernetes, Rust
"""

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract structured job posting data."},
        {"role": "user", "content": job_text}
    ],
    response_format=JobPosting,
)
job = response.choices[0].message.parsed
print(f"Role: {job.role} at {job.company}")
# Salary fields are nullable, so guard before applying numeric formatting
if job.salary_min is not None and job.salary_max is not None:
    print(f"Salary: ${job.salary_min:,} - ${job.salary_max:,}")
print(f"Skills: {job.required_skills}")
print(f"Remote: {job.remote}")
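Passing a Pydantic model as response_format enables Structured Outputs: the returned JSON is constrained to match your schema exactly, which removes parsing errors. The model can still refuse a request, in which case parsed is None, so check for that in production code. Available on gpt-4o and gpt-4o-mini.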
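Schema enforcement can also be requested for ordinary function tools. The sketch below assumes OpenAI's strict mode for function definitions ("strict": true, which expects additionalProperties set to false and every property listed in required). `make_strict` is a hypothetical helper that only handles flat, non-nested schemas, and representing optional fields as nullable is an assumption about how you want to model them.

```python
import copy

def make_strict(tool: dict) -> dict:
    """Return a copy of an OpenAI tool definition adjusted for strict mode (flat schemas only)."""
    strict_tool = copy.deepcopy(tool)
    fn = strict_tool["function"]
    params = fn["parameters"]
    originally_required = set(params.get("required", []))
    for name, schema in params["properties"].items():
        if name not in originally_required and isinstance(schema.get("type"), str):
            # Assumption: optional fields become nullable, since every key must be required
            schema["type"] = [schema["type"], "null"]
            if "enum" in schema:
                schema["enum"] = [*schema["enum"], None]
    params["required"] = list(params["properties"])
    params["additionalProperties"] = False
    fn["strict"] = True
    return strict_tool

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

strict_weather_tool = make_strict(weather_tool)
print(strict_weather_tool["function"]["parameters"])
```

With this shape the model sends `"unit": null` when it has no preference, so the tool implementation needs to treat null as "use the default".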
Write descriptions as if for a junior developer. The model selects tools based largely on the tool name and the description field. Be specific: "Get the current weather conditions and temperature for any city worldwide" beats "Get weather". Include what the tool does NOT do to avoid misuse.
Use enum for constrained choices. Wherever possible, use "enum": ["option1", "option2"] instead of free-form strings. This prevents the model from hallucinating invalid values.
Mark required fields explicitly. Always populate the "required" array with fields the function cannot proceed without. Optional fields with defaults should be omitted from required.
Return rich context, not bare values. Return {"city": "Tokyo", "temperature": 22, "unit": "celsius", "conditions": "sunny"} rather than just 22. The model needs context to form a good answer.
Validate before executing. Never trust function arguments blindly. Validate inputs, use parameterized queries for SQL-bound values, check permissions, and catch exceptions before they reach critical systems. Return an error object rather than raising an exception, so the model can see what went wrong and recover gracefully.
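import sqlite3

db = sqlite3.connect(":memory:")  # Assumption: stand-in for your application's connection
ALLOWED_TABLES = {"users", "products", "orders"}

def safe_database_query(table: str, filters: dict, limit: int = 10) -> dict:
    """Execute a safe read-only database query with validation."""
    if table not in ALLOWED_TABLES:
        return {"error": f"Table '{table}' not allowed. Use: {sorted(ALLOWED_TABLES)}"}
    if not 1 <= limit <= 100:
        return {"error": "Limit must be between 1 and 100 rows."}
    # Column names cannot be bound as parameters, so accept plain identifiers only
    if not all(col.isidentifier() for col in filters):
        return {"error": "Invalid filter column name."}
    where = " AND ".join(f"{col} = ?" for col in filters) or "1=1"
    try:
        # The table name passed the allowlist above; all values are bound parameters
        sql = f"SELECT * FROM {table} WHERE {where} LIMIT ?"
        results = db.execute(sql, [*filters.values(), limit]).fetchall()
        return {"rows": results, "count": len(results)}
    except Exception as e:
        return {"error": f"Query failed: {e}"}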
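The same "validate, then return an error object" idea can be applied generically by checking arguments against the tool's own JSON Schema before dispatch. What follows is a deliberately small sketch covering only required fields, basic types, and enum values. `validate_arguments` and `matches_type` are hypothetical helpers, and a full JSON Schema validator would be more thorough.

```python
TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def matches_type(value, json_type) -> bool:
    if not isinstance(json_type, str):
        return True  # Union types and missing types are not checked in this sketch
    if json_type in ("integer", "number") and isinstance(value, bool):
        return False  # bool is a subclass of int in Python, so exclude it explicitly
    expected = TYPE_MAP.get(json_type)
    return expected is None or isinstance(value, expected)

def validate_arguments(args: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the arguments look valid."""
    props = schema.get("properties", {})
    problems = [f"Missing required field: {k}" for k in schema.get("required", []) if k not in args]
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            problems.append(f"Unexpected field: {key}")
        elif not matches_type(value, spec.get("type")):
            problems.append(f"{key} must be of type {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key} must be one of {spec['enum']}")
    return problems

weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

print(validate_arguments({"city": "Tokyo"}, weather_schema))
print(validate_arguments({"unit": "kelvin"}, weather_schema))
```

If the returned list is not empty, sending it back as `{"error": problems}` in the tool result gives the model enough detail to retry with corrected arguments.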