Python OpenAI API: GPT Integration and Function Calling

The OpenAI Python SDK makes it straightforward to integrate GPT-4o, embeddings, and multimodal capabilities into your applications. Beyond basic chat completions, function calling lets you extract structured data and trigger actions, while structured outputs guarantee JSON-schema-conformant responses. This guide covers everything from your first API call to production patterns like retry logic, cost tracking, and async batch processing.

Setup and First Call

pip install openai tiktoken numpy requests tenacity fastapi

from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Simple completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise Python expert."},
        {"role": "user", "content": "What is a Python generator?"},
    ],
    max_tokens=500,
    temperature=0,
)
print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

Chat Completions

from openai import OpenAI

client = OpenAI()

def chat(messages: list[dict], model="gpt-4o-mini", **kwargs) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=messages,
        **kwargs,
    )
    return resp.choices[0].message.content

# Multi-turn conversation
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(question: str) -> str:
    history.append({"role": "user", "content": question})
    answer = chat(history)
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("What is asyncio?"))
print(ask("How is it different from threading?"))
print(ask("When would I use multiprocessing instead?"))

# System prompt engineering
def code_review(code: str) -> str:
    return chat([
        {"role": "system", "content": (
            "You are a senior Python engineer. Review code for: "
            "correctness, security vulnerabilities, performance issues, "
            "and PEP 8 compliance. Be specific and actionable."
        )},
        {"role": "user", "content": f"Review this code:\n```python\n{code}\n```"},
    ], temperature=0)
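The `history` list above grows with every turn, and the whole list is re-sent and billed on each call. Below is a minimal sketch of one common mitigation: keep the system prompt plus only the most recent messages. `trim_history` and its message-count cutoff are illustrative assumptions, not part of the SDK. A token-based budget would be more precise.

```python
def trim_history(history: list[dict], max_messages: int = 10) -> list[dict]:
    """Keep every system message plus the last `max_messages` other messages.

    Hypothetical helper: a message-count cutoff is a simple stand-in for a
    token-based budget.
    """
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    recent = rest[-max_messages:] if max_messages > 0 else []
    # Start the window on a user turn so it never opens with an orphaned reply
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return system + recent


demo_history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(6):
    demo_history.append({"role": "user", "content": f"question {i}"})
    demo_history.append({"role": "assistant", "content": f"answer {i}"})

print([m["content"] for m in trim_history(demo_history, max_messages=4)])
```

In `ask`, the call would become `answer = chat(trim_history(history))`. This keeps the full `history` intact for logging while sending only the recent window.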

Function Calling

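Function calling lets the model decide when to call your Python functions and with which arguments. You describe each function with a JSON schema. When the model decides a function is needed, it returns structured call arguments instead of text. Your code then runs the function and sends the result back so the model can continue.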

import json
import requests

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_products",
            "description": "Search for products in the catalog",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "max_price": {"type": "number"},
                    "category": {"type": "string"},
                },
                "required": ["query"],
            },
        },
    },
]

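def execute_tool(name: str, args: dict) -> str:
    """Run the tool the model asked for and return its result as text."""
    if name == "get_weather":
        # wttr.in's format=3 returns a short one-line weather summary
        # (the optional `unit` argument is ignored in this demo)
        resp = requests.get(f"https://wttr.in/{args['location']}?format=3", timeout=5)
        return resp.text
    elif name == "search_products":
        # Demo stub; replace with a real catalog query
        return f"Found 5 products for '{args['query']}'"
    return f"Unknown tool: {name}"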

def run_agent(user_message: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": user_message}]

    # Bound the loop so a model that keeps requesting tools cannot spin forever
    for _ in range(max_turns):
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )
        msg = resp.choices[0].message

        if not msg.tool_calls:
            return msg.content  # final answer

        messages.append(msg)  # assistant message with tool calls
        for tc in msg.tool_calls:
            result = execute_tool(tc.function.name, json.loads(tc.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,  # links this result to the model's request
                "content": result,
            })

    raise RuntimeError(f"No final answer after {max_turns} turns")

answer = run_agent("What's the weather in Tokyo and do you have any rain jackets under $100?")
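The `if`/`elif` chain in `execute_tool` grows with every tool, and a malformed argument string from the model would raise inside the agent loop. Below is a minimal sketch of a dict-based registry. It uses a dispatcher that reports errors back to the model as text instead of crashing. `TOOL_REGISTRY`, `register_tool`, and `dispatch` are hypothetical names, and `search_products` here is a stub:

```python
import json
from typing import Any, Callable, Optional

# Hypothetical registry mapping tool names (as declared in `tools`) to callables
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {}


def register_tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator: register a function under its own __name__."""
    TOOL_REGISTRY[func.__name__] = func
    return func


@register_tool
def search_products(query: str, max_price: Optional[float] = None,
                    category: Optional[str] = None) -> str:
    # Stub standing in for a real catalog lookup
    limit = f" under ${max_price:g}" if max_price is not None else ""
    return f"Found 5 products for '{query}'{limit}"


def dispatch(name: str, raw_arguments: str) -> str:
    """Run one tool call and always return a string for the "tool" message.

    Failures are reported back as JSON text instead of raised, so the model
    can see what went wrong and retry rather than crashing the agent loop.
    """
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(raw_arguments)
        return str(func(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that don't match the function signature
        return json.dumps({"error": f"bad arguments for {name}: {exc}"})


print(dispatch("search_products", '{"query": "rain jacket", "max_price": 100}'))
```

Inside `run_agent`, the tool call would then become `dispatch(tc.function.name, tc.function.arguments)`, passing the raw argument string straight through.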

Structured Outputs

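Structured outputs (introduced mid-2024) guarantee that the model's response strictly conforms to your JSON schema. With Pydantic models, `client.beta.chat.completions.parse()` builds the schema from the model class and returns the reply as a parsed instance in `message.parsed`.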

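from typing import Literal

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class ExtractedEntities(BaseModel):
    people: list[str]
    companies: list[str]
    locations: list[str]
    dates: list[str]

class SentimentAnalysis(BaseModel):
    # Literal becomes a JSON-schema enum, so the model can only return these values
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float  # expected in 0.0-1.0; the schema itself does not enforce the range
    key_phrases: list[str]
    summary: str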

def extract_entities(text: str) -> ExtractedEntities:
    resp = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Extract named entities from the text."},
            {"role": "user", "content": text},
        ],
        response_format=ExtractedEntities,
    )
    return resp.choices[0].message.parsed

def analyze_sentiment(text: str) -> SentimentAnalysis:
    resp = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Analyze the sentiment of the text."},
            {"role": "user", "content": text},
        ],
        response_format=SentimentAnalysis,
        temperature=0,
    )
    return resp.choices[0].message.parsed

review = "The new MacBook Pro is absolutely incredible. Battery life is amazing, but the price is steep."
result = analyze_sentiment(review)
print(f"Sentiment: {result.sentiment} ({result.confidence:.0%})")
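Without Pydantic, you request the same guarantee by passing a raw JSON schema to `chat.completions.create` as `response_format={"type": "json_schema", ...}` with `"strict": True`. In strict mode, every property must be listed in `required` and `additionalProperties` must be `false`. Below is a minimal sketch. The schema name `sentiment_analysis` and the helper names are illustrative. The client is passed in as a parameter so the function can be exercised with a stub.

```python
import json
from typing import Any

# Hand-written strict schema equivalent to the SentimentAnalysis model
SENTIMENT_FORMAT: dict[str, Any] = {
    "type": "json_schema",
    "json_schema": {
        "name": "sentiment_analysis",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
                "confidence": {"type": "number"},
                "key_phrases": {"type": "array", "items": {"type": "string"}},
                "summary": {"type": "string"},
            },
            # Strict mode: every property required, no extra keys allowed
            "required": ["sentiment", "confidence", "key_phrases", "summary"],
            "additionalProperties": False,
        },
    },
}


def analyze_sentiment_raw(client: Any, text: str) -> dict[str, Any]:
    """Request schema-conformant JSON and decode it into a plain dict."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Analyze the sentiment of the text."},
            {"role": "user", "content": text},
        ],
        response_format=SENTIMENT_FORMAT,
        temperature=0,
    )
    msg = resp.choices[0].message
    if msg.content is None:
        # Safety refusals arrive in msg.refusal instead of content
        raise ValueError(f"model refused: {getattr(msg, 'refusal', None)}")
    return json.loads(msg.content)
```

With a real client, you would call it as `analyze_sentiment_raw(OpenAI(), review)`. The Pydantic route above is usually less error-prone, because the schema and the parsed type cannot drift apart.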

Streaming Responses

import asyncio
from openai import AsyncOpenAI

async_client = AsyncOpenAI()

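# Sync streaming: stream=True returns an iterator of chunks
def stream_response(prompt: str):
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no text (role-only first delta, final chunk), so skip them
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

# Async streaming for FastAPI
async def stream_async(prompt: str):
    stream = await async_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content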

# FastAPI endpoint (Server-Sent Events)
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/generate")
async def generate(prompt: str):  # a bare str parameter is read from the query string
    async def gen():
        async for chunk in stream_async(prompt):
            # JSON-encode so newlines in the text cannot break SSE framing
            yield f"data: {json.dumps(chunk)}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(gen(), media_type="text/event-stream")
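Streamed text often contains newlines. In Server-Sent Events, a newline inside a `data:` field ends that line early, so a raw chunk like `"a\nb"` would be split or mangled on the client. Below is a minimal sketch of framing each chunk as a JSON string and decoding it on the receiving side. `sse_frame` and `parse_sse` are hypothetical helpers. The parser handles only the single-line `data:` events produced here, not the full SSE specification.

```python
import json
from collections.abc import Iterable, Iterator


def sse_frame(chunk: str) -> str:
    """Encode one text chunk as an SSE event; json.dumps escapes newlines."""
    return f"data: {json.dumps(chunk)}\n\n"


def parse_sse(frames: Iterable[str]) -> Iterator[str]:
    """Decode events produced by sse_frame, stopping at the [DONE] sentinel."""
    buffer = "".join(frames)
    for event in buffer.split("\n\n"):  # a blank line terminates each event
        if not event.startswith("data: "):
            continue
        payload = event[len("data: "):]
        if payload == "[DONE]":
            return
        yield json.loads(payload)


chunks = ["Hello", ",\nworld", "!"]
stream = [sse_frame(c) for c in chunks] + ["data: [DONE]\n\n"]
print("".join(parse_sse(stream)))
```

In a browser, the same decoding is a `JSON.parse(event.data)` inside an `EventSource` or fetch-stream handler. Splitting on `"\n\n"` is safe here only because `json.dumps` never emits a raw newline.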

Embeddings

import numpy as np

def get_embedding(text: str, model="text-embedding-3-small") -> list[float]:
    resp = client.embeddings.create(input=text, model=model)
    return resp.data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantic search
docs = [
    "Python is a high-level programming language.",
    "FastAPI is an async web framework for Python.",
    "asyncio enables concurrent programming in Python.",
    "NumPy provides fast array operations.",
]
doc_embeddings = [get_embedding(d) for d in docs]

query = "How do I write async code?"
query_emb = get_embedding(query)

similarities = [(cosine_similarity(query_emb, de), doc) for de, doc in zip(doc_embeddings, docs)]
top = sorted(similarities, reverse=True)
for score, doc in top[:2]:
    print(f"{score:.3f}: {doc}")

# Batch embeddings (more efficient)
texts = ["text 1", "text 2", "text 3"]
resp = client.embeddings.create(input=texts, model="text-embedding-3-small")
embeddings = [r.embedding for r in resp.data]
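OpenAI's embeddings are normalized to length 1, so for them cosine similarity reduces to a plain dot product. Below is a dependency-free sketch of top-k retrieval that normalizes defensively anyway, so it also works for vectors from other sources. `top_k` is a hypothetical helper, and the 3-dimensional vectors are toy data, not real embeddings.

```python
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (effectively a no-op for OpenAI embeddings)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[tuple[float, str]]:
    """Return the k (score, name) pairs with the highest cosine similarity."""
    q = normalize(query)
    # Dot product of unit vectors == cosine similarity
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), name)
        for name, vec in docs.items()
    )
    return heapq.nlargest(k, scored)  # avoids sorting the whole list


toy_docs = {
    "asyncio": [0.9, 0.1, 0.0],
    "numpy": [0.0, 0.2, 0.9],
    "fastapi": [0.7, 0.6, 0.1],
}
for score, name in top_k([1.0, 0.2, 0.0], toy_docs, k=2):
    print(f"{score:.3f}: {name}")
```

For more than a few thousand documents, a single matrix product in NumPy, or a vector database, will be far faster than this per-document loop.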

Vision: Image Understanding

import base64
import mimetypes
from pathlib import Path

def encode_image(path: str) -> str:
    """Return the file as a base64 data URL, guessing the MIME type from its extension."""
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    b64 = base64.b64encode(Path(path).read_bytes()).decode()
    return f"data:{mime};base64,{b64}"

def describe_image(image_path: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail."},
                {"type": "image_url", "image_url": {
                    "url": encode_image(image_path),
                    "detail": "high",  # costs more input tokens than "low" but keeps fine detail
                }},
            ],
        }],
        max_tokens=500,
    )
    return resp.choices[0].message.content

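# From URL, with a schema that fits chart data (field names are illustrative)
from pydantic import BaseModel

class DataPoint(BaseModel):
    label: str
    value: float

class ChartData(BaseModel):
    title: str
    x_axis_label: str
    y_axis_label: str
    points: list[DataPoint]

def analyze_chart(url: str) -> ChartData:
    resp = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract data from this chart."},
                {"type": "image_url", "image_url": {"url": url}},
            ],
        }],
        response_format=ChartData,
    )
    return resp.choices[0].message.parsed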

Production Patterns

from openai import OpenAI, RateLimitError
from tenacity import retry, wait_exponential, stop_after_attempt, retry_if_exception_type

client = OpenAI(
    max_retries=3,      # SDK's built-in retry with exponential backoff (default: 2)
    timeout=30.0,       # seconds per request
)

# Custom retry for application-level logic. It stacks on the SDK's retries:
# each tenacity attempt may itself be retried up to max_retries times.
@retry(
    retry=retry_if_exception_type(RateLimitError),
    wait=wait_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(5),
)
def safe_complete(messages: list, **kwargs) -> str:
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        **kwargs,
    ).choices[0].message.content
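If you would rather not add tenacity, the same idea fits in a few lines of standard-library code: retry only on specific exceptions, with capped exponential backoff. Below is a minimal sketch, not a reproduction of tenacity's exact wait schedule. The name `retry_call`, the "full jitter" strategy (a random wait between 0 and the capped delay), and the injectable `sleep` parameter are assumptions made for illustration and testing.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_call(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on `retry_on` exceptions with capped, jittered backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # "Full jitter": wait a random time in [0, min(cap, base * 2**attempt)]
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")
```

With the SDK, a call might look like `retry_call(lambda: client.chat.completions.create(model="gpt-4o-mini", messages=messages), retry_on=(RateLimitError,))`. Jitter spreads out retries from many workers that hit a rate limit at the same moment.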

# Cost tracking
import tiktoken

def count_tokens(text: str, model="gpt-4o") -> int:
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))

# USD per 1K tokens (list prices at the time of writing; check OpenAI's pricing page)
COST_PER_1K = {"gpt-4o": {"input": 0.0025, "output": 0.01},
               "gpt-4o-mini": {"input": 0.00015, "output": 0.0006}}

def estimate_cost(input_tokens: int, output_tokens: int, model="gpt-4o") -> float:
    rates = COST_PER_1K.get(model, COST_PER_1K["gpt-4o"])
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1000
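`count_tokens` estimates cost before a call. After a call, the response's `usage` object reports the billed counts in `prompt_tokens` and `completion_tokens`. Below is a minimal sketch of accumulating spend across calls from those fields. `CostTracker` is a hypothetical helper. Its rates are given per 1K tokens like `COST_PER_1K`, and the rates in the example are illustrative. Discounts such as cached-input pricing are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class CostTracker:
    """Accumulate estimated USD cost per model from token counts.

    Rates are USD per 1K tokens, e.g. {"my-model": {"input": 0.001, "output": 0.002}}.
    Cached-input discounts and other pricing details are ignored.
    """

    rates: dict[str, dict[str, float]]
    totals: dict[str, float] = field(default_factory=dict)

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> float:
        r = self.rates[model]
        cost = (prompt_tokens * r["input"] + completion_tokens * r["output"]) / 1000
        self.totals[model] = self.totals.get(model, 0.0) + cost
        return cost

    @property
    def total(self) -> float:
        return sum(self.totals.values())


tracker = CostTracker(rates={"example-model": {"input": 0.00015, "output": 0.0006}})
# After a real call:
# tracker.record(requested_model, resp.usage.prompt_tokens, resp.usage.completion_tokens)
tracker.record("example-model", prompt_tokens=1200, completion_tokens=300)
print(f"${tracker.total:.5f}")
```

Key the tracker on the model name you requested. `resp.model` typically reports a dated snapshot name that would not match the rate table.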

# Async batch processing
async def process_batch(items: list[str], concurrency: int = 5) -> list[str]:
    semaphore = asyncio.Semaphore(concurrency)
    async def process_one(item):
        async with semaphore:
            resp = await async_client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": item}],
            )
            return resp.choices[0].message.content
    return await asyncio.gather(*[process_one(item) for item in items])
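The semaphore is what caps in-flight requests. `gather` starts every coroutine at once, but at most `concurrency` of them get past `async with semaphore` at a time, and results come back in input order. Below is a self-contained sketch that swaps the API call for `asyncio.sleep` (an assumption, so it runs offline). It records peak concurrency to show the cap holding. `run_limited` is a hypothetical name.

```python
import asyncio


async def run_limited(items: list[str], concurrency: int = 5) -> tuple[list[str], int]:
    """Process items with a concurrency cap; return (results, peak in-flight count)."""
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def process_one(item: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the real API call
            in_flight -= 1
            return item.upper()

    results = await asyncio.gather(*(process_one(i) for i in items))
    return list(results), peak


results, peak = asyncio.run(run_limited([f"item {n}" for n in range(20)], concurrency=5))
print(results[:3], peak)
```

With the default `return_exceptions=False`, the first failed request's exception propagates out of `gather`. Pass `return_exceptions=True` to collect errors alongside successful results instead.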

Frequently Asked Questions

gpt-4o vs gpt-4o-mini — which to choose?
Use gpt-4o-mini for classification, extraction, simple Q&A, and high-volume tasks: it is more than 15x cheaper per token and handles these tasks well. Use gpt-4o for complex reasoning, code generation, multi-step agent tasks, and demanding vision tasks. Many production apps route each request to one model or the other based on its complexity.
How do I reduce API costs?
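Cache repeated queries, for example in Redis, keyed on a hash of the model, messages, and sampling parameters. Use gpt-4o-mini for bulk tasks. Use the Batch API for non-real-time workloads: it is 50% cheaper, and results are delivered asynchronously within 24 hours. Shorten system prompts, since their tokens are billed on every request. Use structured outputs instead of asking the model to format JSON; this saves formatting instructions and retries after malformed output.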
What is the difference between function calling and structured outputs?
Function calling lets the model decide when to call a function and provides arguments. Structured outputs guarantee the entire response conforms to a schema without the tool-call mechanism. Use function calling for agents that need to take actions; use structured outputs for always-JSON extraction tasks.
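For the caching advice in the cost answer above, the key should cover everything that changes the output (model, messages, and sampling parameters), not only the prompt text. Below is a minimal sketch that uses an in-memory dict as a stand-in for Redis. `cache_key` and `cached_complete` are hypothetical helpers. Caching makes sense only where reusing an earlier answer is acceptable, typically with `temperature=0`.

```python
import hashlib
import json
from typing import Any, Callable

_cache: dict[str, str] = {}  # stand-in for Redis (e.g. GET / SET with a TTL)


def cache_key(model: str, messages: list[dict], **params: Any) -> str:
    """Stable key: sort_keys makes dict and keyword ordering irrelevant."""
    payload = json.dumps({"model": model, "messages": messages, "params": params}, sort_keys=True)
    return "chat:" + hashlib.sha256(payload.encode()).hexdigest()


def cached_complete(
    complete: Callable[..., str], model: str, messages: list[dict], **params: Any
) -> str:
    """Return a cached answer, or call `complete` and store its result."""
    key = cache_key(model, messages, **params)
    if key not in _cache:
        _cache[key] = complete(model=model, messages=messages, **params)
    return _cache[key]
```

The `chat` helper from the Chat Completions section fits the `complete` parameter as-is, for example `cached_complete(chat, "gpt-4o-mini", messages, temperature=0)`. With Redis, add an expiry so answers do not outlive prompt or model changes.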