Python OpenAI API: GPT Integration and Function Calling

The OpenAI Python SDK provides direct access to GPT-4o, o1, DALL-E, Whisper, and embeddings without a framework layer. Using the SDK directly gives you full control over request parameters, token counting, streaming, and error handling, which makes it the right choice when you need low-level flexibility or want to avoid framework abstractions. The v1.x SDK (released in November 2023) introduced cleaner synchronous and async clients with Pydantic response models.

This guide covers chat completions with system prompts, function calling (tool use), structured JSON output with Pydantic, vision (image understanding), embeddings for semantic search, async batch processing, and production patterns including rate limit handling and cost tracking. See Python LangChain Guide for framework-based orchestration of these primitives.

Installation and Client Setup
Chat Completions
Function Calling and Tool Use
Structured JSON Output
Vision: Image Understanding
Embeddings and Semantic Search
Async and Batch Processing

Installation and Client Setup

Install the official SDK and configure it with your API key. The client is thread-safe and can be shared across your application. Configure retries, timeouts, and base URL (for Azure OpenAI or compatible providers like Groq) at client creation time rather than per-request.

pip install openai tiktoken  # tiktoken for token counting
export OPENAI_API_KEY="sk-..."

from openai import OpenAI, AsyncOpenAI
import os

# Synchronous client — thread-safe, reuse across app
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    timeout=30.0,
    max_retries=3,  # Automatic retry with exponential backoff
)

# Async client for high-throughput applications
async_client = AsyncOpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    timeout=30.0,
    max_retries=3,
)

# Azure OpenAI
from openai import AzureOpenAI
azure_client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_KEY"],
    azure_endpoint="https://your-resource.openai.azure.com",
    api_version="2024-02-01",
)
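
The setup above does not show the base URL option for compatible providers. One way to keep provider settings in one place is a small lookup table that produces the keyword arguments for the client constructor. This is only a sketch: the PROVIDERS table, the client_kwargs helper, and the GROQ_API_KEY variable name are illustrative assumptions, and each base URL should be checked against the provider's own documentation.

```python
import os

# Hypothetical provider table. Names and environment variable names are
# illustrative; verify each base URL against the provider's documentation.
PROVIDERS = {
    "openai": {"base_url": None, "key_env": "OPENAI_API_KEY"},
    "groq": {"base_url": "https://api.groq.com/openai/v1", "key_env": "GROQ_API_KEY"},
}


def client_kwargs(provider: str, timeout: float = 30.0, max_retries: int = 3) -> dict:
    """Build keyword arguments intended for OpenAI(...) or AsyncOpenAI(...)."""
    cfg = PROVIDERS[provider]
    kwargs = {
        "api_key": os.environ[cfg["key_env"]],
        "timeout": timeout,
        "max_retries": max_retries,
    }
    # Only pass base_url when overriding the SDK default endpoint.
    if cfg["base_url"] is not None:
        kwargs["base_url"] = cfg["base_url"]
    return kwargs
```

A client would then be created with something like OpenAI(**client_kwargs("groq")), which keeps retries and timeouts configured once at creation time.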

Chat Completions

The Chat Completions API is the primary interface for GPT models. Messages are passed as a list with roles — system, user, and assistant. The system message sets behaviour; user/assistant pairs represent conversation turns. Key parameters control output quality, length, and determinism.

from openai import OpenAI

client = OpenAI()

# Basic chat completion
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a Python expert. Be concise and practical."},
        {"role": "user", "content": "What is the difference between a list and a tuple?"},
    ],
    temperature=0.1,        # Near-deterministic for factual tasks
    max_tokens=500,
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0,
)

print(response.choices[0].message.content)
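usage = response.usage
print(f"Tokens used: {usage.total_tokens}")
# Input and output tokens are priced differently. The rates below are gpt-4o-mini
# list prices in USD per 1M tokens at the time of writing; check current pricing.
cost = (usage.prompt_tokens * 0.15 + usage.completion_tokens * 0.60) / 1_000_000
print(f"Cost estimate: ${cost:.6f}")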

# Multi-turn conversation
messages = [{"role": "system", "content": "You are a helpful coding assistant."}]

def chat(user_message: str) -> str:
    messages.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        temperature=0,
        max_tokens=1000,
    )
    assistant_msg = response.choices[0].message.content
    messages.append({"role": "assistant", "content": assistant_msg})
    return assistant_msg

print(chat("Write a Python function to reverse a string."))
print(chat("Now add type hints and a docstring."))
print(chat("Add unit tests for it."))

# Streaming
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain asyncio in 100 words."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
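
In the multi-turn example, the messages list grows with every call, so long conversations eventually cost more and can exceed the context window. A minimal sketch of one way to bound it, assuming plain user/assistant dict messages (no tool calls) and using turn count as a rough proxy for token count; trim_history is a hypothetical helper, not part of the SDK:

```python
def trim_history(messages: list[dict], max_turns: int = 10) -> list[dict]:
    """Keep all system messages plus the last `max_turns` user/assistant pairs."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    if max_turns <= 0:
        return system
    # Each turn is one user message followed by one assistant message.
    return system + dialogue[-2 * max_turns:]
```

Calling it just before each request, for example messages=trim_history(messages, max_turns=10), leaves the stored history intact while sending only the recent part. A token-based limit would be more precise but needs a tokenizer such as tiktoken.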

Function Calling and Tool Use

Function calling lets GPT decide when and how to call your Python functions based on the user's request. You define tools as JSON schemas; the model returns a structured call with arguments; you execute the function and return the result. This enables reliable structured data extraction and action-taking agents without prompt engineering hacks.

from openai import OpenAI
import json, requests

client = OpenAI()

# Define tools (functions the model can call)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, e.g. 'London'"},
                    "units": {"type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius"},
                },
                "required": ["city"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Search the product database by keyword",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "limit": {"type": "integer", "default": 10},
                },
                "required": ["query"],
            },
        },
    },
]

def get_weather(city: str, units: str = "celsius") -> dict:
    return {"city": city, "temperature": 22, "conditions": "Partly cloudy", "units": units}

def search_database(query: str, limit: int = 10) -> list:
    return [{"id": 1, "name": f"Product matching '{query}'", "price": 29.99}]

def run_tool(name: str, args: dict):
    return {"get_weather": get_weather, "search_database": search_database}[name](**args)

# Agentic loop
messages = [{"role": "user", "content": "What's the weather in Paris and find me laptops?"}]

while True:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )
    msg = response.choices[0].message
    messages.append(msg)

    if msg.tool_calls:  # Present whenever the model requested one or more tool calls
        for tool_call in msg.tool_calls:
            args = json.loads(tool_call.function.arguments)
            result = run_tool(tool_call.function.name, args)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            })
    else:
        print(msg.content)
        break
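
The loop above trusts the model to name a known tool and to send valid JSON arguments. In practice either can go wrong, and an unhandled exception ends the conversation. A hedged sketch of a more defensive dispatcher follows: it returns an error payload that can be sent back as the tool message so the model has a chance to correct itself. TOOL_REGISTRY and run_tool_safely are hypothetical names, and get_weather is the stub from the example above.

```python
import json


def get_weather(city: str, units: str = "celsius") -> dict:
    # Stub implementation, same shape as the example above.
    return {"city": city, "temperature": 22, "conditions": "Partly cloudy", "units": units}


# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_safely(name: str, raw_arguments: str) -> str:
    """Return a JSON string suitable as the content of a role="tool" message."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(raw_arguments)
        if not isinstance(args, dict):
            raise TypeError("arguments must be a JSON object")
        return json.dumps(func(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function signature.
        return json.dumps({"error": f"bad arguments for {name}: {exc}"})
```

In the loop, the tool message content would become run_tool_safely(tool_call.function.name, tool_call.function.arguments). Capping the number of loop iterations is a sensible companion safeguard.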

Structured JSON Output

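GPT-4o models support two JSON modes through response_format. {"type": "json_object"} guarantees syntactically valid JSON but not any particular shape, and the API rejects the request unless the word "JSON" appears somewhere in the messages. {"type": "json_schema"} (Structured Outputs) additionally constrains the output to a schema you supply. The example below uses json_object and validates the result with Pydantic, which gives you type-safe structured extraction for invoice parsing, data transformation, and entity extraction pipelines.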

from openai import OpenAI
from pydantic import BaseModel
import json

client = OpenAI()

# Pydantic models for structured extraction
class JobPosting(BaseModel):
    title: str
    company: str
    location: str
    salary_min: int | None
    salary_max: int | None
    required_skills: list[str]
    experience_years: int | None
    remote: bool

# Parse job description into structured data
job_text = """
Senior Python Engineer at TechCorp (Remote-friendly, London HQ)
We're looking for 5+ years Python experience. Must know FastAPI, PostgreSQL,
Docker, and AWS. Salary: £80,000 - £110,000. Fully remote option available.
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
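        # json_object mode requires the word "JSON" in the messages and knows nothing
        # about the Pydantic model, so the expected keys are spelled out here.
        {"role": "system", "content": (
            "Extract job posting information accurately. Reply with a JSON object containing "
            "exactly these keys: title, company, location, salary_min, salary_max, "
            "required_skills, experience_years, remote. Use integers for salary and experience "
            "values, a list of strings for required_skills, a boolean for remote, and null when "
            "a value is not stated."
        )},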
        {"role": "user", "content": f"Extract data from this job posting:\n\n{job_text}"},
    ],
    response_format={"type": "json_object"},
    temperature=0,
)

data = json.loads(response.choices[0].message.content)
job = JobPosting(**data)
print(f"Title: {job.title}")
print(f"Skills: {', '.join(job.required_skills)}")
if job.salary_min is not None and job.salary_max is not None:  # Either may be null
    print(f"Salary: £{job.salary_min:,} - £{job.salary_max:,}")
print(f"Remote: {job.remote}")
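
The example above uses json_object mode, so the shape of the reply depends on the prompt. For comparison, here is a sketch of what a json_schema request for the same fields might look like. The schema is written by hand to keep the block dependency-free; build_request and the schema name job_posting are hypothetical, and the block only builds the request arguments rather than calling the API.

```python
# Hand-written JSON Schema mirroring the JobPosting model above.
# In strict mode every property must be listed in "required" and
# "additionalProperties" must be false; optional values are expressed
# as a union with "null".
JOB_POSTING_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "company": {"type": "string"},
        "location": {"type": "string"},
        "salary_min": {"type": ["integer", "null"]},
        "salary_max": {"type": ["integer", "null"]},
        "required_skills": {"type": "array", "items": {"type": "string"}},
        "experience_years": {"type": ["integer", "null"]},
        "remote": {"type": "boolean"},
    },
    "required": [
        "title", "company", "location", "salary_min", "salary_max",
        "required_skills", "experience_years", "remote",
    ],
    "additionalProperties": False,
}


def build_request(job_text: str, model: str = "gpt-4o-mini") -> dict:
    """Build keyword arguments intended for client.chat.completions.create(**kwargs)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Extract job posting information accurately."},
            {"role": "user", "content": f"Extract data from this job posting:\n\n{job_text}"},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "job_posting", "strict": True, "schema": JOB_POSTING_SCHEMA},
        },
        "temperature": 0,
    }
```

The response content can still be passed through the Pydantic model for validation. Recent SDK versions can also derive the schema from a Pydantic class directly, which avoids maintaining it by hand; check the SDK documentation for the version you have installed.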

Vision: Image Understanding

GPT-4o accepts image inputs alongside text, enabling document OCR, chart analysis, UI screenshot review, and visual QA. Pass images as base64-encoded data URLs or public HTTPS URLs. The detail parameter controls token usage: "low" costs 85 tokens; "high" tiles the image for more detail.

from openai import OpenAI
import base64
from pathlib import Path

client = OpenAI()

# Image from URL
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image? Describe it in detail."},
            {"type": "image_url", "image_url": {
                "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/2/21/Simple_English_Wikipedia_notext.svg/240px-Simple_English_Wikipedia_notext.svg.png",
                "detail": "low",
            }},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)

# Image from local file
def encode_image(path: str) -> str:
    return base64.b64encode(Path(path).read_bytes()).decode()

def analyse_image(image_path: str, question: str) -> str:
    b64 = encode_image(image_path)
    ext = Path(image_path).suffix.lstrip(".").lower()
    if ext == "jpg":
        ext = "jpeg"  # The registered MIME type is image/jpeg, not image/jpg
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {
                    "url": f"data:image/{ext};base64,{b64}",
                    "detail": "high",
                }},
            ],
        }],
        max_tokens=500,
    )
    return response.choices[0].message.content

# Example: analyse a chart or screenshot
# answer = analyse_image("dashboard_screenshot.png", "What are the key metrics shown?")
# print(answer)
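
To make the cost difference between "low" and "high" detail concrete, the sketch below estimates image input tokens. It assumes the tiling rules published for gpt-4o at the time of writing (fit within 2048 x 2048, scale the shortest side down to 768 px, then 170 tokens per 512 px tile plus a base of 85); other models use different numbers, and image_tokens is a hypothetical helper rather than an SDK function.

```python
import math


def image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate input tokens for one image under the assumed gpt-4o rules."""
    if detail == "low":
        return 85  # Flat cost regardless of image size
    # Step 1: scale down (never up) to fit within a 2048 x 2048 square.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Step 2: scale down (never up) so the shortest side is 768 px.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # Step 3: count 512 px tiles; each costs 170 tokens on top of a base of 85.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles
```

Under these assumptions a 1024 x 1024 image in high detail is resized to 768 x 768, which needs 4 tiles and so 765 tokens, about nine times the low-detail cost.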

Embeddings and Semantic Search

OpenAI's text-embedding-3-small model converts text into dense vector representations. Similar texts produce similar vectors, enabling semantic search, clustering, deduplication, and recommendation without keyword matching. At $0.02 per million tokens, it is extremely cost-effective for large-scale embedding pipelines.

from openai import OpenAI
import numpy as np

client = OpenAI()

def embed(texts: list[str], model: str = "text-embedding-3-small") -> np.ndarray:
    """Embed a list of texts, returning an (N, D) float32 array."""
    response = client.embeddings.create(input=texts, model=model)
    return np.array([e.embedding for e in response.data], dtype=np.float32)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Semantic search example
documents = [
    "Python asyncio enables concurrent I/O operations.",
    "FastAPI is a modern web framework for building APIs.",
    "NumPy provides efficient array operations for scientific computing.",
    "Docker containers package applications with their dependencies.",
    "Machine learning models learn patterns from training data.",
]

query = "How to do concurrent programming in Python?"

doc_embeddings = embed(documents)
query_embedding = embed([query])

similarities = cosine_similarity(query_embedding, doc_embeddings)[0]
ranked = sorted(enumerate(similarities), key=lambda x: x[1], reverse=True)

print(f"Query: {query}\n")
for rank, (idx, score) in enumerate(ranked[:3], 1):
    print(f"{rank}. [{score:.3f}] {documents[idx]}")
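
The example above embeds five documents in a single request. For larger corpora the input list has to be split, since the embeddings endpoint limits how many inputs and tokens one request may carry. A minimal batching sketch follows; batched and embed_all are hypothetical helpers, the default batch size of 256 is an arbitrary assumption, and the embedding function is passed in so the logic can be exercised without calling the API.

```python
from typing import Callable, Iterator


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(
    texts: list[str],
    embed_fn: Callable[[list[str]], list[list[float]]],
    batch_size: int = 256,
) -> list[list[float]]:
    """Embed `texts` in batches, preserving input order."""
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors
```

With the embed function defined earlier, which returns NumPy arrays, the equivalent would be something like np.vstack([embed(b) for b in batched(documents, 256)]).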

Async and Batch Processing

For high-throughput workloads — processing thousands of documents, batch classification, or parallel API calls — use the async client with asyncio.gather(). Rate limits apply per minute, so use a semaphore to cap concurrency. OpenAI's Batch API offers 50% cost reduction for non-time-sensitive workloads with 24-hour turnaround.

import asyncio
from openai import AsyncOpenAI

async_client = AsyncOpenAI()

async def classify(text: str, semaphore: asyncio.Semaphore) -> dict:
    async with semaphore:  # Limit concurrent requests
        response = await async_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Classify sentiment as positive, negative, or neutral. Reply with just the label."},
                {"role": "user", "content": text},
            ],
            temperature=0,
            max_tokens=10,
        )
        return {"text": text[:50], "sentiment": response.choices[0].message.content.strip()}

async def batch_classify(texts: list[str], max_concurrent: int = 10) -> list[dict]:
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [classify(t, semaphore) for t in texts]
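    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Report failures instead of dropping them silently.
    failures = [r for r in results if isinstance(r, Exception)]
    if failures:
        print(f"{len(failures)} of {len(texts)} requests failed; first error: {failures[0]!r}")
    return [r for r in results if isinstance(r, dict)]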

# Process 50 reviews concurrently
reviews = [f"Review number {i}: {'great' if i % 2 == 0 else 'terrible'} product!" for i in range(50)]
results = asyncio.run(batch_classify(reviews, max_concurrent=10))
for r in results[:5]:
    print(f"{r['sentiment']:10s} | {r['text']}")
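
For the Batch API mentioned at the start of this section, requests are submitted as a JSONL file with one request per line. The sketch below builds such lines for the same sentiment task. The custom_id prefix "review-" and the helper name build_batch_lines are illustrative assumptions; the line layout (custom_id, method, url, body) follows the documented batch input format.

```python
import json


def build_batch_lines(texts: list[str], model: str = "gpt-4o-mini") -> list[str]:
    """Return one JSON string per request, ready to be written as a .jsonl file."""
    lines = []
    for i, text in enumerate(texts):
        request = {
            "custom_id": f"review-{i}",  # Used to match results back to inputs
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    {"role": "system", "content": "Classify sentiment as positive, negative, or neutral. Reply with just the label."},
                    {"role": "user", "content": text},
                ],
                "temperature": 0,
                "max_tokens": 10,
            },
        }
        lines.append(json.dumps(request))
    return lines
```

The file is then uploaded with purpose="batch" and submitted as a batch job with a 24-hour completion window. Results arrive as another JSONL file, not necessarily in input order, which is why each request carries a custom_id.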

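Rate limits: at the time of writing, gpt-4o-mini defaults to 500 RPM and 200,000 TPM on tier 1, but limits vary by account and change over time, so check the limits page for your organisation. A semaphore caps concurrency rather than request rate; 10-20 concurrent requests is a reasonable starting point. The SDK already retries HTTP 429 responses up to max_retries times, so add tenacity-based retry logic for RateLimitError in production when you need longer or custom backoff.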
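
As an illustration of the retry logic, here is a standard-library sketch of exponential backoff with jitter. It is not tenacity and not the SDK's built-in retry; with_backoff is a hypothetical helper, and TransientError stands in for openai.RateLimitError so the block runs without the SDK installed.

```python
import asyncio
import random


class TransientError(Exception):
    """Stand-in for openai.RateLimitError in this sketch."""


async def with_backoff(
    make_call,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: tuple = (TransientError,),
):
    """Await make_call(), retrying on `retry_on` with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_call()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller
            # Delay doubles each attempt and is capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads out retries from concurrent tasks.
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
```

In the classify function above, the request could be wrapped as await with_backoff(lambda: async_client.chat.completions.create(...), retry_on=(RateLimitError,)), with RateLimitError imported from openai.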