Python Uvicorn and Gunicorn: ASGI/WSGI Production Deployment

Uvicorn is the go-to ASGI server for async Python frameworks like FastAPI and Starlette, while Gunicorn is the battle-tested WSGI process manager for Django and Flask. In production you typically combine them: Gunicorn manages multiple Uvicorn worker processes, giving you both async I/O performance and robust process supervision. This guide covers configuration, tuning, Docker deployment, Nginx integration, and graceful reload patterns.

Uvicorn Basics and Configuration

Uvicorn implements the ASGI spec and handles HTTP/1.1 and WebSockets. It does not speak HTTP/2, which is normally terminated at a reverse proxy such as Nginx. For development, run it directly. For production, use it as a Gunicorn worker class rather than standalone, so that Gunicorn handles process management.

pip install uvicorn[standard] gunicorn

# Development — single process, auto-reload on file changes
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

# Development with all options visible
uvicorn app.main:app \
  --reload \
  --host 0.0.0.0 \
  --port 8000 \
  --log-level debug \
  --access-log \
  --workers 1

# app/main.py: minimal FastAPI app
from fastapi import FastAPI
from contextlib import asynccontextmanager


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: initialize DB pools, caches
    print("Starting up...")
    yield
    # Shutdown: close connections cleanly
    print("Shutting down...")


app = FastAPI(lifespan=lifespan)


@app.get("/health")
async def health():
    return {"status": "ok"}


@app.get("/")
async def root():
    return {"message": "Hello from Uvicorn"}

ASGI vs WSGI: WSGI (Gunicorn's default) is synchronous: each sync worker handles one request at a time. ASGI (Uvicorn) is asynchronous: a single worker can keep many concurrent connections open on one asyncio event loop, provided the handlers do not block it. Use ASGI for FastAPI/Starlette; use WSGI for Flask or Django (unless you rely on Django Channels or async views).
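
To make the difference concrete, here is a minimal sketch of the two calling conventions using only the standard library. The `call_wsgi` and `call_asgi` helpers are hypothetical stand-ins for what a real server does; they are not part of Gunicorn or Uvicorn.

```python
import asyncio


def wsgi_app(environ, start_response):
    """WSGI: a blocking callable; the worker is busy until it returns."""
    body = b"hello from WSGI"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


async def asgi_app(scope, receive, send):
    """ASGI: a coroutine; it can yield to the event loop at every await."""
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello from ASGI"})


def call_wsgi(app):
    """Hypothetical harness: invoke a WSGI app the way a server would."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app({"REQUEST_METHOD": "GET", "PATH_INFO": "/"}, start_response)
    return captured["status"], b"".join(chunks)


def call_asgi(app):
    """Hypothetical harness: drive an ASGI app and collect what it sends."""
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    scope = {"type": "http", "method": "GET", "path": "/", "headers": []}
    asyncio.run(app(scope, receive, send))
    return messages


if __name__ == "__main__":
    print(call_wsgi(wsgi_app))
    print([m["type"] for m in call_asgi(asgi_app)])
```

The WSGI callable returns the whole response in one blocking call, while the ASGI coroutine exchanges messages with the server and can be suspended between them. That suspension is what lets one worker interleave many requests.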

Gunicorn + Uvicorn Workers (Production)

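Gunicorn acts as a process manager. It spawns multiple Uvicorn workers, monitors them, restarts crashed workers, and handles graceful reloads. Specify UvicornWorker as the worker class to run async code inside each worker process. Recent Uvicorn releases mark the bundled uvicorn.workers module as deprecated in favor of the separate uvicorn-worker package, so check which import path applies to your installed version.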

pip install uvicorn[standard] gunicorn

# Production: Gunicorn managing Uvicorn workers
gunicorn app.main:app \
  --worker-class uvicorn.workers.UvicornWorker \
  --workers 4 \
  --bind 0.0.0.0:8000 \
  --timeout 30 \
  --keepalive 5 \
  --max-requests 1000 \
  --max-requests-jitter 100 \
  --access-logfile - \
  --error-logfile - \
  --log-level info

# For Django (WSGI): use Gunicorn directly without the Uvicorn worker
# gunicorn myproject.wsgi:application --workers 4 --bind 0.0.0.0:8000

# For Django with async views (ASGI) — use Daphne or Uvicorn worker
# gunicorn myproject.asgi:application --worker-class uvicorn.workers.UvicornWorker

# Programmatic startup (useful for scripts)
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "app.main:app",
        host="0.0.0.0",
        port=8000,
        workers=4,           # multi-process in production
        log_level="info",
        access_log=True,
        proxy_headers=True,  # honor X-Forwarded-* headers set by Nginx
        forwarded_allow_ips="*",  # trusts every peer; restrict to the proxy's IP if others can reach this port
    )

Worker Count and Tuning

The classic formula for WSGI sync workers is 2 * CPU cores + 1. For async Uvicorn workers, fewer workers suffice because each handles many concurrent connections — start with CPU cores + 1 and tune based on profiling. Memory usage grows linearly with worker count.

# Compute optimal worker count
import multiprocessing

cpu_count = multiprocessing.cpu_count()

# Async workers (FastAPI/Starlette via Uvicorn)
async_workers = cpu_count + 1

# Sync workers (Django/Flask via Gunicorn default worker)
sync_workers = (2 * cpu_count) + 1

print(f"CPU cores: {cpu_count}")
print(f"Recommended async workers: {async_workers}")
print(f"Recommended sync workers: {sync_workers}")

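# Key timeout settings:
# --timeout 30       # kill and restart a worker that has been silent for 30s; for sync workers this
#                    # caps request duration, for async workers it is only a liveness check
# --graceful-timeout 20  # time allowed for in-flight requests to finish after a restart or shutdown signal
# --keepalive 5      # hold idle keep-alive connections open for 5s (a timeout; Nginx's upstream
#                    # "keepalive" directive is a connection count, not a duration)
# --max-requests 1000  # restart a worker after 1000 requests (limits the impact of memory leaks)
# --max-requests-jitter 100  # add a random 0-100 to each worker's limit so workers do not all restart at once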
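
The effect of the jitter setting is easier to see with numbers. The sketch below assumes each worker draws its own restart threshold as the base limit plus a random integer between 0 and the jitter value, which is how the option is documented to behave. The function name and the `seed` parameter are illustrative only.

```python
import random


def restart_thresholds(workers, max_requests, jitter, seed=None):
    """Return one restart threshold per worker: base limit plus random jitter."""
    rng = random.Random(seed)  # seeded only to make the illustration repeatable
    return [max_requests + rng.randint(0, jitter) for _ in range(workers)]


if __name__ == "__main__":
    # Without jitter every worker recycles at exactly the same request count.
    print(restart_thresholds(4, 1000, 0))
    # With jitter the recycle points are spread across 1000..1100.
    print(restart_thresholds(4, 1000, 100, seed=1))
```

With evenly balanced traffic and no jitter, all four workers would hit request 1000 at about the same moment and restart together, which briefly removes most of the serving capacity.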

Gunicorn Config File

Use a Python config file instead of a long command line. Gunicorn imports it as a module, allowing dynamic configuration and server hooks for logging, worker lifecycle events, and metrics.

# gunicorn.conf.py
import multiprocessing

# Binding
bind = "0.0.0.0:8000"
backlog = 2048

# Workers
worker_class = "uvicorn.workers.UvicornWorker"
workers = multiprocessing.cpu_count() + 1
worker_connections = 1000
threads = 1            # async workers ignore this; set for sync workers only

# Timeouts
timeout = 30
graceful_timeout = 20
keepalive = 5

# Requests
max_requests = 1000
max_requests_jitter = 100

# Logging
accesslog = "-"          # stdout
errorlog = "-"           # stdout
loglevel = "info"
access_log_format = '{"remote":"%({X-Forwarded-For}i)s","method":"%(m)s","path":"%(U)s","status":"%(s)s","length":"%(b)s","referer":"%(f)s","agent":"%(a)s","time":"%(T)s"}'

# Process naming
proc_name = "techoral-api"

# Server hooks
def on_starting(server):
    server.log.info("Starting Gunicorn")

def worker_exit(server, worker):
    server.log.info(f"Worker {worker.pid} exited")

def post_fork(server, worker):
    # Called after each worker is forked — set up per-worker resources here
    server.log.info(f"Worker {worker.pid} booted")

# Start with config file
gunicorn app.main:app --config gunicorn.conf.py
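
Because the config file is ordinary Python, settings can be read from the environment so that one image serves several deployments. The following is a minimal sketch of that idea. The `env_int` helper and the `APP_*` variable names are assumptions made for illustration, not Gunicorn conventions.

```python
# gunicorn.conf.py fragment (sketch): environment-driven settings
import os


def env_int(name, default, minimum=1, environ=None):
    """Read an integer setting from the environment, falling back to a default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value < minimum:
        raise ValueError(f"{name} must be >= {minimum}, got {value}")
    return value


# Hypothetical variable names; the defaults mirror the values used in this guide.
timeout = env_int("APP_TIMEOUT", 30)
graceful_timeout = env_int("APP_GRACEFUL_TIMEOUT", 20)
max_requests = env_int("APP_MAX_REQUESTS", 1000, minimum=0)
```

Failing loudly on a malformed value is a deliberate choice here: a typo in a deployment variable then stops the server at startup instead of silently running with an unintended timeout.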

Docker Deployment

Use a multi-stage Docker build to keep the final image small. The production image runs Gunicorn with Uvicorn workers, not the development Uvicorn reload server.

# Dockerfile
FROM python:3.12-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .

# Non-root user for security
RUN useradd -m -u 1000 appuser && chown -R appuser /app
USER appuser

EXPOSE 8000

# Use JSON array form so SIGTERM reaches Gunicorn directly (not via shell)
CMD ["gunicorn", "app.main:app", \
     "--config", "gunicorn.conf.py"]

# docker-compose.yml
version: "3.9"
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://user:pass@db:5432/mydb
      - REDIS_URL=redis://redis:6379/0
    depends_on:
      - db
      - redis
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 512M

  db:
    image: postgres:16
    environment:
      POSTGRES_DB: mydb
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass

  redis:
    image: redis:7-alpine
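
One caveat with the CPU limit above: multiprocessing.cpu_count() reports the host's CPUs, not the container's quota, so a config that computes workers from it can start far more workers than a 2-CPU limit can feed. The sketch below derives the count from the cgroup quota instead. It assumes cgroup v2 with the unified hierarchy mounted at /sys/fs/cgroup (cgroup v1 uses different files), and the function names are illustrative.

```python
# gunicorn.conf.py fragment (sketch): size workers from the container CPU quota
import math
import os
from pathlib import Path

# Assumption: cgroup v2 path as seen from inside the container.
CGROUP_V2_CPU_MAX = Path("/sys/fs/cgroup/cpu.max")


def parse_cpu_max(text):
    """Parse cgroup v2 cpu.max: '<quota> <period>' or 'max <period>' (no limit)."""
    parts = text.split()
    if len(parts) != 2 or parts[0] == "max":
        return None
    quota, period = int(parts[0]), int(parts[1])
    if quota <= 0 or period <= 0:
        return None
    return quota / period


def effective_cpus(cpu_max_path=CGROUP_V2_CPU_MAX):
    """Return the CPUs usable by this process, honoring a cgroup quota if present."""
    host_cpus = os.cpu_count() or 1
    try:
        limit = parse_cpu_max(cpu_max_path.read_text())
    except (OSError, ValueError):
        limit = None  # file missing or unreadable: fall back to the host count
    if limit is None:
        return host_cpus
    return max(1, min(host_cpus, math.ceil(limit)))


workers = effective_cpus() + 1
```

With a limit of 2 CPUs, cpu.max typically reads `200000 100000`, which this sketch turns into 3 workers regardless of how many cores the host has.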

Nginx Reverse Proxy

Nginx sits in front of Gunicorn to handle SSL termination, static file serving, connection buffering, and load balancing across multiple Gunicorn instances. Always use Nginx in front of Gunicorn in production — never expose Gunicorn directly to the internet.

# /etc/nginx/sites-available/techoral-api
upstream api_backend {
    server 127.0.0.1:8000;
    keepalive 32;           # reuse connections to Gunicorn
}

server {
    listen 80;
    server_name api.techoral.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    server_name api.techoral.com;

    ssl_certificate     /etc/letsencrypt/live/api.techoral.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.techoral.com/privkey.pem;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;

    client_max_body_size 20M;

    location / {
        proxy_pass         http://api_backend;
        proxy_http_version 1.1;
        proxy_set_header   Connection "";
        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Proto $scheme;
        proxy_read_timeout 30s;
        proxy_connect_timeout 5s;
    }

    location /static/ {
        alias /app/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
}
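
The X-Forwarded-For header set above is only meaningful if the application trusts it selectively. The sketch below shows one common way to resolve the client address: ignore the header unless the direct peer is a trusted proxy, then walk the list from the right and take the first address that is not a trusted proxy. It illustrates the idea and is not Uvicorn's actual implementation; the function name and arguments are hypothetical.

```python
import ipaddress


def resolve_client_ip(peer_ip, x_forwarded_for, trusted_proxies):
    """Pick the client IP, trusting X-Forwarded-For only from known proxies."""
    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]

    def is_trusted(ip):
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            return False  # not a parseable address, so never treat it as a proxy
        return any(addr in net for net in trusted)

    # A header sent by an untrusted peer could be forged, so ignore it.
    if not is_trusted(peer_ip) or not x_forwarded_for:
        return peer_ip

    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    # Each proxy appends the address it saw, so the rightmost entries are the
    # most reliable; skip our own proxies and stop at the first outsider.
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return hops[0] if hops else peer_ip


if __name__ == "__main__":
    proxies = ["127.0.0.1/32", "10.0.0.0/8"]
    print(resolve_client_ip("127.0.0.1", "203.0.113.7, 10.0.0.2", proxies))
```

Reading from the right matters because a client can prepend arbitrary values to the header, while the entries appended by your own proxies cannot be forged from outside.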

Graceful Reload and Zero-Downtime Deploy

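Gunicorn supports zero-downtime reloads via SIGHUP. On SIGHUP the master re-reads its configuration, spawns new workers, and gracefully shuts down the old ones, letting in-flight requests finish (up to graceful_timeout) before the old workers are killed. The new workers load the updated application code only if the app is not preloaded with preload_app.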

# Find Gunicorn master PID
cat /var/run/gunicorn.pid
# or
pgrep -f "gunicorn app.main:app"

# Graceful reload (spawns new workers with updated code)
kill -HUP $(cat /var/run/gunicorn.pid)

# Graceful shutdown (waits for in-flight requests)
kill -TERM $(cat /var/run/gunicorn.pid)

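# Upgrade Gunicorn binary in place (USR2 + WINCH + QUIT)
OLD_MASTER_PID=$(cat /var/run/gunicorn.pid)  # record the current master before upgrading
kill -USR2 "$OLD_MASTER_PID"    # fork a new master running the new binary
kill -WINCH "$OLD_MASTER_PID"   # gracefully stop the old master's workers (daemonized Gunicorn only)
# Verify the new master is serving requests, then:
kill -QUIT "$OLD_MASTER_PID"

# systemd service for automatic restart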
# /etc/systemd/system/techoral-api.service
[Unit]
Description=Techoral API (Gunicorn + Uvicorn)
After=network.target

[Service]
User=appuser
Group=appuser
WorkingDirectory=/app
ExecStart=/app/.venv/bin/gunicorn app.main:app --config gunicorn.conf.py --pid /var/run/gunicorn.pid
ExecReload=/bin/kill -s HUP $MAINPID
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
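
The reload signals shown in this section can also be sent from a deploy script rather than by hand. Below is a minimal sketch that reads the master PID from a pid file and signals it. The function names are hypothetical, the pid file path is whatever you passed to --pid, and the code assumes a POSIX system (signal.SIGHUP does not exist on Windows).

```python
import os
import signal
from pathlib import Path


def read_master_pid(pidfile):
    """Read and validate the Gunicorn master PID from a pid file."""
    pid = int(Path(pidfile).read_text().strip())
    if pid <= 0:
        raise ValueError(f"invalid PID {pid} in {pidfile}")
    return pid


def signal_master(pidfile, sig=signal.SIGHUP):
    """Send a signal to the master process; SIGHUP requests a graceful reload."""
    pid = read_master_pid(pidfile)
    os.kill(pid, sig)  # raises ProcessLookupError if the process is gone
    return pid


if __name__ == "__main__":
    # Path taken from the examples in this section; adjust to your --pid value.
    reloaded = signal_master("/var/run/gunicorn.pid")
    print(f"Sent SIGHUP to Gunicorn master {reloaded}")
```

Passing signal 0 instead of SIGHUP checks that the process exists without affecting it, which is a cheap sanity check before a deploy step. When Gunicorn runs under systemd, `systemctl reload` achieves the same result through the ExecReload line.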

Frequently Asked Questions

Should I use Uvicorn alone or Gunicorn + Uvicorn?
Use Gunicorn + UvicornWorker in production when nothing else supervises the process. A single standalone Uvicorn process has no supervisor: if it crashes, nothing restarts it. Gunicorn monitors workers, restarts them on crash, handles graceful reloads, and supports rolling restarts. In Kubernetes, you might instead run a single Uvicorn process per pod and let K8s handle restarts and scaling.

How many workers should I run?
For async Uvicorn workers, start with CPU_CORES + 1. Since each worker is non-blocking, you don't need as many as with sync workers. Watch memory and CPU under load: add workers if CPU is underutilised but latency is high, and scale the pod horizontally if CPU is maxed.

Uvicorn vs Hypercorn vs Daphne?
Uvicorn is among the fastest and most widely deployed ASGI servers. Hypercorn supports HTTP/2 and HTTP/3 (QUIC), which Uvicorn does not. Daphne is the server recommended for Django Channels. For FastAPI and general async Python, Uvicorn is the standard choice.