Python Uvicorn and Gunicorn: ASGI/WSGI Production Deployment
Uvicorn is the go-to ASGI server for async Python frameworks like FastAPI and Starlette, while Gunicorn is the battle-tested WSGI process manager for Django and Flask. In production you typically combine them: Gunicorn manages multiple Uvicorn worker processes, giving you both async I/O performance and robust process supervision. This guide covers configuration, tuning, Docker deployment, Nginx integration, and graceful reload patterns.
Uvicorn Basics and Configuration
Uvicorn implements the ASGI spec and handles HTTP/1.1 and WebSockets. It does not speak HTTP/2, which is normally terminated by a reverse proxy such as Nginx. For development, run it directly. For production, use it as a Gunicorn worker class rather than standalone, so that Gunicorn handles process management.
# Quote the extras so shells such as zsh do not expand the brackets
pip install "uvicorn[standard]" gunicorn
# Development — single process, auto-reload on file changes
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
# Development with all options visible
uvicorn app.main:app \
    --reload \
    --host 0.0.0.0 \
    --port 8000 \
    --log-level debug \
    --access-log \
    --workers 1
# app/main.py: minimal FastAPI app
from contextlib import asynccontextmanager

from fastapi import FastAPI


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: initialize DB pools, caches
    print("Starting up...")
    yield
    # Shutdown: close connections cleanly
    print("Shutting down...")


app = FastAPI(lifespan=lifespan)


@app.get("/health")
async def health():
    return {"status": "ok"}


@app.get("/")
async def root():
    return {"message": "Hello from Uvicorn"}
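To make the ASGI interface that Uvicorn serves more concrete, here is a minimal sketch of a bare ASGI callable with no framework, driven by hand the way a server would drive it. The names raw_app and call_once are illustrative and not part of Uvicorn or FastAPI. The sketch only handles the "http" scope and ignores lifespan and WebSocket scopes.

```python
import asyncio


async def raw_app(scope, receive, send):
    """Bare ASGI callable: answers every HTTP request with a plain-text body."""
    if scope["type"] != "http":
        return  # other scope types (lifespan, websocket) are ignored in this sketch
    body = f"Hello from {scope['path']}".encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"text/plain"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})


async def call_once(path):
    """Call the app with a hand-built scope and collect the messages it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await raw_app(scope, receive, send)
    return sent


messages = asyncio.run(call_once("/health"))
```

If this were saved as a module, the same callable could be served with uvicorn module_name:raw_app. FastAPI and Starlette build their routing on top of this three-argument interface.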
Gunicorn + Uvicorn Workers (Production)
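Gunicorn acts as a process manager: it spawns multiple Uvicorn workers, monitors them, restarts crashed workers, and handles graceful reloads. Set the worker class to uvicorn.workers.UvicornWorker so that each worker process runs an event loop serving your ASGI app. Recent Uvicorn releases mark the bundled uvicorn.workers module as deprecated in favour of the separate uvicorn-worker package, so check the release notes for the version you pin.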
pip install "uvicorn[standard]" gunicorn
# Production: Gunicorn managing Uvicorn workers
gunicorn app.main:app \
    --worker-class uvicorn.workers.UvicornWorker \
    --workers 4 \
    --bind 0.0.0.0:8000 \
    --timeout 30 \
    --keepalive 5 \
    --max-requests 1000 \
    --max-requests-jitter 100 \
    --access-logfile - \
    --error-logfile - \
    --log-level info
# For Django (WSGI) — use Gunicorn directly without Uvicorn worker
# gunicorn myproject.wsgi:application --workers 4 --bind 0.0.0.0:8000
# For Django with async views (ASGI) — use Daphne or Uvicorn worker
# gunicorn myproject.asgi:application --worker-class uvicorn.workers.UvicornWorker
# Programmatic startup (useful for scripts)
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "app.main:app",           # import string; required when workers > 1
        host="0.0.0.0",
        port=8000,
        workers=4,                # multi-process in production
        log_level="info",
        access_log=True,
        proxy_headers=True,       # honour X-Forwarded-* headers set by Nginx
        forwarded_allow_ips="*",  # trusts every peer; narrow to the proxy's IP if the port is reachable by others
    )
Worker Count and Tuning
The classic formula for WSGI sync workers is 2 * CPU cores + 1. For async Uvicorn workers, fewer workers suffice because each handles many concurrent connections — start with CPU cores + 1 and tune based on profiling. Memory usage grows linearly with worker count.
# Compute optimal worker count
import multiprocessing
cpu_count = multiprocessing.cpu_count()
# Async workers (FastAPI/Starlette via Uvicorn)
async_workers = cpu_count + 1
# Sync workers (Django/Flask via Gunicorn default worker)
sync_workers = (2 * cpu_count) + 1
print(f"CPU cores: {cpu_count}")
print(f"Recommended async workers: {async_workers}")
print(f"Recommended sync workers: {sync_workers}")
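Because memory grows roughly linearly with the number of workers, the CPU-based formulas above are best treated as an upper bound. The following sketch caps the count by a memory budget. The function name recommended_workers and the per-worker memory figures are hypothetical, so measure your own application's resident memory before relying on any number.

```python
def recommended_workers(cpu_cores, async_workers=True,
                        memory_limit_mb=None, per_worker_mb=None):
    """Return a starting worker count, limited by CPU and optionally by memory."""
    # CPU-based starting points taken from the formulas in this section
    by_cpu = cpu_cores + 1 if async_workers else 2 * cpu_cores + 1
    if memory_limit_mb is None or per_worker_mb is None:
        return by_cpu
    # Never go below one worker, even if the budget is smaller than one worker
    by_memory = max(1, memory_limit_mb // per_worker_mb)
    return min(by_cpu, by_memory)


# Hypothetical figures: 4 cores, a 512 MB limit, about 150 MB per worker
print(recommended_workers(4, memory_limit_mb=512, per_worker_mb=150))
```

With these example figures, memory rather than CPU is the binding constraint, which is a common situation in small containers.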
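# Key timeout settings:
# --timeout 30              # restart a worker that has been silent for 30s; this caps request time for sync workers, but for async workers it is a heartbeat check, not a per-request limit
# --graceful-timeout 20     # time in-flight requests get to finish after a restart or shutdown signal
# --keepalive 5             # seconds to hold an idle keep-alive connection open; behind a proxy that reuses connections a higher value often makes sense
# --max-requests 1000       # restart a worker after 1000 requests (limits the impact of memory leaks)
# --max-requests-jitter 100 # randomise each worker's restart point so they do not all restart at once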
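The effect of --max-requests-jitter can be illustrated with a small simulation. As far as I understand Gunicorn's behaviour, each worker adds a random value between 0 and the jitter to max_requests when it boots. The function below is a sketch of that idea, not Gunicorn's code, and the name restart_thresholds is made up for this example.

```python
import random


def restart_thresholds(workers, max_requests, jitter, seed=None):
    """Return the request count at which each simulated worker would restart."""
    rng = random.Random(seed)
    return [max_requests + rng.randint(0, jitter) for _ in range(workers)]


# Without jitter every worker restarts at the same request count
print(restart_thresholds(4, 1000, 0))
# With jitter the restart points are spread over the range 1000..1100
print(restart_thresholds(4, 1000, 100, seed=42))
```

Spreading the thresholds means workers that receive similar traffic are unlikely to recycle in the same instant, so capacity dips by one worker at a time rather than all at once.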
Gunicorn Config File
Use a Python config file instead of a long command line. Gunicorn imports it as a module, allowing dynamic configuration and server hooks for logging, worker lifecycle events, and metrics.
# gunicorn.conf.py
import multiprocessing
# Binding
bind = "0.0.0.0:8000"
backlog = 2048
# Workers
worker_class = "uvicorn.workers.UvicornWorker"
workers = multiprocessing.cpu_count() + 1
worker_connections = 1000  # only affects the gthread, eventlet, and gevent worker types
threads = 1  # only affects the gthread worker type; Uvicorn workers ignore it
# Timeouts
timeout = 30
graceful_timeout = 20
keepalive = 5
# Requests
max_requests = 1000
max_requests_jitter = 100
# Logging
accesslog = "-" # stdout
errorlog = "-" # stderr
loglevel = "info"
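# This format is applied by Gunicorn's own access logger. With UvicornWorker the access lines may be produced by Uvicorn's logger instead, so check the actual output.
access_log_format = '{"remote":"%({X-Forwarded-For}i)s","method":"%(m)s","path":"%(U)s","status":"%(s)s","length":"%(b)s","referer":"%(f)s","agent":"%(a)s","time":"%(T)s"}'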
# Process naming
proc_name = "techoral-api"
# Server hooks
def on_starting(server):
    server.log.info("Starting Gunicorn")


def worker_exit(server, worker):
    server.log.info(f"Worker {worker.pid} exited")


def post_fork(server, worker):
    # Called in each worker right after it is forked; set up per-worker resources here
    server.log.info(f"Worker {worker.pid} booted")
# Start with config file
gunicorn app.main:app --config gunicorn.conf.py
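Because the config file is ordinary Python, values can be derived from the environment rather than hard-coded. The sketch below shows one way to do that. WEB_CONCURRENCY is a variable Gunicorn itself reads for its default worker count, while PORT, APP_TIMEOUT, LOG_LEVEL, and the helper settings_from_env are assumptions made for this example.

```python
import os


def settings_from_env(env, cpu_cores):
    """Build Gunicorn-style settings from an environment mapping."""
    return {
        "bind": f"0.0.0.0:{env.get('PORT', '8000')}",
        "workers": int(env.get("WEB_CONCURRENCY", cpu_cores + 1)),
        "timeout": int(env.get("APP_TIMEOUT", "30")),   # hypothetical variable name
        "loglevel": env.get("LOG_LEVEL", "info").lower(),
    }


# In gunicorn.conf.py the settings are plain module-level names
_settings = settings_from_env(os.environ, os.cpu_count() or 1)
bind = _settings["bind"]
workers = _settings["workers"]
timeout = _settings["timeout"]
loglevel = _settings["loglevel"]
```

Keeping the parsing in a function that takes a mapping makes the config logic easy to unit test without touching the real environment.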
Docker Deployment
Use a multi-stage Docker build to keep the final image small. The production image runs Gunicorn with Uvicorn workers, not the development Uvicorn reload server.
# Dockerfile
FROM python:3.12-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
# Non-root user for security
RUN useradd -m -u 1000 appuser && chown -R appuser /app
USER appuser
EXPOSE 8000
# Use JSON array form so SIGTERM reaches Gunicorn directly (not via shell)
CMD ["gunicorn", "app.main:app", \
"--config", "gunicorn.conf.py"]
# docker-compose.yml
version: "3.9"
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://user:pass@db:5432/mydb
      - REDIS_URL=redis://redis:6379/0
    depends_on:
      - db
      - redis
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 512M
  db:
    image: postgres:16
    environment:
      POSTGRES_DB: mydb
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
  redis:
    image: redis:7-alpine
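One thing to watch in containers: multiprocessing.cpu_count() reports the host's cores, not the CPU limit applied to the container, so a config that uses it may start more workers than the limit can feed. The sketch below estimates the usable CPU count from the cgroup v2 cpu.max file. It assumes a cgroup v2 host where that file is mounted at /sys/fs/cgroup/cpu.max, and the helper names parse_cpu_max and effective_cpus are illustrative.

```python
import math
import os


def parse_cpu_max(text):
    """Parse cgroup v2 cpu.max content: '<quota> <period>' or 'max <period>'."""
    quota, period = text.split()
    if quota == "max":
        return None  # no CPU quota configured
    return int(quota) / int(period)


def effective_cpus(cpu_max_path="/sys/fs/cgroup/cpu.max"):
    """Return the host CPU count, reduced to the cgroup quota when one is set."""
    host_cpus = os.cpu_count() or 1
    try:
        with open(cpu_max_path) as fh:
            limit = parse_cpu_max(fh.read())
    except (OSError, ValueError):
        limit = None  # file missing or in an unexpected format: fall back to host count
    if limit is None:
        return host_cpus
    return max(1, min(host_cpus, math.ceil(limit)))


print(effective_cpus())
```

The result could replace multiprocessing.cpu_count() when computing the worker count in gunicorn.conf.py. On hosts without cgroup v2 the function simply falls back to the host count.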
Nginx Reverse Proxy
Nginx sits in front of Gunicorn to handle SSL termination, static file serving, connection buffering, and load balancing across multiple Gunicorn instances. Always use Nginx in front of Gunicorn in production — never expose Gunicorn directly to the internet.
# /etc/nginx/sites-available/techoral-api
upstream api_backend {
    server 127.0.0.1:8000;
    keepalive 32; # reuse connections to Gunicorn
}
server {
    listen 80;
    server_name api.techoral.com;
    return 301 https://$host$request_uri;
}
server {
    listen 443 ssl http2;
    server_name api.techoral.com;
    ssl_certificate /etc/letsencrypt/live/api.techoral.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.techoral.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    client_max_body_size 20M;
    location / {
        proxy_pass http://api_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 30s;
        proxy_connect_timeout 5s;
    }
    location /static/ {
        alias /app/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
}
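Nginx appends the connecting client's address to X-Forwarded-For, so the application should only believe the part of that header written by proxies it trusts. The function below is a simplified sketch of that idea, not Uvicorn's actual implementation. The name client_ip and the set of trusted addresses are assumptions for illustration.

```python
def client_ip(remote_addr, x_forwarded_for, trusted_proxies):
    """Return the right-most address in the chain that is not a trusted proxy."""
    if remote_addr not in trusted_proxies:
        # Direct connection from an untrusted peer: ignore whatever header it sent
        return remote_addr
    hops = [h.strip() for h in (x_forwarded_for or "").split(",") if h.strip()]
    for hop in reversed(hops):
        if hop not in trusted_proxies:
            return hop
    return remote_addr


trusted = {"127.0.0.1"}
# Nginx on localhost forwarded a request from 203.0.113.7; the client tried to
# spoof 1.2.3.4 by sending its own X-Forwarded-For header.
print(client_ip("127.0.0.1", "1.2.3.4, 203.0.113.7", trusted))
```

Scanning from the right is what defeats spoofing: anything to the left of the last untrusted hop was supplied by the client and cannot be verified. This is also why a wildcard trust setting is only safe when the application port is reachable solely through the proxy.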
Graceful Reload and Zero-Downtime Deploy
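Gunicorn supports zero-downtime reloads via SIGHUP. On SIGHUP the master re-reads its configuration, starts new workers running the updated code, and gracefully shuts down the old workers once their in-flight requests finish, bounded by graceful_timeout. If the application is preloaded with preload_app, a HUP does not pick up new application code, and a full restart or the USR2 upgrade sequence is needed instead.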
# Find Gunicorn master PID
cat /var/run/gunicorn.pid
# or
pgrep -f "gunicorn app.main:app"
# Graceful reload (spawns new workers with updated code)
kill -HUP $(cat /var/run/gunicorn.pid)
# Graceful shutdown (waits for in-flight requests)
kill -TERM $(cat /var/run/gunicorn.pid)
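# Upgrade Gunicorn binary in place (USR2 + WINCH + QUIT)
OLD_MASTER_PID=$(cat /var/run/gunicorn.pid)  # remember the old master before forking
kill -USR2 "$OLD_MASTER_PID"   # fork a new master running the new binary
kill -WINCH "$OLD_MASTER_PID"  # gracefully stop the old master's workers
# Verify the new master is serving requests, then:
kill -QUIT "$OLD_MASTER_PID"   # shut down the old master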
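A deploy script written in Python can send the same signals. The helper below is a minimal sketch, assuming a POSIX system and a PID file written by Gunicorn's --pid option. The name signal_master is made up for this example.

```python
import os
import signal


def signal_master(pid_file, sig=signal.SIGHUP):
    """Read the master PID from pid_file, send it sig, and return the PID."""
    with open(pid_file) as fh:
        pid = int(fh.read().strip())
    os.kill(pid, sig)  # raises ProcessLookupError if the process is gone
    return pid


# Example usage (commented out because it needs a running Gunicorn master):
# signal_master("/var/run/gunicorn.pid")                  # graceful reload
# signal_master("/var/run/gunicorn.pid", signal.SIGTERM)  # graceful shutdown
```

Passing 0 as the signal only checks that the process exists and that the caller may signal it, which is a cheap sanity check before a real reload.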
# systemd service for automatic restart
# /etc/systemd/system/techoral-api.service
[Unit]
Description=Techoral API (Gunicorn + Uvicorn)
After=network.target
[Service]
User=appuser
Group=appuser
WorkingDirectory=/app
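# appuser must be able to write the PID file; /var/run is normally writable by root only, so point --pid at a directory the service user owns if startup fails
ExecStart=/app/.venv/bin/gunicorn app.main:app --config gunicorn.conf.py --pid /var/run/gunicorn.pid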
ExecReload=/bin/kill -s HUP $MAINPID
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Frequently Asked Questions
- Should I use Uvicorn alone or Gunicorn + Uvicorn?
- Use Gunicorn + UvicornWorker in production on a plain host or VM. A single standalone Uvicorn process has no supervisor: if it crashes, nothing restarts it unless systemd, Docker, or a similar tool does. Gunicorn monitors workers, restarts them when they crash, and handles graceful reloads. In Kubernetes, you might instead run a single Uvicorn process per pod and let Kubernetes handle restarts and scaling.
- How many workers should I run?
- For async Uvicorn workers, start with CPU_CORES + 1. Since each worker is non-blocking, you don't need as many as with sync workers. Watch memory and CPU under load: add workers if CPU is underutilised but latency is high, and scale the pod horizontally if CPU is maxed out.
- Uvicorn vs Hypercorn vs Daphne?
- Uvicorn is among the fastest and most widely deployed ASGI servers. Hypercorn supports HTTP/2 and HTTP/3 (QUIC), which Uvicorn does not. Daphne is the server developed alongside Django Channels. For FastAPI and general async Python, Uvicorn is the standard choice.