Python Dataclasses: Replacing Boilerplate with @dataclass
What Are Dataclasses and Why Should You Use Them?
Before Python 3.7, creating a simple data-holding class required a lot of repetitive code. You wrote
__init__ to accept and assign every field, __repr__ so the object printed
something useful, and __eq__ so two objects with identical field values compared equal.
For a five-field class that means 20+ lines of pure boilerplate — and every time you added a new field,
you had to update three or four methods by hand.
Dataclasses, introduced in PEP 557
and available from Python 3.7+, solve this by generating those dunder methods automatically from a
simple class-level annotation syntax. The @dataclass decorator inspects the annotated
fields and generates __init__, __repr__, and __eq__ at class
creation time. No metaclass magic, no external library required — it is part of the standard library.
A class decorated with @dataclass remains fully type-annotated, IDE-friendly, and easy to extend.
The Plain-Class Problem
Here is a typical pre-3.7 data class and its equivalent dataclass:
```python
# Old way: lots of manual boilerplate
class Product:
    def __init__(self, name: str, price: float, stock: int = 0):
        self.name = name
        self.price = price
        self.stock = stock

    def __repr__(self):
        return f"Product(name={self.name!r}, price={self.price!r}, stock={self.stock!r})"

    def __eq__(self, other):
        if not isinstance(other, Product):
            return NotImplemented
        return (self.name, self.price, self.stock) == (other.name, other.price, other.stock)


# New way: @dataclass does the heavy lifting
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    price: float
    stock: int = 0

# Usage is identical
p1 = Product("Widget", 9.99, 100)
p2 = Product("Widget", 9.99, 100)
print(p1)        # Product(name='Widget', price=9.99, stock=100)
print(p1 == p2)  # True: __eq__ is generated automatically
```
The dataclass version is not just shorter — it is safer. When you add a new field, the
generated methods update automatically. There is no risk of forgetting to include a new field in
__repr__ or __eq__.
Basic @dataclass Usage with Type Hints
Every field in a dataclass is declared as a class-level annotation. Fields without defaults must come before fields with defaults — the same rule that applies to function parameters. The decorator accepts several boolean flags that control which methods are generated.
```python
from dataclasses import dataclass
from typing import ClassVar

@dataclass(order=True, repr=True, eq=True)
class Employee:
    # Fields without defaults must come first
    employee_id: int
    name: str
    department: str
    # Fields with defaults
    salary: float = 50_000.0
    is_active: bool = True
    # Class variable: NOT a dataclass field (excluded from __init__)
    company: ClassVar[str] = "Techoral Inc."

    def annual_bonus(self) -> float:
        return self.salary * 0.1 if self.is_active else 0.0

# Type hints are not enforced or coerced, so pass a float explicitly
e1 = Employee(101, "Alice", "Engineering", salary=95_000.0)
e2 = Employee(102, "Bob", "Marketing")
print(e1)
# Employee(employee_id=101, name='Alice', department='Engineering', salary=95000.0, is_active=True)

# order=True generates __lt__, __le__, __gt__, __ge__.
# Comparison uses a tuple of all fields in declaration order,
# so employee_id decides here: 101 < 102.
print(e1 < e2)            # True
print(Employee.company)   # Techoral Inc. (ClassVar is not in __init__ or __repr__)
print(e1.annual_bonus())  # 9500.0
```
- init=True: generate __init__ (default True)
- repr=True: generate __repr__ (default True)
- eq=True: generate __eq__ (default True)
- order=False: generate comparison methods (default False)
- frozen=False: make instances immutable (default False)
- slots=False: use __slots__ for memory savings (Python 3.10+, default False)
- kw_only=False: force all fields to be keyword-only (Python 3.10+, default False)
The field() Function: Fine-Grained Field Control
When you need more than a simple default value — mutable defaults, custom repr behaviour, metadata,
or exclusion from comparison — you reach for field() from the dataclasses
module.
```python
from dataclasses import dataclass, field, fields
from typing import List

@dataclass
class ShoppingCart:
    owner: str
    # Mutable default: NEVER write `items: list = []` in a dataclass.
    # @dataclass rejects it with ValueError; use default_factory instead.
    items: List[str] = field(default_factory=list)
    # Hidden from repr (e.g. internal cache or sensitive data)
    _cache: dict = field(default_factory=dict, repr=False, compare=False, hash=False)
    # Excluded from __init__ entirely; set in __post_init__
    item_count: int = field(init=False, repr=True, compare=False)
    # Metadata: arbitrary key-value pairs attached to the Field object
    discount: float = field(default=0.0, metadata={"unit": "fraction", "min": 0, "max": 1})

    def __post_init__(self):
        # Runs after __init__: a good place to derive computed fields
        self.item_count = len(self.items)

    def add_item(self, item: str):
        self.items.append(item)
        self.item_count = len(self.items)

cart = ShoppingCart(owner="Alice", items=["book", "pen"], discount=0.1)
print(cart)
# ShoppingCart(owner='Alice', items=['book', 'pen'], item_count=2, discount=0.1)

# Access field metadata
discount_field = next(f for f in fields(cart) if f.name == "discount")
print(discount_field.metadata)  # {'unit': 'fraction', 'min': 0, 'max': 1}
```
The field() parameters you will use most often:
- default / default_factory: supply a value or a zero-argument callable
- repr=False: omit the field from __repr__ output (good for passwords, caches)
- compare=False / hash=False: exclude from equality and hashing
- init=False: the field is not accepted in __init__; set it in __post_init__
- metadata: a read-only mapping for schema annotations, units, validation hints
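Never write items: list = [] in a dataclass. In a plain class, Python
evaluates that default once at class definition time, so every instance would share the same list object.
@dataclass guards against this by raising ValueError for list, dict, and set defaults.
field(default_factory=list) creates a fresh list for every instance.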
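A quick way to see how the decorator treats a literal list default is to try it. The sketch below uses two hypothetical classes, BrokenCart and SafeCart; the exact wording of the error message may differ between Python versions.

```python
from dataclasses import dataclass, field

# @dataclass refuses list, dict, and set defaults at class definition time
try:
    @dataclass
    class BrokenCart:
        items: list = []
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# default_factory is called once per instance, so nothing is shared
@dataclass
class SafeCart:
    items: list = field(default_factory=list)

a, b = SafeCart(), SafeCart()
a.items.append("book")
print(a.items, b.items)  # ['book'] []
```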
__post_init__: Validation and Derived Fields
The generated __init__ calls __post_init__ at the end if it is defined.
This is the standard hook for input validation, derived-field computation, and any setup that depends
on more than one field being available.
```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DateRange:
    start: date
    end: date
    # Derived field: not passed to __init__
    duration_days: int = field(init=False)

    def __post_init__(self):
        # Validation
        if self.end < self.start:
            raise ValueError(
                f"end date {self.end} must be >= start date {self.start}"
            )
        # Derived computation
        self.duration_days = (self.end - self.start).days

    def overlaps(self, other: "DateRange") -> bool:
        return self.start <= other.end and other.start <= self.end

sprint = DateRange(date(2026, 6, 1), date(2026, 6, 14))
print(sprint)
# DateRange(start=datetime.date(2026, 6, 1), end=datetime.date(2026, 6, 14), duration_days=13)
print(sprint.duration_days)  # 13

try:
    bad = DateRange(date(2026, 6, 14), date(2026, 6, 1))
except ValueError as e:
    print(e)  # end date 2026-06-01 must be >= start date 2026-06-14
```
If you need to pass a value to __post_init__ but you
do not want it stored as a field, use InitVar[T]. It appears in __init__
but is passed directly to __post_init__ and then discarded.
```python
import hashlib
from dataclasses import dataclass, field, InitVar

@dataclass
class HashedPassword:
    username: str
    raw_password: InitVar[str]  # accepted in __init__, not stored
    password_hash: str = field(init=False)

    def __post_init__(self, raw_password: str):
        # Plain SHA-256 keeps the example short; real password storage
        # should use a dedicated password-hashing function.
        self.password_hash = hashlib.sha256(raw_password.encode()).hexdigest()

u = HashedPassword("alice", "s3cr3t")
print(u.password_hash)  # sha256 hex digest; raw_password is not stored
```
frozen=True: Immutable Dataclasses and Hashability
Setting frozen=True makes the generated __setattr__ and
__delattr__ raise FrozenInstanceError, giving you value-object semantics.
Because eq=True is the default, a frozen dataclass also gets a generated __hash__ based on its fields,
so instances can be used as dictionary keys or set members as long as every field value is itself hashable.
That is very useful for caching, memoisation, and immutable configuration objects.
```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True, order=True)
class Point:
    x: float
    y: float

    def distance_to_origin(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** 0.5

p1 = Point(3.0, 4.0)
p2 = Point(0.0, 0.0)
p3 = Point(3.0, 4.0)

# Hashable: can be used in sets and as dict keys
visited: set[Point] = {p1, p2}
print(p1 in visited)  # True
print(p3 in visited)  # True: same value, same hash

cache: dict[Point, float] = {p1: p1.distance_to_origin()}
print(cache[Point(3.0, 4.0)])  # 5.0

# Ordering works because order=True
print(sorted([Point(3, 4), Point(1, 2), Point(2, 3)]))
# [Point(x=1, y=2), Point(x=2, y=3), Point(x=3, y=4)]

# Cannot mutate a frozen instance
try:
    p1.x = 99.0
except FrozenInstanceError as e:
    print(type(e).__name__, e)  # FrozenInstanceError cannot assign to field 'x'

# Use replace() to create a modified copy (like Haskell record update)
p4 = replace(p1, y=0.0)
print(p4)  # Point(x=3.0, y=0.0)
print(p1)  # Point(x=3.0, y=4.0): original unchanged
```
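The link between frozen=True and hashability is easiest to see side by side. The following sketch uses two hypothetical classes, MutablePoint and FrozenPoint, and relies only on the default eq=True behaviour.

```python
from dataclasses import dataclass

@dataclass
class MutablePoint:
    x: float
    y: float

@dataclass(frozen=True)
class FrozenPoint:
    x: float
    y: float

# eq=True without frozen=True sets __hash__ to None, so instances are unhashable
print(MutablePoint.__hash__)  # None
try:
    hash(MutablePoint(1.0, 2.0))
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False
print(mutable_is_hashable)  # False

# Frozen instances with equal field values hash equally
same_hash = hash(FrozenPoint(1.0, 2.0)) == hash(FrozenPoint(1.0, 2.0))
print(same_hash)  # True
```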
slots=True (Python 3.10+): Memory and Performance
By default every Python object stores its attributes in a per-instance __dict__. That
dictionary adds per-instance overhead whose exact size depends on the CPython version. If you are creating millions of
small dataclass instances (events, coordinates, records), slots=True switches to
__slots__ storage, which keeps each attribute at a fixed position inside the object instead of in a dictionary. The result is
lower memory usage and usually slightly faster attribute access.
```python
from dataclasses import dataclass
import sys
import timeit

@dataclass
class RegularPoint:
    x: float
    y: float
    z: float

@dataclass(slots=True)
class SlottedPoint:
    x: float
    y: float
    z: float

rp = RegularPoint(1.0, 2.0, 3.0)
sp = SlottedPoint(1.0, 2.0, 3.0)

# Exact sizes vary by CPython version and platform
print(sys.getsizeof(rp))           # the instance itself, excluding its __dict__
print(sys.getsizeof(rp.__dict__))  # the hidden per-instance dict overhead
print(sys.getsizeof(sp))           # the whole instance: no __dict__ at all

# Slotted instances do not have __dict__
print(hasattr(rp, "__dict__"))  # True
print(hasattr(sp, "__dict__"))  # False

# You cannot add arbitrary attributes to a slotted instance
try:
    sp.w = 4.0
except AttributeError as e:
    print(e)  # message starts with: 'SlottedPoint' object has no attribute 'w'

# Frozen + slots is a compact combination for read-only value objects
@dataclass(frozen=True, slots=True)
class ImmutableVector:
    x: float
    y: float
    z: float

    def magnitude(self) -> float:
        return (self.x**2 + self.y**2 + self.z**2) ** 0.5

# Benchmark: creating 1 million instances
t_regular = timeit.timeit(lambda: RegularPoint(1.0, 2.0, 3.0), number=1_000_000)
t_slotted = timeit.timeit(lambda: SlottedPoint(1.0, 2.0, 3.0), number=1_000_000)
print(f"Regular: {t_regular:.3f}s  Slotted: {t_slotted:.3f}s")
# Example run: Regular ~0.38s, Slotted ~0.26s; numbers vary by machine and Python version
```
In an inheritance hierarchy, every class needs slots=True for the full benefit. A slotted child of a non-slotted parent still gets a
__dict__ from the parent.
Dataclass Inheritance: Field Ordering Rules and Pitfalls
Dataclasses support inheritance — the child class gets all parent fields first, then its own fields.
The generated __init__ respects this ordering. The classic pitfall is placing a field
with a default in the parent, then trying to add a no-default field in the child: Python will raise
a TypeError because it would create an __init__ where a defaulted parameter
precedes a required one.
```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    id: int                         # required, no default
    created_at: str = "2026-01-01"  # has default

@dataclass
class User(Entity):
    username: str = ""  # has default (safe: no non-default after default)
    email: str = ""

@dataclass
class AdminUser(User):
    permissions: list = field(default_factory=list)

admin = AdminUser(id=1, username="alice", email="alice@example.com")
print(admin)
# AdminUser(id=1, created_at='2026-01-01', username='alice', email='alice@example.com', permissions=[])

# The pitfall: parent has default, child wants required field
@dataclass
class Base:
    x: int = 0  # has default

# This would raise TypeError at class definition time:
# @dataclass
# class Child(Base):
#     y: int  # required after a default: FORBIDDEN

# Fix 1: give y a default too
@dataclass
class Child(Base):
    y: int = 0

# Fix 2: use kw_only=True (Python 3.10+) on the child field
@dataclass
class ChildKwOnly(Base):
    y: int = field(kw_only=True)  # required, but keyword-only

# Fix 3: use @dataclass(kw_only=True) on the child class
@dataclass(kw_only=True)
class ChildAllKw(Base):
    y: int  # required but keyword-only, so no ordering conflict

# kw_only=True only affects fields declared in ChildAllKw itself:
# y must be passed by keyword, while the inherited x may still be positional.
c = ChildAllKw(x=1, y=2)
print(c)  # ChildAllKw(x=1, y=2)
```
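To confirm that the forbidden ordering fails when the class is defined rather than when it is instantiated, the definition can be wrapped in try/except. This is a standalone sketch; the exact error text varies slightly between Python versions.

```python
from dataclasses import dataclass

@dataclass
class Base:
    x: int = 0

# The error is raised by the @dataclass decorator itself,
# before any instance is ever created.
try:
    @dataclass
    class BrokenChild(Base):
        y: int  # required field after an inherited default
except TypeError as exc:
    message = str(exc)
    print(message)
```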
Dataclass Utility Functions: asdict(), astuple(), replace(), fields()
The dataclasses module ships four utility functions that make working with dataclass
instances in pipelines and serialisation code much easier.
```python
import json
from dataclasses import MISSING, dataclass, field, asdict, astuple, replace, fields
from typing import List

@dataclass
class Address:
    street: str
    city: str
    country: str = "IN"

@dataclass
class Person:
    name: str
    age: int
    address: Address
    tags: List[str] = field(default_factory=list)

person = Person(
    name="Priya",
    age=30,
    address=Address("123 MG Road", "Mysore"),
    tags=["developer", "python"],
)

# asdict(): deep recursive dict conversion; handy for JSON serialisation
d = asdict(person)
print(d)
# {'name': 'Priya', 'age': 30,
#  'address': {'street': '123 MG Road', 'city': 'Mysore', 'country': 'IN'},
#  'tags': ['developer', 'python']}
print(json.dumps(d, indent=2))  # ready for REST API responses

# astuple(): deep recursive tuple conversion; useful for DB row insertion
row = astuple(person)
print(row)  # ('Priya', 30, ('123 MG Road', 'Mysore', 'IN'), ['developer', 'python'])

# replace(): immutable-style update; returns a new instance
older_priya = replace(person, age=31)
print(older_priya.age)  # 31
print(person.age)       # 30: original intact
relocated = replace(person, address=replace(person.address, city="Bangalore"))
print(relocated.address)  # Address(street='123 MG Road', city='Bangalore', country='IN')

# fields(): introspect field metadata at runtime
for f in fields(person):
    # f.default is the MISSING sentinel when no plain default was given
    print(f.name, f.type, f.default is MISSING, f.metadata)
# name <class 'str'> True {}
# age <class 'int'> True {}
# address <class '__main__.Address'> True {}
# tags typing.List[str] True {}
```
asdict() performs a deep copy. Nested dataclasses,
dicts, lists, and tuples are all recursively converted. If your dataclass holds any other kind of
object (e.g. a custom class or a NumPy array), that object is copied
with copy.deepcopy. For large arrays this can be slow, so use a custom serialiser
instead.
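One possible shape for such a custom serialiser is a shallow, one-level conversion built on fields(). The helper name shallow_asdict and the Sample class are hypothetical; this is a sketch that skips recursion entirely, not a drop-in replacement for asdict().

```python
from dataclasses import dataclass, fields, is_dataclass

def shallow_asdict(obj):
    """Convert one level of a dataclass instance to a dict without copying values."""
    if not is_dataclass(obj) or isinstance(obj, type):
        raise TypeError("shallow_asdict() expects a dataclass instance")
    return {f.name: getattr(obj, f.name) for f in fields(obj)}

@dataclass
class Sample:
    label: str
    values: list

big = list(range(1_000_000))
sample = Sample("run-1", big)

d = shallow_asdict(sample)
# The large list is referenced, not copied
print(d["values"] is big)  # True
print(d["label"])          # run-1
```

Because values are shared with the original instance, mutating d["values"] also mutates sample.values; that trade-off is what avoids the deep-copy cost.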
Dataclass vs Pydantic vs NamedTuple: When to Use Each
Python offers several ways to create structured data objects. Choosing the right one depends on whether you need runtime validation, immutability, performance, or JSON interoperability.
| Feature | @dataclass | Pydantic BaseModel | NamedTuple |
|---|---|---|---|
| Python version | 3.7+ | Any (3rd-party) | 3.6+ |
| Runtime type validation | No (types are hints only) | Yes — coerces and validates | No |
| Immutable option | frozen=True | model_config frozen=True | Always immutable |
| Hashable | With frozen=True (or unsafe_hash=True); eq=False keeps identity hashing | Only when frozen | Yes (if all fields hashable) |
| JSON serialisation | asdict() + json.dumps | .model_dump_json() built-in | ._asdict() + json.dumps |
| Schema generation | No (use dataclasses-json) | Yes — JSON Schema, OpenAPI | No |
| Memory efficiency | slots=True for best perf | Heavier (Rust core in v2) | Very light (tuple base) |
| Inheritance | Full class inheritance | Full class inheritance | Limited (single-level) |
| Stdlib | Yes | No (pip install pydantic) | Yes (typing.NamedTuple) |
Decision Guide
- Use @dataclass when you want zero dependencies, standard-library simplicity, and mutable-by-default behaviour. Ideal for internal DTOs, configuration objects, and domain entities that do not cross API boundaries.
- Use Pydantic when you need runtime validation with good error messages, automatic type coercion, JSON Schema generation, or FastAPI integration. The right choice for API request/response models, settings management (pydantic-settings), and anywhere untrusted input is parsed.
- Use NamedTuple when you want the lightest possible immutable record and tuple unpacking behaviour. Good for small value objects returned from functions, row types from database queries, and anywhere you already work with tuples.
```python
from typing import NamedTuple
from dataclasses import dataclass
# from pydantic import BaseModel  # uncomment if pydantic is installed

# NamedTuple: immutable, tuple semantics, very lightweight
class RGB(NamedTuple):
    r: int
    g: int
    b: int = 0

red = RGB(255, 0)
print(red)     # RGB(r=255, g=0, b=0)
print(red[0])  # 255: tuple indexing works
r, g, b = red  # tuple unpacking works

# Dataclass: mutable by default, richer API
@dataclass
class RGBMutable:
    r: int
    g: int
    b: int = 0

colour = RGBMutable(255, 0)
colour.r = 200  # mutation allowed
print(colour)   # RGBMutable(r=200, g=0, b=0)
```
Real-World Patterns: Config Objects, DTOs, Value Objects
Pattern 1: Application Configuration Object
A frozen, slotted dataclass makes an excellent application config holder. It is immutable once built, memory-efficient, and easily constructed from environment variables or a config file.
```python
import os
from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class AppConfig:
    db_host: str
    db_port: int
    db_name: str
    debug: bool = False
    allowed_origins: tuple = ("https://techoral.com",)
    max_connections: int = 10

    @classmethod
    def from_env(cls) -> "AppConfig":
        return cls(
            db_host=os.environ.get("DB_HOST", "localhost"),
            db_port=int(os.environ.get("DB_PORT", "5432")),
            db_name=os.environ.get("DB_NAME", "appdb"),
            debug=os.environ.get("DEBUG", "false").lower() == "true",
            max_connections=int(os.environ.get("MAX_CONN", "10")),
        )

config = AppConfig.from_env()
print(config.db_host)  # "localhost" (or whatever the DB_HOST env var says)
# config.db_host = "other"  # raises FrozenInstanceError: config is safe from mutation
```
Pattern 2: Data Transfer Objects (DTOs)
DTOs carry data between application layers (controller → service → repository). Dataclasses are
ideal: no validation overhead needed for internal use, easy to serialise, and asdict()
makes JSON conversion trivial.
```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional

@dataclass
class CreateArticleDTO:
    title: str
    content: str
    author_id: int
    tags: list
    published_at: Optional[datetime] = None  # None means "not published yet"

    def to_dict(self) -> dict:
        d = asdict(self)
        # Convert datetime to ISO string for JSON serialisation
        if d["published_at"] is not None:
            d["published_at"] = self.published_at.isoformat()
        return d

dto = CreateArticleDTO(
    title="Python Dataclasses Guide",
    content="...",
    author_id=42,
    tags=["python", "dataclasses"],
    published_at=datetime(2026, 6, 6, 10, 0),
)
print(json.dumps(dto.to_dict(), indent=2))
```
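Going the other way, from a parsed JSON payload back to a DTO, can be sketched with a classmethod. ArticleDTO and from_dict are hypothetical names for illustration, and the sketch assumes the payload uses the ISO format produced by isoformat().

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional

@dataclass
class ArticleDTO:  # hypothetical, trimmed-down DTO
    title: str
    author_id: int
    published_at: Optional[datetime] = None

    @classmethod
    def from_dict(cls, data: dict) -> "ArticleDTO":
        raw = data.get("published_at")
        return cls(
            title=data["title"],
            author_id=data["author_id"],
            # Parse the ISO string back into a datetime, if present
            published_at=datetime.fromisoformat(raw) if raw else None,
        )

original = ArticleDTO("Guide", 42, datetime(2026, 6, 6, 10, 0))
payload = asdict(original)
payload["published_at"] = original.published_at.isoformat()

restored = ArticleDTO.from_dict(payload)
print(restored == original)  # True
```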
Pattern 3: Immutable Value Objects
Domain-Driven Design uses value objects for concepts like Money, Email, and
Coordinates — objects whose identity is defined entirely by their values. A frozen
dataclass with __post_init__ validation is a clean implementation.
```python
from dataclasses import dataclass
import re

@dataclass(frozen=True)
class Email:
    value: str

    def __post_init__(self):
        pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
        if not re.match(pattern, self.value):
            raise ValueError(f"Invalid email address: {self.value!r}")

    def domain(self) -> str:
        return self.value.split("@")[1]

@dataclass(frozen=True)
class Money:
    amount: float
    currency: str = "INR"

    def __post_init__(self):
        if self.amount < 0:
            raise ValueError("Money amount cannot be negative")

    def __add__(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError(f"Cannot add {self.currency} and {other.currency}")
        return Money(self.amount + other.amount, self.currency)

    def __str__(self) -> str:
        return f"{self.currency} {self.amount:,.2f}"

e = Email("alice@techoral.com")
print(e.domain())  # techoral.com

price = Money(499.0)
tax = Money(89.82)
total = price + tax
print(total)  # INR 588.82

# Value objects are usable as dict keys
pricing: dict[Money, str] = {Money(999.0): "Pro", Money(1999.0): "Enterprise"}
print(pricing[Money(999.0)])  # Pro
```
Tips, Gotchas, and Best Practices
- Use slots=True for hot paths. If a dataclass is instantiated millions of times (game entities, event records, ML feature rows), add slots=True. Construction can be roughly 20-30% faster and per-instance memory noticeably lower, though the exact gain depends on the Python version and workload.
- Combine frozen=True and order=True freely. These flags are independent. Frozen gives you safety and hashability; order gives you sortability.
- Do not mix mutable and immutable fields in frozen dataclasses. A frozen dataclass that holds a list is technically immutable in terms of references (you cannot reassign the list attribute) but the list itself is still mutable. Use tuples for truly immutable collections.
- Prefer kw_only=True for large dataclasses. When a class has six or more fields, forcing keyword-only arguments makes call sites self-documenting and eliminates argument-order bugs.
- Dataclasses are not ORM models. Do not use them as direct database table representations; that is SQLAlchemy's job. Use them as clean, validation-free DTOs that carry data to and from the persistence layer.
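The tip about mutable fields in frozen dataclasses is easy to demonstrate. The sketch below uses two hypothetical classes, TaggedList and TaggedTuple.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class TaggedList:
    tags: list

@dataclass(frozen=True)
class TaggedTuple:
    tags: tuple

t = TaggedList(["a"])
t.tags.append("b")  # allowed: frozen only blocks rebinding the attribute
print(t.tags)       # ['a', 'b']

try:
    t.tags = []     # rebinding is blocked
    rebind_blocked = False
except FrozenInstanceError:
    rebind_blocked = True
print(rebind_blocked)  # True

# A list field also makes the generated __hash__ fail at call time
try:
    hash(t)
    list_version_hashable = True
except TypeError:
    list_version_hashable = False
print(list_version_hashable)  # False

# With a tuple the instance is immutable all the way down and hashable
same = hash(TaggedTuple(("a",))) == hash(TaggedTuple(("a",)))
print(same)  # True
```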