Python Dataclasses: Replacing Boilerplate with @dataclass

June 6, 2026 | 12 min read | Python

What Are Dataclasses and Why Should You Use Them?

Before Python 3.7, creating a simple data-holding class required a lot of repetitive code. You wrote __init__ to accept and assign every field, __repr__ so the object printed something useful, and __eq__ so two objects with identical field values compared equal. For a five-field class that means 20+ lines of pure boilerplate — and every time you added a new field, you had to update three or four methods by hand.

Dataclasses, introduced in PEP 557 and available since Python 3.7, solve this by generating those dunder methods automatically from a simple class-level annotation syntax. The @dataclass decorator inspects the annotated fields and generates __init__, __repr__, and __eq__ at class creation time. No metaclass magic and no external library are required: the dataclasses module is part of the standard library.

Key benefit at a glance: The same five-field class that used to take 20+ lines of boilerplate now takes 7 lines with @dataclass, and it remains fully type-annotated, IDE-friendly, and easy to extend.

The Plain-Class Problem

Here is a typical pre-3.7 data class and its equivalent dataclass:

# Old way — lots of manual boilerplate
class Product:
    def __init__(self, name: str, price: float, stock: int = 0):
        self.name = name
        self.price = price
        self.stock = stock

    def __repr__(self):
        return f"Product(name={self.name!r}, price={self.price!r}, stock={self.stock!r})"

    def __eq__(self, other):
        if not isinstance(other, Product):
            return NotImplemented
        return (self.name, self.price, self.stock) == (other.name, other.price, other.stock)


# New way — @dataclass does the heavy lifting
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    price: float
    stock: int = 0


# Usage is identical
p1 = Product("Widget", 9.99, 100)
p2 = Product("Widget", 9.99, 100)
print(p1)           # Product(name='Widget', price=9.99, stock=100)
print(p1 == p2)     # True  — __eq__ generated automatically

The dataclass version is not just shorter — it is safer. When you add a new field, the generated methods update automatically. There is no risk of forgetting to include a new field in __repr__ or __eq__.
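To make that concrete, here is a minimal sketch that adds a hypothetical sku field (not part of the original example) to the Product dataclass. No method is edited, yet the new field appears in both the generated __repr__ and __eq__.

```python
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    price: float
    stock: int = 0
    sku: str = ""   # hypothetical new field, added without touching any method

p1 = Product("Widget", 9.99, 100, sku="W-1")
p2 = Product("Widget", 9.99, 100, sku="W-2")

print(p1)         # Product(name='Widget', price=9.99, stock=100, sku='W-1')
print(p1 == p2)   # False: the generated __eq__ already compares sku
```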

Basic @dataclass Usage with Type Hints

Every field in a dataclass is declared as a class-level annotation. Fields without defaults must come before fields with defaults — the same rule that applies to function parameters. The decorator accepts several boolean flags that control which methods are generated.

from dataclasses import dataclass, field
from typing import ClassVar

@dataclass(order=True, repr=True, eq=True)
class Employee:
    # Fields without defaults — must come first
    employee_id: int
    name: str
    department: str

    # Fields with defaults
    salary: float = 50_000.0
    is_active: bool = True

    # Class variable — NOT a dataclass field (excluded from __init__)
    company: ClassVar[str] = "Techoral Inc."

    def annual_bonus(self) -> float:
        return self.salary * 0.1 if self.is_active else 0.0


e1 = Employee(101, "Alice", "Engineering", salary=95_000.0)  # pass a float: dataclasses do not coerce types
e2 = Employee(102, "Bob", "Marketing")

print(e1)
# Employee(employee_id=101, name='Alice', department='Engineering', salary=95000.0, is_active=True)

print(e1 > e2)   # False: order=True generates __lt__, __le__, __gt__, __ge__,
                 # which compare tuples of all fields in declaration order (101 < 102)

print(Employee.company)  # "Techoral Inc." — ClassVar not in __init__ or __repr__
print(e1.annual_bonus())  # 9500.0

@dataclass flags reference:

init=True — generate __init__ (default True)
repr=True — generate __repr__ (default True)
eq=True — generate __eq__ (default True)
order=False — generate comparison methods (default False)
frozen=False — make instances immutable (default False)
slots=False — use __slots__ for memory savings (Python 3.10+, default False)
kw_only=False — force all fields to be keyword-only (Python 3.10+)

The field() Function: Fine-Grained Field Control

When you need more than a simple default value — mutable defaults, custom repr behaviour, metadata, or exclusion from comparison — you reach for field() from the dataclasses module.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ShoppingCart:
    owner: str

    # Mutable default: `items: list = []` is rejected by @dataclass (ValueError),
    # so use default_factory instead
    items: List[str] = field(default_factory=list)

    # Hidden from repr (e.g. internal cache or sensitive data)
    _cache: dict = field(default_factory=dict, repr=False, compare=False, hash=False)

    # Excluded from __init__ entirely — set in __post_init__
    item_count: int = field(init=False, repr=True, compare=False)

    # Metadata: arbitrary key-value pairs attached to the Field object
    discount: float = field(default=0.0, metadata={"unit": "fraction", "min": 0, "max": 1})

    def __post_init__(self):
        # Runs after __init__ — good place to derive computed fields
        self.item_count = len(self.items)

    def add_item(self, item: str):
        self.items.append(item)
        self.item_count = len(self.items)


cart = ShoppingCart(owner="Alice", items=["book", "pen"], discount=0.1)
print(cart)
# ShoppingCart(owner='Alice', items=['book', 'pen'], item_count=2, discount=0.1)

# Access field metadata
from dataclasses import fields
discount_field = next(f for f in fields(cart) if f.name == "discount")
print(discount_field.metadata)  # mappingproxy({'unit': 'fraction', 'min': 0, 'max': 1})

The field() parameters you will use most often:

default / default_factory — supply a value or a zero-argument callable
repr=False — omit the field from __repr__ output (good for passwords, caches)
compare=False / hash=False — exclude from equality and hashing
init=False — the field is not accepted in __init__; set it in __post_init__
metadata — a read-only mapping for schema annotations, units, validation hints

Common pitfall: Never write items: list = [] in a dataclass. In a plain class that default is evaluated once at class definition time, so every instance would share the same list object; @dataclass guards against this by raising ValueError for list, dict, and set defaults. field(default_factory=list) creates a fresh list for every instance.
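A short sketch of both sides of this pitfall, using the hypothetical class names BrokenCart and Cart. In practice @dataclass refuses a bare list default when the class is created, before any sharing can occur, while default_factory gives each instance its own list.

```python
from dataclasses import dataclass, field

error_message = ""
try:
    @dataclass
    class BrokenCart:                  # hypothetical example class
        items: list = []               # rejected when the class is created
except ValueError as exc:
    error_message = str(exc)
print(error_message)                   # the message points to default_factory

@dataclass
class Cart:                            # hypothetical example class
    items: list = field(default_factory=list)

a, b = Cart(), Cart()
a.items.append("book")
print(a.items, b.items)                # ['book'] []
```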

__post_init__: Validation and Derived Fields

The generated __init__ calls __post_init__ at the end if it is defined. This is the standard hook for input validation, derived-field computation, and any setup that depends on more than one field being available.

from dataclasses import dataclass, field
from datetime import date

@dataclass
class DateRange:
    start: date
    end: date
    # Derived field — not passed in __init__
    duration_days: int = field(init=False)

    def __post_init__(self):
        # Validation
        if self.end < self.start:
            raise ValueError(
                f"end date {self.end} must be >= start date {self.start}"
            )
        # Derived computation
        self.duration_days = (self.end - self.start).days

    def overlaps(self, other: "DateRange") -> bool:
        return self.start <= other.end and other.start <= self.end


sprint = DateRange(date(2026, 6, 1), date(2026, 6, 14))
print(sprint)
# DateRange(start=datetime.date(2026, 6, 1), end=datetime.date(2026, 6, 14), duration_days=13)
print(sprint.duration_days)  # 13

try:
    bad = DateRange(date(2026, 6, 14), date(2026, 6, 1))
except ValueError as e:
    print(e)  # end date 2026-06-01 must be >= start date 2026-06-14

InitVar: If you need a parameter available in __post_init__ but you do not want it stored as a field, use InitVar[T]. It appears in __init__ but is passed directly to __post_init__ and then discarded.

import hashlib
from dataclasses import dataclass, field, InitVar

@dataclass
class HashedPassword:
    username: str
    raw_password: InitVar[str]   # accepted in __init__, not stored
    password_hash: str = field(init=False)

    def __post_init__(self, raw_password: str):
        # Unsalted SHA-256 keeps the demo short; real systems should use a
        # dedicated password-hashing function such as hashlib.scrypt.
        self.password_hash = hashlib.sha256(raw_password.encode()).hexdigest()

u = HashedPassword("alice", "s3cr3t")
print(u.password_hash)   # sha256 hex digest — raw_password is gone

frozen=True: Immutable Dataclasses and Hashability

Setting frozen=True makes the generated __setattr__ and __delattr__ raise FrozenInstanceError, giving you value-object semantics. With the default eq=True, a frozen dataclass also gets a generated __hash__ based on its fields, so instances can be used as dictionary keys or set members as long as every field value is itself hashable. That is very useful for caching, memoisation, and immutable configuration objects.

from dataclasses import dataclass, replace

@dataclass(frozen=True, order=True)
class Point:
    x: float
    y: float

    def distance_to_origin(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** 0.5


p1 = Point(3.0, 4.0)
p2 = Point(0.0, 0.0)
p3 = Point(3.0, 4.0)

# Hashable — can be used in sets and as dict keys
visited: set[Point] = {p1, p2}
print(p1 in visited)   # True
print(p3 in visited)   # True — same value, same hash

cache: dict[Point, float] = {p1: p1.distance_to_origin()}
print(cache[Point(3.0, 4.0)])  # 5.0

# Ordering works because order=True
print(sorted([Point(3, 4), Point(1, 2), Point(2, 3)]))
# [Point(x=1, y=2), Point(x=2, y=3), Point(x=3, y=4)]

# Cannot mutate a frozen instance
try:
    p1.x = 99.0
except Exception as e:
    print(type(e).__name__, e)  # FrozenInstanceError cannot assign to field 'x'

# Use replace() to create a modified copy (like Haskell record update)
p4 = replace(p1, y=0.0)
print(p4)   # Point(x=3.0, y=0.0)
print(p1)   # Point(x=3.0, y=4.0) — original unchanged

replace() is your "mutation" tool for frozen dataclasses. It returns a new instance with any fields you specify overridden, while all other fields are copied from the original. This pattern is popular in functional-style Python and makes change-tracking trivial.
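One case the examples above do not cover is a frozen dataclass with a derived field. Plain assignment inside __post_init__ would raise FrozenInstanceError, so a common workaround is a single call to object.__setattr__ during construction. The Circle class below is a hypothetical illustration of that approach, not the only way to do it.

```python
import math
from dataclasses import dataclass, field, FrozenInstanceError

@dataclass(frozen=True)
class Circle:                          # hypothetical example class
    radius: float
    area: float = field(init=False)    # derived, not accepted by __init__

    def __post_init__(self):
        # self.area = ... would raise FrozenInstanceError here, so bypass
        # the frozen __setattr__ once while the object is being built.
        object.__setattr__(self, "area", math.pi * self.radius ** 2)

c = Circle(2.0)
print(round(c.area, 2))                # 12.57

still_frozen = False
try:
    c.radius = 3.0                     # ordinary mutation is still blocked
except FrozenInstanceError:
    still_frozen = True
print(still_frozen)                    # True
```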

slots=True (Python 3.10+): Memory and Performance

By default every Python object stores its attributes in a per-instance __dict__. That dictionary carries real overhead, ranging from tens to a few hundred bytes per instance depending on the CPython version. If you are creating millions of small dataclass instances (events, coordinates, records), slots=True switches to __slots__ storage, which keeps each attribute in a fixed slot inside the object instead of in a dictionary. The result is lower memory usage and, typically, slightly faster attribute access.

from dataclasses import dataclass
import sys

@dataclass
class RegularPoint:
    x: float
    y: float
    z: float

@dataclass(slots=True)
class SlottedPoint:
    x: float
    y: float
    z: float


rp = RegularPoint(1.0, 2.0, 3.0)
sp = SlottedPoint(1.0, 2.0, 3.0)

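print(sys.getsizeof(rp))          # instance size only; the attribute dict is not included
print(sys.getsizeof(rp.__dict__)) # the per-instance dict, counted separately
print(sys.getsizeof(sp))          # whole instance: the slots live inside the object
# Exact byte counts vary with the CPython version and platform, so compare on your own build.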

# Slotted instances do not have __dict__
print(hasattr(rp, "__dict__"))   # True
print(hasattr(sp, "__dict__"))   # False

# You cannot add arbitrary attributes to a slotted instance
try:
    sp.w = 4.0
except AttributeError as e:
    print(e)  # 'SlottedPoint' object has no attribute 'w'

# Frozen + slots gives compact, read-only value objects
# (frozen=True adds a small construction cost: __init__ must use object.__setattr__)
@dataclass(frozen=True, slots=True)
class ImmutableVector:
    x: float
    y: float
    z: float

    def magnitude(self) -> float:
        return (self.x**2 + self.y**2 + self.z**2) ** 0.5


# Benchmark: creating 1 million instances
import timeit
t_regular = timeit.timeit(lambda: RegularPoint(1.0, 2.0, 3.0), number=1_000_000)
t_slotted = timeit.timeit(lambda: SlottedPoint(1.0, 2.0, 3.0), number=1_000_000)
print(f"Regular: {t_regular:.3f}s  Slotted: {t_slotted:.3f}s")
# Reported: Regular ~0.38s  Slotted ~0.26s; absolute and relative timings vary by machine and Python version

Slots and inheritance: All classes in an inheritance chain must use slots=True for the full benefit. A slotted child of a non-slotted parent still gets a __dict__ from the parent.

Dataclass Inheritance: Field Ordering Rules and Pitfalls

Dataclasses support inheritance — the child class gets all parent fields first, then its own fields. The generated __init__ respects this ordering. The classic pitfall is placing a field with a default in the parent, then trying to add a no-default field in the child: Python will raise a TypeError because it would create an __init__ where a defaulted parameter precedes a required one.

from dataclasses import dataclass, field

@dataclass
class Entity:
    id: int                          # required — no default
    created_at: str = "2026-01-01"   # has default

@dataclass
class User(Entity):
    username: str = ""               # has default (safe — no non-default after default)
    email: str = ""

@dataclass
class AdminUser(User):
    permissions: list = field(default_factory=list)

admin = AdminUser(id=1, username="alice", email="alice@example.com")
print(admin)
# AdminUser(id=1, created_at='2026-01-01', username='alice', email='alice@example.com', permissions=[])

# The pitfall: parent has default, child wants required field
@dataclass
class Base:
    x: int = 0           # has default

# This would raise TypeError at class definition time:
# @dataclass
# class Child(Base):
#     y: int              # required after a default — FORBIDDEN

# Fix 1: give y a default too
@dataclass
class Child(Base):
    y: int = 0

# Fix 2: mark the child field keyword-only with field(kw_only=True) (Python 3.10+)
@dataclass
class ChildKwOnly(Base):
    y: int = field(kw_only=True)   # required, but keyword-only, so ordering is not an issue

# Fix 3: use @dataclass(kw_only=True) on the child class
@dataclass(kw_only=True)
class ChildAllKw(Base):
    y: int    # now required but keyword-only, so no ordering conflict

c = ChildAllKw(x=1, y=2)   # y must be passed by keyword; x (inherited from Base) may still be positional
print(c)                    # ChildAllKw(x=1, y=2)

Dataclass Utility Functions: asdict(), astuple(), replace(), fields()

The dataclasses module ships four utility functions that make working with dataclass instances in pipelines and serialisation code much easier.

from dataclasses import dataclass, field, asdict, astuple, replace, fields
from typing import List

@dataclass
class Address:
    street: str
    city: str
    country: str = "IN"

@dataclass
class Person:
    name: str
    age: int
    address: Address
    tags: List[str] = field(default_factory=list)


person = Person(
    name="Priya",
    age=30,
    address=Address("123 MG Road", "Mysore"),
    tags=["developer", "python"]
)

# asdict() — deep recursive dict conversion; perfect for JSON serialisation
d = asdict(person)
print(d)
# {'name': 'Priya', 'age': 30,
#  'address': {'street': '123 MG Road', 'city': 'Mysore', 'country': 'IN'},
#  'tags': ['developer', 'python']}

import json
print(json.dumps(d, indent=2))  # ready for REST API responses

# astuple() — deep recursive tuple conversion; useful for DB row insertion
row = astuple(person)
print(row)  # ('Priya', 30, ('123 MG Road', 'Mysore', 'IN'), ['developer', 'python'])

# replace() — immutable-style update; returns a new instance
older_priya = replace(person, age=31)
print(older_priya.age)   # 31
print(person.age)        # 30 — original intact

relocated = replace(person, address=replace(person.address, city="Bangalore"))
print(relocated.address) # Address(street='123 MG Road', city='Bangalore', country='IN')

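# fields(): introspect field definitions at runtime
from dataclasses import MISSING
for f in fields(person):
    has_default = f.default is not MISSING or f.default_factory is not MISSING
    print(f.name, f.type, has_default, dict(f.metadata))
# name <class 'str'> False {}
# age <class 'int'> False {}
# address <class '__main__.Address'> False {}   (when run as a script)
# tags typing.List[str] True {}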

asdict() gotcha: asdict() builds a deep copy. Nested dataclasses, dicts, lists, and tuples are all converted recursively. If your dataclass holds any other kind of object (e.g. an instance of a custom class or a NumPy array), that object is copied with copy.deepcopy(). For large arrays this can be slow; use a custom serialiser instead.
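Two small workarounds are sketched below, using a hypothetical Invoice class. The first passes a fallback function to json.dumps for values that asdict() leaves as non-JSON types (here a date). The second builds a shallow dict from fields() when the deep copy is not wanted.

```python
import json
from dataclasses import dataclass, asdict, fields
from datetime import date

@dataclass
class Invoice:                         # hypothetical example class
    number: str
    issued: date

def to_jsonable(value):
    """Fallback for json.dumps, called only for objects it cannot encode."""
    if isinstance(value, date):
        return value.isoformat()
    raise TypeError(f"Cannot serialise {type(value).__name__}")

inv = Invoice("INV-001", date(2026, 6, 6))

payload = json.dumps(asdict(inv), default=to_jsonable)
print(payload)   # {"number": "INV-001", "issued": "2026-06-06"}

# Shallow alternative: no recursion and no deepcopy of the field values
shallow = {f.name: getattr(inv, f.name) for f in fields(inv)}
print(shallow["issued"] is inv.issued)   # True
```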

Dataclass vs Pydantic vs NamedTuple: When to Use Each

Python offers several ways to create structured data objects. Choosing the right one depends on whether you need runtime validation, immutability, performance, or JSON interoperability.

Feature	@dataclass	Pydantic BaseModel	NamedTuple
Python version	3.7+	3rd-party; supported Python versions depend on the Pydantic release	3.6+
Runtime type validation	No (types are hints only)	Yes — coerces and validates	No
Immutable option	frozen=True	model_config frozen=True	Always immutable
Hashable	With frozen=True (or unsafe_hash=True)	Only when frozen	Yes (if all fields hashable)
JSON serialisation	asdict() + json.dumps	.model_dump_json() built-in	._asdict() + json.dumps
Schema generation	No (use dataclasses-json)	Yes — JSON Schema, OpenAPI	No
Memory efficiency	slots=True for best perf	Heavier (Rust core in v2)	Very light (tuple base)
Inheritance	Full class inheritance	Full class inheritance	Limited (single-level)
Stdlib	Yes	No (pip install pydantic)	Yes (typing.NamedTuple)

Decision Guide

Use @dataclass when you want zero dependencies, standard-library simplicity, and mutable-by-default behaviour. Ideal for internal DTOs, configuration objects, and domain entities that do not cross API boundaries.
Use Pydantic when you need runtime validation with good error messages, automatic type coercion, JSON Schema generation, or FastAPI integration. The right choice for API request/response models, settings management (pydantic-settings), and anywhere untrusted input is parsed.
Use NamedTuple when you want the lightest possible immutable record and tuple unpacking behaviour. Good for small value objects returned from functions, row types from database queries, and anywhere you already work with tuples.

from typing import NamedTuple
from dataclasses import dataclass
# from pydantic import BaseModel  # uncomment if pydantic is installed

# NamedTuple — immutable, tuple semantics, very lightweight
class RGB(NamedTuple):
    r: int
    g: int
    b: int = 0

red = RGB(255, 0)
print(red)           # RGB(r=255, g=0, b=0)
print(red[0])        # 255 — tuple indexing works
r, g, b = red        # tuple unpacking works

# Dataclass — mutable by default, richer API
@dataclass
class RGBMutable:
    r: int
    g: int
    b: int = 0

colour = RGBMutable(255, 0)
colour.r = 200        # mutation allowed
print(colour)         # RGBMutable(r=200, g=0, b=0)
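The "types are hints only" row in the table is worth seeing in action. The sketch below uses two hypothetical classes: Pixel accepts wrongly typed values without complaint, while CheckedPixel adds a deliberately simple isinstance check in __post_init__. That check assumes every annotation is a plain class such as int or str; it is not a substitute for Pydantic-style validation.

```python
from dataclasses import dataclass, fields

@dataclass
class Pixel:                           # hypothetical example class
    r: int
    g: int

loose = Pixel("255", None)             # accepted: annotations are not enforced
print(loose)                           # Pixel(r='255', g=None)

@dataclass
class CheckedPixel:                    # hypothetical example class
    r: int
    g: int

    def __post_init__(self):
        # Works only for plain classes (int, str, ...), not for generics
        # such as list[str] or Optional[int], nor for string annotations.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )

error_text = ""
try:
    CheckedPixel("255", 0)
except TypeError as exc:
    error_text = str(exc)
print(error_text)                      # r must be int, got str
```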

Real-World Patterns: Config Objects, DTOs, Value Objects

Pattern 1: Application Configuration Object

A frozen, slotted dataclass makes an excellent application config holder. It is immutable once built, memory-efficient, and easily constructed from environment variables or a config file.

import os
from dataclasses import dataclass, field

@dataclass(frozen=True, slots=True)
class AppConfig:
    db_host: str
    db_port: int
    db_name: str
    debug: bool = False
    allowed_origins: tuple = ("https://techoral.com",)
    max_connections: int = 10

    @classmethod
    def from_env(cls) -> "AppConfig":
        return cls(
            db_host=os.environ.get("DB_HOST", "localhost"),
            db_port=int(os.environ.get("DB_PORT", "5432")),
            db_name=os.environ.get("DB_NAME", "appdb"),
            debug=os.environ.get("DEBUG", "false").lower() == "true",
            max_connections=int(os.environ.get("MAX_CONN", "10")),
        )


config = AppConfig.from_env()
print(config.db_host)   # "localhost" (or whatever DB_HOST env var says)
# config.db_host = "other"  # raises FrozenInstanceError — config is safe from mutation

Pattern 2: Data Transfer Objects (DTOs)

DTOs carry data between application layers (controller → service → repository). Dataclasses are ideal: no validation overhead needed for internal use, easy to serialise, and asdict() makes JSON conversion trivial.

from dataclasses import dataclass, asdict
from datetime import datetime
from typing import List, Optional

@dataclass
class CreateArticleDTO:
    title: str
    content: str
    author_id: int
    tags: List[str]
    published_at: Optional[datetime] = None   # None until the article is published

    def to_dict(self) -> dict:
        d = asdict(self)
        # Convert datetime to ISO string for JSON serialisation
        if d["published_at"] is not None:
            d["published_at"] = self.published_at.isoformat()
        return d


dto = CreateArticleDTO(
    title="Python Dataclasses Guide",
    content="...",
    author_id=42,
    tags=["python", "dataclasses"],
    published_at=datetime(2026, 6, 6, 10, 0),
)
import json
print(json.dumps(dto.to_dict(), indent=2))
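The reverse direction, building a DTO from an incoming dict, can be sketched the same way. ArticleDTO and its from_dict classmethod below are hypothetical and only loosely mirror CreateArticleDTO; the sketch assumes published_at arrives as an ISO 8601 string or is absent.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class ArticleDTO:                      # hypothetical example class
    title: str
    author_id: int
    tags: List[str] = field(default_factory=list)
    published_at: Optional[datetime] = None

    @classmethod
    def from_dict(cls, data: dict) -> "ArticleDTO":
        raw = data.get("published_at")
        return cls(
            title=data["title"],
            author_id=data["author_id"],
            tags=list(data.get("tags", [])),
            # Parse the ISO string back into a datetime, if one was sent
            published_at=datetime.fromisoformat(raw) if raw else None,
        )

payload = {"title": "Hello", "author_id": 42, "published_at": "2026-06-06T10:00:00"}
dto = ArticleDTO.from_dict(payload)
print(dto.published_at)                # 2026-06-06 10:00:00
print(dto.tags)                        # []
```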

Pattern 3: Immutable Value Objects

Domain-Driven Design uses value objects for concepts like Money, Email, and Coordinates — objects whose identity is defined entirely by their values. A frozen dataclass with __post_init__ validation is a clean implementation.

from dataclasses import dataclass
import re

@dataclass(frozen=True)
class Email:
    value: str

    def __post_init__(self):
        pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
        if not re.match(pattern, self.value):
            raise ValueError(f"Invalid email address: {self.value!r}")

    def domain(self) -> str:
        return self.value.split("@")[1]


@dataclass(frozen=True)
class Money:
    amount: float
    currency: str = "INR"

    def __post_init__(self):
        if self.amount < 0:
            raise ValueError("Money amount cannot be negative")

    def __add__(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError(f"Cannot add {self.currency} and {other.currency}")
        return Money(self.amount + other.amount, self.currency)

    def __str__(self) -> str:
        return f"{self.currency} {self.amount:,.2f}"


e = Email("alice@techoral.com")
print(e.domain())   # techoral.com

price = Money(499.0)
tax   = Money(89.82)
total = price + tax
print(total)   # INR 588.82

# Value objects are usable as dict keys
pricing: dict[Money, str] = {Money(999.0): "Pro", Money(1999.0): "Enterprise"}
print(pricing[Money(999.0)])  # Pro
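Binary floats cannot represent most decimal fractions exactly, which matters for currency arithmetic. As a possible variation rather than a replacement for the Money class above, the hypothetical DecimalMoney below stores the amount as decimal.Decimal.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class DecimalMoney:                    # hypothetical variant of Money
    amount: Decimal
    currency: str = "INR"

    def __post_init__(self):
        if self.amount < 0:
            raise ValueError("Money amount cannot be negative")

    def __add__(self, other: "DecimalMoney") -> "DecimalMoney":
        if self.currency != other.currency:
            raise ValueError(f"Cannot add {self.currency} and {other.currency}")
        return DecimalMoney(self.amount + other.amount, self.currency)

print(0.1 + 0.2 == 0.3)                # False: float rounding error

# Construct Decimal from strings so no float rounding sneaks in
total = DecimalMoney(Decimal("0.10")) + DecimalMoney(Decimal("0.20"))
print(total.amount == Decimal("0.30")) # True
```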

Tips, Gotchas, and Best Practices

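Consider slots=True for hot paths. If a dataclass is instantiated millions of times (game entities, event records, ML feature rows), slots=True (Python 3.10+) usually reduces per-instance memory and can speed up construction and attribute access. The gain depends on the interpreter version and workload, so measure before relying on a specific figure.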
Combine frozen=True and order=True freely. These flags are independent. Frozen gives you safety and hashability; order gives you sortability.
Avoid mutable fields in frozen dataclasses. A frozen dataclass that holds a list is immutable only in terms of references (you cannot reassign the list attribute); the list itself can still be modified in place, and the instance can no longer be hashed. Use tuples for truly immutable collections.
Prefer kw_only=True for large dataclasses. When a class has six or more fields, forcing keyword-only arguments makes call sites self-documenting and eliminates argument-order bugs.
Dataclasses are not ORM models. Do not use them as direct database table representations — that is SQLAlchemy's job. Use them as clean, validation-free DTOs that carry data to and from the persistence layer.
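The tip about mutable fields in frozen dataclasses can be demonstrated in a few lines. TaggedList and TaggedTuple are hypothetical classes used only for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaggedList:                      # hypothetical example class
    name: str
    tags: list

@dataclass(frozen=True)
class TaggedTuple:                     # hypothetical example class
    name: str
    tags: tuple

t = TaggedList("a", ["x"])
t.tags.append("y")                     # allowed: the list is mutated in place
print(t.tags)                          # ['x', 'y']

hash_error = ""
try:
    hash(t)                            # the generated __hash__ hashes every field
except TypeError as exc:
    hash_error = str(exc)
print(hash_error)                      # unhashable type: 'list'

same = hash(TaggedTuple("a", ("x",))) == hash(TaggedTuple("a", ("x",)))
print(same)                            # True
```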

Python 3.11+ improvements: PEP 681 (typing.dataclass_transform) lets type checkers understand dataclass-like decorators and base classes, so libraries such as Pydantic and attrs can get the same static-analysis and IDE support as the stdlib @dataclass.