The Trap of Primitive Obsession Developers often default to Python primitives like floats or strings to represent complex domain concepts. While a float can hold a price, it cannot inherently ensure that the price is non-negative. This leads to "primitive obsession," where validation logic scatters throughout every function that touches the data. When you pass a simple float representing a discount, you lose context. Is 0.2 a flat 20% discount or 0.2 units of currency? By relying on primitives, you force your functions to defensively validate inputs every time, which doesn't scale and invites bugs. Defining the Value Object Originating from Domain-Driven Design, a Value Object is an immutable container that enforces its own invariants. Unlike entities, we compare value objects by their internal values rather than a unique identity. The core philosophy is simple: validate once upon creation. If an object exists, it is guaranteed to be valid. This shifts the burden of proof from the consuming function to the object itself. Implementation via Dataclasses The most straightforward way to implement this pattern is using dataclasses. By setting `frozen=True`, you ensure immutability. ```python from dataclasses import dataclass @dataclass(frozen=True) class Price: value: float def __post_init__(self): if self.value < 0: raise ValueError("Price must be non-negative") ``` In this model, the `__post_init__` method acts as a gatekeeper. Any attempt to instantiate a `Price` with a negative number immediately raises a `ValueError`. Inheriting from Primitives If accessing `.value` feels cumbersome, you can inherit directly from float. Since floats are already immutable, this preserves the value object's integrity while allowing the object to behave like a standard number. ```python class Percentage(float): def __new__(cls, value): if not 0 <= value <= 1: raise ValueError("Percentage must be between 0 and 1") return super().__new__(cls, value) ``` Using `__new__` allows you to validate the data before the object is even created. This allows you to perform math operations directly on the object without manually extracting the internal value. Practical Syntax Notes - **Immutability:** Always use `frozen=True` with dataclasses to prevent state changes after creation. - **Custom Creators:** Use `@classmethod` for alternative constructors, such as `from_percent(cls, value)` which might divide an integer by 100. - **Type Clarity:** Value objects act as documentation. A function signature requiring a `EmailAddress` type is far more descriptive than one requiring a `str`.
dataclasses
Libraries
ArjanCodes (3 mentions) emphasizes that dataclasses simplify class creation for data storage, noting familiarity with them is essential for structuring immutable events, as seen in "Stop Overwriting State And Use Event Sourcing Instead".
- Mar 13, 2026
- Mar 6, 2026
- Nov 21, 2025
- Nov 7, 2025
- May 19, 2023
Overview of the Iterator Protocol At its core, an iterator is a stateful object that lets you traverse a sequence of data one element at a time. This mechanism matters because it decouples the data’s storage from the logic used to consume it. Instead of loading an entire dataset into memory, Python iterators produce items on demand. This approach is highly memory-efficient, especially when dealing with massive datasets or infinite streams of data that would otherwise crash your system. Prerequisites Before diving into the implementation, you should have a firm grasp of: - Basic Python syntax and data structures (lists, tuples, and dictionaries). - The concept of loops and conditional logic. - Class definitions and dunder (double underscore) methods. Key Libraries & Tools - **itertools**: A built-in Python module that provides a suite of fast, memory-efficient tools for creating iterators for efficient looping. - **dataclasses**: Used for creating structured data objects that can be made immutable (frozen) for use in specific iterator patterns. Understanding the Iterable vs. Iterator Distinction People often use these terms interchangeably, but they represent different roles in the protocol. An **iterable** is an object capable of returning an iterator (like a list or tuple). An **iterator** is the actual object that tracks the current state of the traversal. ```python countries = ("Germany", "France", "Italy") Getting an iterator from an iterable country_iterator = iter(countries) print(next(country_iterator)) # Germany print(next(country_iterator)) # France ``` If you call `iter()` on an iterator, it simply returns itself. However, calling `iter()` on an iterable creates a brand-new iterator starting from the beginning. This subtle difference allows multiple independent traversals over the same data source simultaneously. Implementing Custom Iterators You can build your own traversal logic by implementing the `__iter__` and `__next__` methods within a class. This is particularly useful for generating sequences that don't exist in memory, such as a custom range or an infinite counter. ```python class NumberIterator: def __init__(self, maximum: int): self.number = 0 self.maximum = maximum def __iter__(self): return self def __next__(self): if self.number >= self.maximum: raise StopIteration self.number += 1 return self.number ``` Advanced Composition with Itertools The itertools package provides an "algebra of iterators." It allows you to chain, filter, and transform data streams without writing manual loops. This leads to cleaner, more declarative code. Chaining and Permutations ```python import itertools items = ['A', 'B'] more_items = ['C', 'D'] Combine sequences combined = itertools.chain(items, more_items) Find all pairs pairs = list(itertools.combinations(items + more_items, 2)) ``` Functional Transformations with Starmap `starmap` is a powerful alternative to standard mapping when your data is already grouped into tuples. It unpacks the arguments for you automatically. ```python data = [(2, 6), (8, 4), (5, 3)] Multiplies X * Y for each tuple totals = list(itertools.starmap(lambda x, y: x * y, data)) ``` Syntax Notes & Best Practices - **StopIteration**: Always raise this error in `__next__` to signal the end of the sequence. For loops handle this exception automatically. - **Frozen Dataclasses**: When iterating over sets of objects, ensure your dataclasses are `frozen=True` so they are hashable. - **Readability**: While you can chain multiple itertools functions, avoid "one-liners" that become impossible to debug. Break complex chains into intermediate variables with descriptive names. Tips & Gotchas Iterators are one-time use. Once you exhaust an iterator (by reaching the end), it is spent. If you need the data again, you must create a new iterator instance. A common mistake is trying to iterate over the same iterator variable twice and wondering why the second loop produces no output.
Jan 13, 2023Overview Software evolution often requires moving beyond code that simply works to code that is maintainable and scalable. In this second phase of refactoring a Rock Paper Scissors Lizard Spock game, the focus shifts from user interface separation to streamlining internal logic. By applying principles like the Law of Demeter and utilizing Python features like Dataclasses and Enums, we can transform a clunky, class-heavy architecture into a lean, functional design. This approach matters because it reduces cognitive load for developers and makes the codebase resilient to future changes. Prerequisites To follow this tutorial, you should have a solid grasp of Python 3.10 or later. Familiarity with Object-Oriented Programming (OOP) concepts, specifically classes and inheritance, is essential. You should also understand basic Data Structures like dictionaries and tuples, and have a surface-level understanding of Dependency Injection. Key Libraries & Tools * **Dataclasses**: A standard library module that reduces boilerplate by automatically generating special methods like `__init__` and `__repr__`. * **Enums**: Provides support for enumeration types, allowing for clearer and more type-safe constant definitions. * **Random**: Used for generating CPU moves through the `random.choice` method. * **Typing**: Utilized for providing hints like `Optional` to clarify function return types. Refactoring the Rules Engine The original code used a bulky `Rules` class to determine winners. We can simplify this by replacing the class with a constant dictionary and a single function. By mapping a tuple of moves to a verb (e.g., "crushes" or "vaporizes"), we eliminate redundant data storage. ```python RULES = { (Entity.ROCK, Entity.SCISSORS): "crushes", (Entity.SPOCK, Entity.ROCK): "vaporizes", # ... other rules } def get_winner(e1: Entity, e2: Entity) -> tuple[Entity | None, str]: if e1 == e2: return None, "It's a tie" if (e1, e2) in RULES: return e1, f"{e1} {RULES[(e1, e2)]} {e2}" # Logic for reverse check here ``` This functional approach moves the "tie" responsibility out of the main game loop and into the rule logic where it belongs. It also allows the use of modern Python type hinting (like `|` for union types) instead of importing `Union` or `Tuple` from the typing module. Implementing Dataclasses and Dependency Injection The `Game` class initializer was previously overloaded with setup logic. By converting it to a Dataclass, we define the state clearly and use Dependency Injection to pass in the `Scoreboard` and `UI` objects. This makes the code easier to test since we can now inject "mock" versions of these objects. ```python from dataclasses import dataclass @dataclass class Game: scoreboard: Scoreboard ui: UI player_name: str cpu_name: str = "CPU" def play(self, max_rounds: int = 5): # Logic moved from __init__ to here self.scoreboard.register_player(self.player_name) self.scoreboard.register_player(self.cpu_name) ``` Adhering to the Law of Demeter To avoid "reaching through" objects—such as the `Game` class directly modifying `scoreboard.points`—we implement helper methods. A `win_round` method in the `Scoreboard` class hides the implementation details of how scores are stored. Similarly, adding a `to_display` method to the `Scoreboard` ensures that the `Game` class doesn't need to know the internal structure of the points dictionary to show results. Syntax Notes: String-Based Enums While Auto integers are common in Enums, they are brittle if the order changes or if data is persisted in a database. Switching to string-based values provides more control over the output and improves readability when debugging. ```python class Entity(str, Enum): ROCK = "Rock" PAPER = "Paper" def __str__(self): return self.value ``` Tips & Gotchas * **The Underscore Pattern**: When iterating through a loop where the index isn't needed (like a round counter), use `_` to signal to other developers that the variable is intentionally unused. * **Input Validation**: When converting string Enums back to list indices for user selection, always subtract one from the user's input to align with zero-based indexing. * **Coupling Balance**: While moving the `display` logic into the `Scoreboard` avoids a Law of Demeter violation, it introduces a slight coupling between the `Scoreboard` and the `UI` protocol. This is usually an acceptable trade-off for cleaner high-level logic.
Apr 15, 2022Overview Software development often suffers from a phenomenon where developers use complex language features simply because they exist. In this tutorial, we analyze a Python script designed for academic paper scraping that fell into this trap. The original code used convoluted dunder method overrides to change how class initialization works, creating a maintenance nightmare. We will walk through the process of simplifying this architecture. The goal is to move away from rigid, deep inheritance and toward a functional, modular design. By leveraging Data Classes and Python Sets, we can create a codebase that is easier to read, faster to execute, and significantly simpler to test. Proper refactoring isn't just about making code look "clean"; it's about reducing the cognitive load required for the next developer to understand the logic. Prerequisites To get the most out of this guide, you should be comfortable with: * **Python Fundamentals**: Basic syntax, loops, and list comprehensions. * **Object-Oriented Programming (OOP)**: Understanding classes, instances, and inheritance. * **Type Hinting**: Familiarity with the `typing` module (`List`, `Tuple`, `Set`). * **Basic Tooling**: Experience using an IDE like VS Code or PyCharm. Key Libraries & Tools * Data Classes: A built-in Python module used to create classes that primarily store data with less boilerplate. * NLTK: The Natural Language Toolkit, used here for tokenizing text and identifying stop words. * PDF Plumber: A library for extracting text and metadata from PDF files. * Unittest: Python’s built-in unit testing framework used to verify our refactored functions. Code Walkthrough: Cleaning the PDF Scraper The original `PDFScrape` class was bloated with instance variables that were only used temporarily. This created unnecessary state within the object. We start by defining a clear structure for our output using a data class. ```python from dataclasses import dataclass from typing import List, Tuple @dataclass class ScrapeResult: doi: string word_score: int frequency: List[Tuple[str, int]] study_design: List[Tuple[str, int]] ``` Moving Data Out of the Class The original code hard-coded target word lists inside the class. This makes the class difficult to reuse. We move these to constants (or eventually an external config file) and convert them to sets for better performance. ```python TARGET_WORDS = {"analysis", "results", "methodology"} BYCATCH_WORDS = {"introduction", "references"} ``` Converting Methods to Pure Functions One of the biggest wins in refactoring is moving logic out of classes when it doesn't need to access `self`. Functions like `guess_doi` or `compute_filtered_tokens` should be "pure"—meaning they only depend on their input arguments. ```python def guess_doi(path_name: str) -> str: base_name = os.path.basename(path_name) # Simplified extraction logic doi_part = base_name.split('_')[0] return f"doi_prefix/{doi_part}" ``` By moving these out, we can test `guess_doi` without ever needing to instantiate a heavy scraper object. This is the cornerstone of testable architecture. Syntax Notes: The Power of Sets In the original script, the developer wrote a custom `overlap` function to find common words between two lists. This is a classic example of "reinventing the wheel." Python’s `set` type handles this natively and much more efficiently. Instead of iterating through lists, use the intersection operator: ```python The old, slow way overlap = [word for word in list_a if word in list_b] The Pythonic, fast way overlap = set_a.intersection(set_b) Or even shorter: overlap = set_a & set_b ``` Using sets changes the time complexity of lookups from O(n) to O(1) on average. When processing thousands of words across hundreds of academic papers, these micro-optimizations stack up. Practical Examples This refactoring technique applies to any data processing pipeline. Imagine you are building a log analyzer. Instead of a `LogAnalyzer` class with complex inheritance for different log formats (Nginx, Apache, Syslog), you can create a suite of pure parsing functions. A main execution script then selects the appropriate function based on the file extension. This "Functional Core, Imperative Shell" pattern keeps your logic isolated from the messy details of file I/O and configuration. Tips & Gotchas * **The Dunder New Trap**: Never override `__new__` unless you are doing something extremely specialized like creating a Singleton or a C-extension. If you find yourself using `__new__` to return instances of other classes, you actually want a **Factory Pattern**. * **Instance Variable Pollution**: Only use `self.variable` if that data needs to live for the entire life of the object. If you only need it for the duration of one method, make it a local variable. This prevents bugs where one method accidentally relies on the leftover state from another. * **Naming Clarity**: Avoid generic names like `download` if the function is actually reading a local file. Use `scrape` or `parse` to accurately reflect the action. Predictable naming reduces the need for documentation. * **Test Early**: If a function is hard to test, it is probably too tightly coupled. Breaking it down into smaller, pure functions makes writing unit tests effortless.
Dec 3, 2021Overview: Encapsulating Intent The Command design pattern is a behavioral powerhouse that transforms a request into a stand-alone object. This shift matters because it decouples the object that invokes the operation from the one that knows how to perform it. Instead of calling a method directly on a data object, you wrap that action in a "command" object. This provides immense control over the execution lifecycle, allowing you to queue operations, log them, or pass them around like any other piece of data. In a banking context, where every cent counts, this pattern provides the rigorous structure needed to manage complex financial transactions. Prerequisites To get the most out of this tutorial, you should be comfortable with Python fundamentals, specifically Object-Oriented Programming (OOP) concepts like classes and inheritance. Familiarity with Protocols (structural subtyping) is helpful, as we use them to define our command interface. You should also understand basic data structures like lists and dictionaries, which act as our "stacks" for undo and redo history. Key Libraries & Tools * **Python 3.8+**: The primary language used for implementation. * **dataclasses**: A standard library module used to reduce boilerplate code when creating data-heavy objects like bank accounts and commands. * **typing.Protocol**: Used to define the `Transaction` interface, ensuring that any command we create adheres to the required method signatures. Building the Foundation: The Command Protocol We start by defining what a transaction looks like. Instead of a concrete class, we use a `Protocol`. This allows for flexible implementation across different types of banking actions. ```python from typing import Protocol class Transaction(Protocol): def execute(self) -> None: ... def undo(self) -> None: ... def redo(self) -> None: ... ``` This interface forces every command—whether it is a deposit, withdrawal, or transfer—to know how to perform its action, reverse it, and repeat it. By defining these methods upfront, we prepare our system for non-destructive editing and history management. Implementing Concrete Commands Each banking operation becomes a concrete class. Take the `Deposit` command: it holds a reference to the `Account` and the `amount`. It doesn't just perform the math; it stores the state necessary to undo that math later. ```python @dataclass class Deposit: account: Account amount: int def execute(self) -> None: self.account.deposit(self.amount) print(f"Deposited ${self.amount/100:.2f}") def undo(self) -> None: self.account.withdraw(self.amount) print(f"Undid deposit of ${self.amount/100:.2f}") def redo(self) -> None: self.execute() ``` The logic for a `Transfer` is slightly more complex as it involves two accounts, but the pattern remains identical. The `execute` method withdraws from one and deposits into another, while `undo` simply swaps those roles. The Bank Controller: Managing the Stack To handle undo and redo, we need a central manager. The `BankController` maintains two stacks: `undo_stack` and `redo_stack`. When you execute a command through the controller, it clears the redo history and pushes the new command onto the undo stack. This is the same logic used in professional video editors and audio software. ```python class BankController: undo_stack: list[Transaction] = field(default_factory=list) redo_stack: list[Transaction] = field(default_factory=list) def execute(self, transaction: Transaction): transaction.execute() self.redo_stack.clear() self.undo_stack.append(transaction) def undo(self): if not self.undo_stack: return transaction = self.undo_stack.pop() transaction.undo() self.redo_stack.append(transaction) ``` This architecture ensures that the user can never "redo" an old action after they have performed a brand-new operation, preventing state corruption. Practical Examples: Batch Processing and Rollbacks A major advantage of the Command design pattern is the ability to group commands into a `Batch`. In banking, you might want to perform five transfers as a single unit. If the third transfer fails due to insufficient funds, you must roll back the first two to maintain data integrity. Our `Batch` command handles this by iterating through its internal list of commands and using a `try...except` block to trigger `undo()` on completed steps if an error occurs. Tips & Gotchas * **Assumption of Success**: In our basic implementation, we assume `undo()` and `redo()` will always succeed. In production systems, you must handle errors during these phases. If an undo fails, the system state is potentially compromised. * **Granularity**: Keep your commands small. Instead of a "Pay Bills" command that does everything, create a Batch of individual "Withdrawal" commands. This makes debugging and reversing specific parts of the process much easier. * **Memory Management**: If a user performs thousands of operations, your undo stack will grow indefinitely. Consider implementing a maximum stack size to prevent excessive memory consumption.
Nov 12, 2021Overview A Plugin Architecture allows you to extend an application's functionality without modifying its core source code. This pattern is essential for shipping software that remains open to extension but closed to modification. By decoupling the main logic from specific implementations, you can add features like new game characters or data processing modules simply by adding new files and updating a configuration. This tutorial demonstrates how to use Python to build a system where modules register themselves into a factory dynamically. Prerequisites To follow this guide, you should be comfortable with: * Basic Python syntax and Object-Oriented Programming (OOP). * The concept of JSON for data storage. * Familiarity with Python's `typing` module, specifically Protocols. Key Libraries & Tools * **importlib**: A built-in Python library used to import modules programmatically. * **typing.Protocol**: Used for structural typing to define an interface that plugins must adhere to. * **dataclasses**: Simplifies the creation of classes that primarily store data. Code Walkthrough 1. Defining the Interface We start by defining what a "character" looks like using a Protocol. This ensures that any plugin we load has the necessary methods, such as `make_noise`. ```python from typing import Protocol class GameCharacter(Protocol): def make_noise(self) -> None: ... ``` 2. Building the Factory The factory acts as a registry. It maintains a dictionary mapping string keys (from our JSON level definition) to creation functions. ```python from typing import Callable, Any character_creation_funcs: dict[str, Callable[..., GameCharacter]] = {} def register(character_type: str, creation_func: Callable[..., GameCharacter]): character_creation_funcs[character_type] = creation_func def create(arguments: dict[str, Any]) -> GameCharacter: args_copy = arguments.copy() char_type = args_copy.pop("type") try: creation_func = character_creation_funcs[char_type] return creation_func(**args_copy) except KeyError: raise ValueError(f"Unknown character type {char_type}") ``` 3. The Dynamic Loader This is the heart of the plugin system. It uses importlib to find and execute a plugin's initialization code. Each plugin must expose an `initialize` function. ```python import importlib def load_plugins(plugins: list[str]) -> None: for plugin_name in plugins: module = importlib.import_module(plugin_name) module.initialize() ``` 4. Creating a Plugin A plugin is just a separate Python file. For example, `plugins/bard.py` defines a new class and registers it back to the core factory. ```python from dataclasses import dataclass from game import factory @dataclass class Bard: name: str instrument: str = "flute" def make_noise(self) -> None: print(f"{self.name} plays the {self.instrument}!") def initialize(): factory.register("bard", Bard) ``` Syntax Notes We use **structural typing** via `typing.Protocol`. Unlike traditional inheritance, a class doesn't need to explicitly inherit from `GameCharacter`. As long as it implements `make_noise`, Python treats it as a valid implementation. We also utilize `**kwargs` unpacking in the factory's `create` method to pass JSON data directly into class constructors. Practical Examples * **Game Modding**: Allow players to drop a `.py` file into a folder to add custom items. * **Data Pipelines**: Add support for new file formats (CSV, Parquet, Avro) by creating reader plugins. * **CLI Tools**: Let users add custom commands to a central utility script without changing the core binary. Tips & Gotchas Always use a **try-except** block when accessing the factory dictionary to provide clear error messages for missing types. When popping the `type` key from arguments, make a copy of the dictionary first to avoid side effects that might break other parts of your application.
Sep 17, 2021Overview of Structural Pattern Matching Structural Pattern Matching represents a major evolution in how Python handles conditional logic. While it shares a superficial resemblance to the `switch-case` statements found in C or Java, it offers far more than simple value comparisons. This feature allows you to match complex data structures, extract nested values, and apply conditional logic directly within a case. It streamlines code that would otherwise require nested `if-elif-else` blocks, making it both more readable and less prone to errors when dealing with varied input formats. Prerequisites To follow this guide, you should have a solid grasp of Python fundamentals, including functions, lists, and basic object-oriented programming. Most importantly, you must use **Python 3.10** or newer, as the `match` and `case` keywords were not available in previous versions. Key Libraries & Tools - **shlex**: A standard library module used for splitting strings into tokens, specifically designed for command-line syntax parsing. - **dataclasses**: Used to create concise data-holding classes that work seamlessly with pattern matching. - **pyenv**: A tool for managing multiple Python versions, useful for testing newer features like 3.10. Code Walkthrough: Parsing Commands Let's look at how we can use matching to build a robust command-line interface. First, we'll implement a sophisticated match that handles multiple keywords and variable arguments. ```python import shlex def run_command(command_str): # Split input while respecting quotes split_cmd = shlex.split(command_str) match split_cmd: case ["quit" | "exit" | "bye", *rest] if "--force" in rest: print("Force quitting...") return False case ["quit" | "exit" | "bye", *rest]: print("Quitting normally.") return False case ["load", filename]: print(f"Loading: {filename}") case _: print(f"Unknown command: {command_str}") return True ``` In this snippet, we use the `|` (OR) operator to match multiple exit commands in a single line. The `*rest` syntax captures any additional arguments into a list. Notice the **guard condition** (`if "--force" in rest`), which allows us to filter the match based on external logic. Pattern Matching with Objects One of the most powerful aspects is matching against class attributes. By using dataclasses, we can match specific object structures. ```python from dataclasses import dataclass @dataclass class Command: action: str args: list def process_obj(cmd: Command): match cmd: case Command(action="load", args=[filename]): print(f"Object match: Loading {filename}") case Command(action="quit", args=["--force", *_]): print("Object match: Force quit detected.") ``` This syntax verifies that the object is an instance of `Command` and checks specific attribute values simultaneously. It is significantly cleaner than multiple `isinstance` and `getattr` calls. Syntax Notes - **The Underscore (_)**: This acts as a wildcard, matching anything without binding it to a variable. It is standard for the default case. - **Binding Variables**: When you use a name like `filename` in a case, Python automatically assigns the matched value to that variable for use inside the case block. Tips & Gotchas Order is critical. Python checks cases from top to bottom and stops at the first match. If you put a broad pattern (like `case [*rest]`) above a specific one (like `case ["load", name]`), the specific one will never execute. Always place your most specialized patterns at the top and your most generic catch-alls at the bottom.
Jul 9, 2021