## The God object trap

Most developers start with a sensible class. It begins as a simple container for a data path and a few settings. However, Python classes frequently suffer from "feature creep." You add a data loading method, then a cleaning step, then model training logic. Suddenly, you have created a **God object**: a single class that knows too much and does too much. This anti-pattern makes testing difficult and modification dangerous, because every local change ripples through a massive, interconnected mess.

## Moving validation close to the data

The first step in refactoring is to separate configuration from execution. A data class declared with `frozen=True` gives you an immutable contract for your settings. Instead of keeping validation logic inside a bloated `run` method, place it in a `__post_init__` method.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TrainingConfig:
    data_path: Path
    output_dir: Path
    test_size: float

    def __post_init__(self):
        # Validate once, at construction time, so no invalid instance can exist.
        if not self.data_path.exists():
            raise FileNotFoundError(f"{self.data_path} not found")
        if not 0 < self.test_size < 1:
            raise ValueError("test_size must be between 0 and 1")
```

This shift ensures that once a `TrainingConfig` object exists, it has already passed these checks. You no longer need to sprinkle `if` statements throughout your processing logic to check whether paths exist or parameters are within range.

## Decoupling workflows from containers

A critical rule of thumb emerges: if behavior relates to the data itself (validating it or deriving small values), keep it in the class. But if the behavior involves external libraries like pandas or complex orchestration, it belongs in a standalone function. We break down the monolithic `run` method into granular, pure functions. Loading data becomes distinct from cleaning data. Feature engineering is stripped of its `self` dependency.
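As a rough illustration of what "stripped of its `self` dependency" can look like, here is a minimal sketch of two pure functions. The names `clean_rows` and `add_area_feature`, the `width`/`height` columns, and the use of plain dicts in place of a pandas DataFrame are all assumptions made for the example, not part of the original refactoring.

```python
from typing import Optional

# Hypothetical record type: a plain dict stands in for a DataFrame row.
Row = dict[str, Optional[float]]


def clean_rows(rows: list[Row]) -> list[Row]:
    """Return only the rows with no missing values; the input is not modified."""
    return [row for row in rows if all(value is not None for value in row.values())]


def add_area_feature(rows: list[Row]) -> list[Row]:
    """Return new rows with a derived 'area' column (width * height)."""
    return [{**row, "area": row["width"] * row["height"]} for row in rows]


raw = [
    {"width": 2.0, "height": 3.0},
    {"width": None, "height": 1.0},  # incomplete row, dropped by clean_rows
]
features = add_area_feature(clean_rows(raw))
print(features)  # [{'width': 2.0, 'height': 3.0, 'area': 6.0}]
```

Because neither function touches instance state or the file system, each can be tested with a small in-memory list and no setup.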
This transformation turns a rigid, monolithic class into a flexible pipeline in which each function receives only the data it needs. The final orchestration then happens in a clean, high-level `run_experiment` function that simply coordinates these smaller, testable units. By separating the "what" (data) from the "how" (workflow), you build software that holds up under long-term maintenance.
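One way the orchestration might look is sketched below. Only `TrainingConfig` and `run_experiment` are named in the text; the step functions `load_rows`, `split_rows`, and `write_summary` are hypothetical stand-ins, and the standard-library `csv` module replaces pandas and the actual model training so that the example stays self-contained. The config's validation is left out here for brevity.

```python
import csv
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TrainingConfig:
    # Same fields as the config in the article; validation omitted in this sketch.
    data_path: Path
    output_dir: Path
    test_size: float


def load_rows(path: Path) -> list[dict[str, str]]:
    """Read the CSV at `path` into a list of dicts, one per row."""
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


def split_rows(rows: list, test_size: float) -> tuple[list, list]:
    """Hold out the last `test_size` fraction of rows as the test set."""
    n_test = round(len(rows) * test_size)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


def write_summary(output_dir: Path, n_train: int, n_test: int) -> Path:
    """Write the split sizes to a summary file and return its path."""
    output_dir.mkdir(parents=True, exist_ok=True)
    summary = output_dir / "summary.txt"
    summary.write_text(f"train={n_train} test={n_test}\n")
    return summary


def run_experiment(config: TrainingConfig) -> Path:
    """Coordinate the steps; each one receives only the values it needs."""
    rows = load_rows(config.data_path)
    train, test = split_rows(rows, config.test_size)
    return write_summary(config.output_dir, len(train), len(test))


# Usage with a throwaway CSV file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "data.csv"
    data_path.write_text("x,y\n1,2\n3,4\n5,6\n7,8\n")
    config = TrainingConfig(data_path, Path(tmp) / "out", test_size=0.25)
    summary_text = run_experiment(config).read_text()

print(summary_text)  # train=3 test=1
```

Note that `run_experiment` is the only function that sees the whole config; every step below it takes a path, a list, or a number, which keeps each one easy to test in isolation.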