Arjan Egges dismantles Python's God object to save bloated codebases
The God object trap

Most developers start with a sensible class. It begins as a simple container for a data path and a few settings. However, Python classes frequently suffer from "feature creep." You add a data loading method, then a cleaning step, then model training logic. Suddenly, you have created a God object: a single class that knows too much and does too much. This anti-pattern makes testing difficult and modification dangerous, as every local change ripples through a massive, interconnected mess.
Moving validation close to the data
The first step in refactoring involves separating configuration from execution. By utilizing a Data Class with the frozen=True parameter, you create an immutable contract for your settings. Instead of keeping validation logic inside a bloated run method, you should place it within a __post_init__ method.
@dataclass(frozen=True)
class TrainingConfig:
data_path: Path
output_dir: Path
test_size: float
def __post_init__(self):
if not self.data_path.exists():
raise FileNotFoundError(f"{self.data_path} not found")
if not 0 < self.test_size < 1:
raise ValueError("test_size must be between 0 and 1")
This shift ensures that once a TrainingConfig object exists, it is guaranteed to be valid. You no longer need to sprinkle if statements throughout your processing logic to check if paths exist or parameters are within range.
Decoupling workflows from containers
A critical rule of thumb emerges: if behavior relates to the data itself—validating it or deriving small values—keep it in the class. But if the behavior involves external libraries like Pandas or complex orchestration, it belongs in a standalone function.
We break down the monolithic run method into granular, pure functions. Loading data becomes distinct from cleaning data. Feature engineering is stripped of its self dependency. This transformation turns a rigid class hierarchy into a flexible pipeline where functions only receive the specific data they need to operate. The final orchestration then happens in a clean, high-level run_experiment function that simply coordinates these smaller, testable units. By separating the "what" (data) from the "how" (workflow), you build software that survives long-term maintenance.
- Arjan Egges
- 13%· people
- Data Class
- 13%· programming concepts
- God object
- 13%· programming concepts
- Object-oriented programming
- 13%· programming concepts
- Pandas
- 13%· libraries
- Other topics
- 38%

Let’s Fix Bloated Python Classes Once and For All
WatchArjanCodes // 14:55
On this channel, I post videos about programming and software design to help you take your coding skills to the next level. I'm an entrepreneur and a university lecturer in computer science, with more than 20 years of experience in software development and design. If you're a software developer and you want to improve your development skills, and learn more about programming in general, make sure to subscribe for helpful videos. I post a video here every Friday. If you have any suggestion for a topic you'd like me to cover, just leave a comment on any of my videos and I'll take it under consideration. Thanks for watching!