Arjan Egkelmans swaps messy Python scripts for SOLID functional architecture

ArjanCodes////4 min read

Refactoring messy sales reports with SOLID design

Software development often begins with a script that simply works. In this exploration, Arjan Egkelmans (ArjanCodes) demonstrates a sales reporting tool that processes CSV data to calculate customer counts and total revenue. The initial "messy" version houses all logic within a single generate method. While functional, this monolithic approach creates a maintenance nightmare where reading files, filtering dates, calculating math, and writing JSON outputs are all tightly coupled. This lack of separation makes the code nearly impossible to unit test or extend without breaking existing logic.

Implementing protocols for rigid class structures

To bring order to the chaos, Arjan applies the SOLID principles, originally popularized by Robert C. Martin. The refactor starts with the Interface Segregation and Dependency Inversion principles. By defining a Metric using a Python Protocol, we create a blueprint for what a metric should do without dictating how it does it. This allows for specialized classes like CustomerCountMetric or TotalSalesMetric that are injected into the report generator.

Prerequisites

To follow this tutorial, you should have a solid grasp of Python 3.10+, specifically Type Hinting and class structures. Familiarity with the pandas library is essential for data frame manipulation, and a basic understanding of object-oriented programming (OOP) will help you navigate the transition from scripts to classes.

Key Libraries and Tools

  • pandas: Used for robust data ingestion and analytical filtering.
  • typing.Protocol: Essential for defining structural subtyping (duck typing) in Python.
  • json: For exporting final report data into standard web formats.

Code Walkthrough

The class-based approach relies on injecting dependencies into the constructor. This ensures the generator doesn't care if it's reading from a CSV or a database.

from typing import Protocol, Any
import pandas as pd

class Metric(Protocol):
    def compute(self, df: pd.DataFrame) -> dict[str, Any]:
        ...

class CustomerCountMetric:
    def compute(self, df: pd.DataFrame) -> dict[str, Any]:
        return {"unique_customers": df["name"].nunique()}

class SalesReportGenerator:
    def __init__(self, reader, writer, metrics: list[Metric]):
        self.reader = reader
        self.writer = writer
        self.metrics = metrics

    def generate(self, input_path: str, output_path: str):
        df = self.reader.read(input_path)
        report_data = {}
        for m in self.metrics:
            report_data.update(m.compute(df))
        self.writer.write(output_path, report_data)

This structure satisfies the Open-Closed Principle. To add a new metric, you simply write a new class and pass it into the list. You never have to touch the generate method again.

Shifting toward a functional Pythonic approach

While the class-based version is clean, Arjan argues that heavy OOP can feel un-Pythonic. A functional alternative utilizes Callable types and Data Classes to achieve the same modularity with less overhead. In this version, metrics are simple functions rather than objects with methods. This reduces boilerplate while maintaining the ability to swap components. The SOLID principles still guide the design—specifically Single Responsibility—ensuring that each function performs one discrete task, such as filtering or reading data.

Syntax Notes and Practical Tips

When using Python Protocol, remember that you don't need to explicitly inherit from the protocol class. Python uses structural subtyping to verify that your class matches the expected interface at runtime (or via mypy).

Tips & Gotchas:

  • Avoid Over-Engineering: Don't extract every single line into a class if a simple function will suffice.
  • The Main Entry Point: Keep your object instantiation in a single place (like a main function). This makes it easy to see how your application is wired together.
  • Testing: Because the reader and writer are injected, you can pass "mock" objects during testing to avoid hitting the actual disk, making your tests significantly faster and more reliable.
Topic DensityMention share of the most discussed topics · 16 mentions across 12 distinct topics
CSV
13%· products
pandas
13%· products
Python Protocol
13%· products
SOLID
13%· products
Arjan Egkelmans
6%· people
Other topics
44%
End of Article
Source video
Arjan Egkelmans swaps messy Python scripts for SOLID functional architecture

I Tried SOLID Principles in Python… Here’s What Happened

Watch

ArjanCodes // 21:56

On this channel, I post videos about programming and software design to help you take your coding skills to the next level. I'm an entrepreneur and a university lecturer in computer science, with more than 20 years of experience in software development and design. If you're a software developer and you want to improve your development skills, and learn more about programming in general, make sure to subscribe for helpful videos. I post a video here every Friday. If you have any suggestion for a topic you'd like me to cover, just leave a comment on any of my videos and I'll take it under consideration. Thanks for watching!

What they talk about
AI and Agentic Coding News
Who and what they mention most
Python
27.3%3
Python
18.2%2
Python
18.2%2
4 min read0%
4 min read