Refactoring messy sales reports with SOLID design Software development often begins with a script that simply works. In this exploration, Arjan Egkelmans (ArjanCodes) demonstrates a sales reporting tool that processes CSV data to calculate customer counts and total revenue. The initial "messy" version houses all logic within a single `generate` method. While functional, this monolithic approach creates a maintenance nightmare where reading files, filtering dates, calculating math, and writing JSON outputs are all tightly coupled. This lack of separation makes the code nearly impossible to unit test or extend without breaking existing logic. Implementing protocols for rigid class structures To bring order to the chaos, Arjan applies the SOLID principles, originally popularized by Robert C. Martin. The refactor starts with the **Interface Segregation** and **Dependency Inversion** principles. By defining a `Metric` using a Python Protocol, we create a blueprint for what a metric should do without dictating how it does it. This allows for specialized classes like `CustomerCountMetric` or `TotalSalesMetric` that are injected into the report generator. Prerequisites To follow this tutorial, you should have a solid grasp of Python 3.10+, specifically type hinting and class structures. Familiarity with the pandas library is essential for data frame manipulation, and a basic understanding of object-oriented programming (OOP) will help you navigate the transition from scripts to classes. Key Libraries and Tools * **pandas**: Used for robust data ingestion and analytical filtering. * **typing.Protocol**: Essential for defining structural subtyping (duck typing) in Python. * **json**: For exporting final report data into standard web formats. Code Walkthrough The class-based approach relies on injecting dependencies into the constructor. This ensures the generator doesn't care if it's reading from a CSV or a database. ```python from typing import Protocol, Any import pandas as pd class Metric(Protocol): def compute(self, df: pd.DataFrame) -> dict[str, Any]: ... class CustomerCountMetric: def compute(self, df: pd.DataFrame) -> dict[str, Any]: return {"unique_customers": df["name"].nunique()} class SalesReportGenerator: def __init__(self, reader, writer, metrics: list[Metric]): self.reader = reader self.writer = writer self.metrics = metrics def generate(self, input_path: str, output_path: str): df = self.reader.read(input_path) report_data = {} for m in self.metrics: report_data.update(m.compute(df)) self.writer.write(output_path, report_data) ``` This structure satisfies the **Open-Closed Principle**. To add a new metric, you simply write a new class and pass it into the list. You never have to touch the `generate` method again. Shifting toward a functional Pythonic approach While the class-based version is clean, Arjan argues that heavy OOP can feel un-Pythonic. A functional alternative utilizes `Callable` types and Data Classes to achieve the same modularity with less overhead. In this version, metrics are simple functions rather than objects with methods. This reduces boilerplate while maintaining the ability to swap components. The SOLID principles still guide the design—specifically **Single Responsibility**—ensuring that each function performs one discrete task, such as filtering or reading data. Syntax Notes and Practical Tips When using Python Protocols, remember that you don't need to explicitly inherit from the protocol class. Python uses structural subtyping to verify that your class matches the expected interface at runtime (or via mypy). **Tips & Gotchas:** * **Avoid Over-Engineering**: Don't extract every single line into a class if a simple function will suffice. * **The Main Entry Point**: Keep your object instantiation in a single place (like a `main` function). This makes it easy to see how your application is wired together. * **Testing**: Because the reader and writer are injected, you can pass "mock" objects during testing to avoid hitting the actual disk, making your tests significantly faster and more reliable.
Robert C. Martin
People
- Sep 26, 2025
- Feb 14, 2023
- Jul 29, 2022
- Apr 23, 2021