Simplifying Architecture: Refactoring a Python Data Validator CLI

Overview

Many developers fall into the trap of over-engineering early in a project. We often reach for complex design patterns like Model-View-Controller (MVC) or the Command pattern because they feel like the professional way to build. However, as this exploration of the Data Validator CLI demonstrates, excessive abstraction can drown your logic in boilerplate. This guide focuses on identifying "pattern fatigue" and refactoring a class-heavy Python application into a streamlined, functional, and testable tool.

Refactoring a Python Data Validation Interactive Shell

We are looking at an interactive shell designed to load CSV files, filter data, and perform validations. The original architecture used a separate class for every user command; we will strip away that complexity. By favoring functions over classes and Protocols over Abstract Base Classes (ABCs), we create a codebase that is easier to maintain and far less brittle.

Prerequisites

To follow this tutorial, you should have a solid grasp of Python (3.10+) fundamentals, including dictionaries, decorators, and basic typing. Familiarity with Pandas for data manipulation and Pytest for unit testing is highly recommended. You should also understand what a CLI (Command Line Interface) is and how interactive shells differ from standard script execution.

Key Libraries & Tools

  • Python: The core programming language used for the entire application.
  • Pandas: Used for data manipulation and for loading CSV files into memory.
  • Pydantic: Originally used for argument validation (later refactored out for simplicity).
  • Pytest: Our primary testing framework for ensuring the refactored logic remains sound.
  • typing (standard library): Used for type hints, Protocol, and Callable definitions to improve code clarity.

Code Walkthrough: From Classes to Functions

The original code used a classic Command pattern, where every command (e.g., exit, import, merge) was a separate class with an execute method. This created a massive amount of file-system noise. Here is how we simplify it.

1. Decoupling the Event System

The project uses an event system to handle updates. Instead of nesting this inside a controller, we move it to a standalone module and simplify the logic. We add support for a "star" (*) listener, allowing one function to catch all events—perfect for a shell that just needs to print messages to the user.

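# events.py
from typing import Any, Callable

# Maps each event name to its listeners; the special name "*" holds
# catch-all listeners that receive every event.
_event_listeners: dict[str, set[Callable[..., None]]] = {}

def register_event(event_name: str, listener: Callable[..., None]) -> None:
    if event_name not in _event_listeners:
        _event_listeners[event_name] = set()
    _event_listeners[event_name].add(listener)

def raise_event(event_name: str, *args: Any, **kwargs: Any) -> None:
    # Merge catch-all listeners with this event's listeners. Sets have no
    # guaranteed order, and a listener registered under both runs only once.
    listeners = _event_listeners.get("*", set()).union(_event_listeners.get(event_name, set()))
    for listener in listeners:
        listener(*args, **kwargs)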

2. Refactoring Commands to Functions

There is no need for a ShowFilesCommand class when a simple function will do. By using a dictionary to map strings to functions, we eliminate the need for a complex Factory pattern. We also replace Pydantic models with direct validation calls to reduce the number of small, single-use classes.

# commands/show_files.py
from .model import Model
from ..events import raise_event

def show_files(model: Model) -> None:
    table_names = list(model.data_frames.keys())
    message = f"Files present: {', '.join(table_names)}"
    raise_event("display_message", message)
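The refactor replaces Pydantic-based argument models with direct validation calls. A minimal sketch of what that can look like for a hypothetical `delete_file` command follows. The `InMemoryModel` dataclass, the print-based `raise_event` stand-in, and the message wording are all assumptions for illustration, not the project's actual code.

```python
from dataclasses import dataclass, field
from typing import Any

# Stand-in for events.raise_event so the sketch runs on its own.
def raise_event(event_name: str, *args: Any, **kwargs: Any) -> None:
    print(*args)

# Hypothetical model: aliases mapped to loaded tables (DataFrames in the real app).
@dataclass
class InMemoryModel:
    data_frames: dict[str, Any] = field(default_factory=dict)

def delete_file(model: InMemoryModel, alias: str) -> None:
    # Plain checks replace a dedicated Pydantic argument model.
    if not alias:
        raise_event("display_message", "Usage: delete <alias>")
        return
    if alias not in model.data_frames:
        raise_event("display_message", f"No file loaded under alias '{alias}'")
        return
    del model.data_frames[alias]
    raise_event("display_message", f"Deleted '{alias}'")

model = InMemoryModel({"sales": object()})
delete_file(model, "sales")  # prints: Deleted 'sales'
delete_file(model, "sales")  # prints: No file loaded under alias 'sales'
```

Each check is a couple of lines living right next to the code that depends on it, which is the main argument for dropping the extra model classes in a tool this size.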

3. Implementing the Command Factory

With commands now being functions, the factory becomes a simple registry. This is much easier to read and extend than a series of class registrations.

# commands/factory.py
from typing import Any, Callable
from .exit import exit_app
from .show_files import show_files

CommandFunc = Callable[..., None]
COMMANDS: dict[str, CommandFunc] = {
    "exit": exit_app,
    "files": show_files,
}

def execute_command(name: str, *args: Any) -> None:
    if name in COMMANDS:
        COMMANDS[name](*args)
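Because the registry is just a dictionary, it is easy to extend. One possible variation (an assumption, not something the refactor itself does) is a small decorator that registers a function under a command name, plus a message for unknown names instead of silently ignoring them. The `command` decorator and the `greet` command are hypothetical.

```python
from typing import Any, Callable

CommandFunc = Callable[..., None]
COMMANDS: dict[str, CommandFunc] = {}

def command(name: str) -> Callable[[CommandFunc], CommandFunc]:
    """Hypothetical decorator that adds a function to COMMANDS under `name`."""
    def decorator(func: CommandFunc) -> CommandFunc:
        COMMANDS[name] = func
        return func  # the function itself is left unchanged
    return decorator

@command("greet")
def greet(who: str = "world") -> None:
    print(f"Hello, {who}!")

def execute_command(name: str, *args: Any) -> bool:
    # Report unknown commands and return False so the shell can react.
    func = COMMANDS.get(name)
    if func is None:
        print(f"Unknown command: {name}")
        return False
    func(*args)
    return True

execute_command("greet", "Ada")  # prints: Hello, Ada!
execute_command("merge")         # prints: Unknown command: merge
```

The explicit dictionary literal in commands/factory.py keeps every command visible in one place; the decorator trades that for registration next to each function. Either works for a registry this small.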

Syntax Notes: Protocols vs. ABCs

One major change in this refactor is the move from Abstract Base Classes (ABCs) to Protocols. ABCs require explicit inheritance (nominal subtyping), which can make your code rigid: if you want to replace the model with a different implementation, the new class must inherit from the ABC.

Protocols, on the other hand, use structural subtyping (often called static duck typing). As long as an object has the required methods, it matches the protocol. This is cleaner and more Pythonic.

from typing import Any, Protocol

# Any object with these two methods satisfies Model; no inheritance required.
class Model(Protocol):
    def get_data(self, alias: str) -> Any: ...
    def delete_data(self, alias: str) -> None: ...
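To make structural subtyping concrete, the sketch below defines a class that never mentions `Model` yet satisfies it, so it can be passed wherever a `Model` is expected. The `InMemoryModel` class and the `drop_all` helper are hypothetical. `@runtime_checkable` is a real `typing` decorator that enables `isinstance` checks against a protocol, although those checks only confirm that the methods exist, not that their signatures match.

```python
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class Model(Protocol):
    def get_data(self, alias: str) -> Any: ...
    def delete_data(self, alias: str) -> None: ...

# Note: InMemoryModel does not inherit from Model.
class InMemoryModel:
    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def add_data(self, alias: str, data: Any) -> None:
        self._data[alias] = data

    def get_data(self, alias: str) -> Any:
        return self._data[alias]

    def delete_data(self, alias: str) -> None:
        del self._data[alias]

def drop_all(model: Model, aliases: list[str]) -> None:
    # Type checkers accept InMemoryModel here because its methods match Model.
    for alias in aliases:
        model.delete_data(alias)

store = InMemoryModel()
store.add_data("sales", [1, 2, 3])
print(isinstance(store, Model))  # True
drop_all(store, ["sales"])
```

With an ABC, `InMemoryModel` would have to subclass the base class explicitly; here, a test double or an alternative backend only needs the right methods.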

Practical Examples

This refactored architecture is well suited to any CLI tool that manages state in memory. For instance, a local database explorer or a file conversion utility benefits from this "flat" structure. By keeping the main entry point as a "patching" area where you register events and initialize the shell, you keep the logic of individual commands isolated and easy to test.

In a real-world scenario, you might extend this by:

  1. Adding a Logger: Instead of just printing, have the event system send data to a logging service.
  2. Configuration Files: Use TOML or JSON to define a list of files that should automatically load when the shell starts.
  3. Advanced Querying: Integrate DuckDB to run SQL queries directly on the loaded Pandas DataFrames.
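For the first two extensions, a rough sketch: a listener that forwards every event to Python's standard `logging` module instead of printing, and a JSON config file listing files to load at startup. The config layout (`{"autoload": [...]}`), the `load_autoload_list` helper, and the logger name are assumptions for illustration. The standard library's `tomllib` (Python 3.11+) could read a TOML version of the same file.

```python
import json
import logging
import tempfile
from pathlib import Path
from typing import Any

logger = logging.getLogger("data_validator")  # hypothetical logger name

def log_event(*args: Any, **kwargs: Any) -> None:
    # Meant to be registered as a catch-all: register_event("*", log_event).
    # INFO records are only shown once logging is configured, e.g. with
    # logging.basicConfig(level=logging.INFO).
    logger.info("event args=%r kwargs=%r", args, kwargs)

def load_autoload_list(config_path: Path) -> list[str]:
    # Assumed config layout: {"autoload": ["sales.csv", "customers.csv"]}
    if not config_path.exists():
        return []
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return [str(item) for item in config.get("autoload", [])]

# Example: write a throwaway config file and read the list back.
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "config.json"
    config_file.write_text(json.dumps({"autoload": ["sales.csv"]}), encoding="utf-8")
    files = load_autoload_list(config_file)

print(files)  # ['sales.csv']
```

In the shell, `main()` would call `load_autoload_list` once at startup and pass each path to the existing import command.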

Tips & Gotchas

  • Avoid Global Namespace Pollution: Always wrap your startup code in a main() function called from an if __name__ == "__main__": block. This prevents variables from leaking into the global scope and lets tests import the module without starting the shell.
  • Relative vs. Absolute Imports: When working within a package, use relative imports (from . import module). This allows you to rename folders or move the package without breaking every internal reference.
  • The YAGNI Principle: "You Ain't Gonna Need It." Don't build an MVC structure just because you might add a GUI later. Build the simplest version that works today. If you need a GUI tomorrow, the clean, functional code you wrote will be easy to adapt.
  • Testing Output: Use the capsys fixture in Pytest to capture stdout. This gives you a direct way to assert that your shell displays the correct messages to the user.
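A small sketch of the capsys approach: `display_message` is a hypothetical print-based listener like the one the shell registers for its messages, and `test_display_message` shows what a pytest test for it could look like. `capsys.readouterr()` is pytest's documented way to read captured output, returning an object with `out` and `err` attributes; the test function is collected and run by `pytest`, not by executing the file as a script.

```python
def display_message(message: str) -> None:
    # Hypothetical listener the shell registers to show messages to the user.
    print(message)

def test_display_message(capsys) -> None:
    # pytest injects the capsys fixture, which captures stdout and stderr.
    display_message("Files present: sales")
    captured = capsys.readouterr()
    assert captured.out == "Files present: sales\n"
    assert captured.err == ""
```

Since `print` appends a newline, the expected string ends with `\n`; comparing the whole output (rather than using `in`) also catches stray extra messages.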