Mastering Python Refactoring: Decoupling Logic and Configuration for Scale
Overview
Writing code that works is only half the battle. In software engineering, the real challenge lies in making that code maintainable, testable, and flexible. When dealing with complex tasks like web scraping, tangled class hierarchies and hardcoded settings can quickly turn a working script into an unmaintainable one.
This tutorial focuses on high-level refactoring patterns: replacing rigid inheritance with Protocols and plain functions, centralizing cross-cutting concerns like logging, and moving configuration out of the code into external files.
Prerequisites

To get the most out of this guide, you should be comfortable with:
- Intermediate Python syntax (classes, functions, and decorators).
- The concepts of OOP and composition.
- Type hinting and why it matters for modern development.
- Basic understanding of Python Data Classes.
Key Libraries & Tools
- Python Protocols: Part of the typing module, used for structural subtyping (duck typing).
- Pandas: Used for data manipulation, specifically handling data frames in the scraper.
- tqdm: A library for displaying smart progress bars during long-running loops.
- JSON: The standard format for our external configuration files.
- Hydra (mentioned): A framework for elegantly configuring complex applications.
Code Walkthrough: From Classes to Functions
One of the biggest issues in the original code was a ScrapeRequest class that was responsible for creating its own subclasses. This creates a circular dependency and makes the code difficult to extend. We solve this by replacing the class hierarchy with a Protocol and plain functions.
1. Defining the Scraper Protocol
Instead of a rigid class hierarchy, we define what a "scraper" looks like using a Protocol. Any class with a scrape method matching this signature is now a valid scraper.
from typing import Protocol
from dataclasses import dataclass

@dataclass
class ScrapeResult:
    keywords: list[str]
    word_frequencies: dict[str, int]

class Scraper(Protocol):
    def scrape(self, search_text: str) -> ScrapeResult:
        ...
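To see structural subtyping in action, here is a hypothetical WordCountScraper (not part of the original project) that satisfies the Scraper protocol purely by having the right method, with no inheritance:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ScrapeResult:
    keywords: list[str]
    word_frequencies: dict[str, int]


class Scraper(Protocol):
    def scrape(self, search_text: str) -> ScrapeResult:
        ...


# Hypothetical scraper for illustration: counts words in the given text.
# Note it does NOT inherit from Scraper -- the matching method is enough.
class WordCountScraper:
    def scrape(self, search_text: str) -> ScrapeResult:
        counts = Counter(search_text.lower().split())
        return ScrapeResult(
            keywords=list(counts),
            word_frequencies=dict(counts),
        )


def run(scraper: Scraper, text: str) -> ScrapeResult:
    # Type checkers accept any object with a matching scrape method.
    return scraper.scrape(text)


result = run(WordCountScraper(), "data data science")
print(result.word_frequencies)  # {'data': 2, 'science': 1}
```

Because Scraper is a Protocol, static type checkers verify WordCountScraper structurally; swapping in a real network-backed scraper later requires no changes to `run`.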
2. Refactoring Requests into Functions
We don't need a class for every type of request. By converting them into functions, we simplify the flow. These functions now accept a Scraper instance as a dependency.
def fetch_terms_from_doi(target: str, scraper: Scraper) -> ScrapeResult:
    # Logic to process target and call the scraper
    result = scraper.scrape(target)
    return result
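Because the scraper arrives as a dependency rather than being constructed inside the function, testing becomes trivial: we can pass in a stub. The StubScraper below is an illustrative test double, not part of the original codebase:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ScrapeResult:
    keywords: list[str]
    word_frequencies: dict[str, int]


class Scraper(Protocol):
    def scrape(self, search_text: str) -> ScrapeResult:
        ...


def fetch_terms_from_doi(target: str, scraper: Scraper) -> ScrapeResult:
    # Logic to process target and call the scraper
    return scraper.scrape(target)


# Hypothetical stub for testing -- no network access required.
class StubScraper:
    def scrape(self, search_text: str) -> ScrapeResult:
        return ScrapeResult(
            keywords=[search_text],
            word_frequencies={search_text: 1},
        )


result = fetch_terms_from_doi("10.1000/demo", StubScraper())
print(result.keywords)  # ['10.1000/demo']
```

This is the payoff of dependency injection: the function's logic can be exercised in isolation, and production code simply passes a real scraper instead.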
3. Centralizing Logging
Duplicate logging logic is a maintenance nightmare. We create a dedicated log.py to handle both file logging and console printing in one place.
import logging

def log_message(message: str):
    logging.info(message)
    print(message)
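An alternative sketch of the same idea uses the standard library's handler mechanism, so one logger fans out to both destinations instead of pairing logging.info with print (the log file name here is an assumption):

```python
import logging
import sys


def get_logger(log_file: str = "scrape.log") -> logging.Logger:
    """One logger that writes to a file and echoes to the console."""
    logger = logging.getLogger("scraper")
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        logger.setLevel(logging.INFO)
        logger.addHandler(logging.FileHandler(log_file))
        logger.addHandler(logging.StreamHandler(sys.stdout))
    return logger


logger = get_logger()
logger.info("Scrape started")  # written to scrape.log and echoed to stdout
```

The handler guard matters: calling getLogger with the same name returns the same object, so without it each call would stack another pair of handlers and duplicate every line.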
The Power of External Configuration
Hardcoding paths, URLs, and word lists directly into your logic makes your script brittle. If you want to share your tool with a non-programmer, they shouldn't have to touch Python code just to change a folder path.
import json
from dataclasses import dataclass

@dataclass
class ScrapeConfig:
    export_dir: str
    paper_folder: str
    target_words_file: str

def read_config(config_file: str) -> ScrapeConfig:
    with open(config_file, "r") as f:
        data = json.load(f)
    return ScrapeConfig(**data)
By passing this ScrapeConfig object down the call stack, we ensure that every component has access to the settings it needs without relying on global variables.
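A minimal round trip shows how the pieces fit together; the field values in this sample config.json are illustrative only:

```python
import json
from dataclasses import dataclass


@dataclass
class ScrapeConfig:
    export_dir: str
    paper_folder: str
    target_words_file: str


def read_config(config_file: str) -> ScrapeConfig:
    with open(config_file, "r") as f:
        data = json.load(f)
    return ScrapeConfig(**data)


# Write a sample config file (values are placeholders).
sample = {
    "export_dir": "exports",
    "paper_folder": "papers",
    "target_words_file": "words.txt",
}
with open("config.json", "w") as f:
    json.dump(sample, f, indent=2)

config = read_config("config.json")
print(config.export_dir)  # exports
```

A non-programmer can now change any of these values in config.json without ever opening the Python source.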
Syntax Notes
- Protocol: This is a powerful feature of Python's typing system. Unlike traditional inheritance, a class doesn't need to explicitly inherit from Scraper to be considered a Scraper. It just needs the right method.
- Unpacking operator (**data): We use the double asterisk to unpack a dictionary directly into the initializer of a dataclass. This only works if the keys in the JSON exactly match the field names in the class.
- Context managers: Always use with open(...) for file operations and directory changes to ensure resources are cleaned up even if an error occurs.
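The **data caveat is worth seeing concretely: when the JSON keys don't match the dataclass fields, Python raises a TypeError at construction time rather than silently ignoring the mismatch:

```python
import json
from dataclasses import dataclass


@dataclass
class ScrapeConfig:
    export_dir: str
    paper_folder: str
    target_words_file: str


# Keys match the field names exactly -- unpacking succeeds.
good = json.loads(
    '{"export_dir": "out", "paper_folder": "papers", "target_words_file": "w.txt"}'
)
config = ScrapeConfig(**good)

# Missing keys -- the initializer complains about required arguments.
bad = json.loads('{"export_dir": "out"}')
try:
    ScrapeConfig(**bad)
except TypeError as exc:
    print(f"config error: {exc}")
```

Failing loudly at load time is a feature: a typo in the config file surfaces immediately instead of as a mysterious AttributeError deep inside the scraper.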
Practical Examples
This refactoring approach is essential for:
- Data Science Pipelines: Where file paths and filtering parameters change with every experiment.
- CI/CD Environments: Where different configurations are needed for testing, staging, and production.
- User-Facing Tools: Allowing users to modify a simple config.json instead of editing source code.
Tips & Gotchas
- Avoid Instance Variable Bloat: Don't store temporary data as self.variable in a class if it's only used within a single method. Use local variables to keep the object state clean.
- Type Checking Gaps: Libraries like tqdm and Pandas don't always have perfect type hints. You might encounter "Unknown" types; use typing.Any or # type: ignore sparingly when these external tools fail the linter.
- Configuration Trickle: High-level objects should receive the whole ScrapeConfig, but low-level helpers should only receive the specific strings or sets they need. This keeps the low-level code reusable in other projects that don't use your specific config structure.
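The configuration-trickle tip can be sketched in a few lines; build_export_path and export_results are hypothetical names chosen for illustration:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ScrapeConfig:
    export_dir: str
    paper_folder: str
    target_words_file: str


# Low-level helper: takes only the value it needs, so it stays reusable
# in projects that have no ScrapeConfig at all.
def build_export_path(export_dir: str, name: str) -> Path:
    return Path(export_dir) / f"{name}.csv"


# High-level orchestrator: receives the whole config and passes
# specific fields down to the helpers.
def export_results(config: ScrapeConfig, name: str) -> Path:
    return build_export_path(config.export_dir, name)


config = ScrapeConfig("exports", "papers", "words.txt")
print(export_results(config, "run1").as_posix())  # exports/run1.csv
```

If build_export_path took the whole ScrapeConfig, every caller in every project would need to construct one; taking a plain string keeps the dependency surface minimal.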

Fancy watching it?
Watch the full video and context