Arjan Egges warns Python developers: stop treating data science code as throwaway

ArjanCodes////2 min read

The Architecture of Agile Analysis

Many developers fall into the trap of viewing data science scripts as disposable. Since objectives shift as insights emerge, the temptation is to ignore software design. This is a mistake. Arjan Egges argues that proper project structure is precisely what allows for rapid iteration. If your code is a mess, you can't pivot when the data reveals a new direction.

Arjan Egges warns Python developers: stop treating data science code as throwaway
Most Python Projects Fail Because of This Structure

Standardizing the Starting Line

Consistency across projects reduces the cognitive load of context switching. For teams, this isn't just a preference—it's a requirement for collaboration. Using Cookiecutter allows you to instantiate projects from a template like Cookiecutter Data Science, ensuring every experiment begins with the same directory structure and configuration.

Pipeline Power and Library Leverage

Writing custom code for data cleaning often introduces unnecessary bugs. Mature libraries like pandas and scikit-learn offer tested, optimized patterns that actually teach you domain standards. For complex workflows, tools like Taipy provide a backend-to-frontend pipeline that manages scenarios and versioning.

# Installing Taipy for pipeline management
pip install taipy

Decoupling Data and Configuration

Hard-coding constants is the fastest way to break a deployment. Keep your configuration in a single, separate location. Using environment variables via a .env file is the gold standard, as it integrates seamlessly with cloud environments and prevents sensitive database paths from leaking into version control.

The Notebook Exit Strategy

Jupyter notebooks are excellent for exploration but terrible for maintenance. Once a piece of logic is stable, move it into a shared Python package. This transition enables professional tooling—auto-formatters, linters, and most importantly, unit tests.

Robustness Beyond the Chart

Visualizing data isn't a substitute for testing. Subtle bugs might not skew a scatter plot but can lead to catastrophic decision-making errors. Writing unit tests ensures that when you swap a dataset or hand code to a colleague, the underlying logic remains sound. It’s about building a project that functions autonomously, rather than one that requires your constant intervention to survive a deadline.

Topic DensityMention share of the most discussed topics · 10 mentions across 10 distinct topics
Arjan Egges
10%· people
Cookiecutter
10%· products
Jupyter notebooks
10%· products
pandas
10%· products
Other topics
50%
End of Article
Source video
Arjan Egges warns Python developers: stop treating data science code as throwaway

Most Python Projects Fail Because of This Structure

Watch

ArjanCodes // 14:49

On this channel, I post videos about programming and software design to help you take your coding skills to the next level. I'm an entrepreneur and a university lecturer in computer science, with more than 20 years of experience in software development and design. If you're a software developer and you want to improve your development skills, and learn more about programming in general, make sure to subscribe for helpful videos. I post a video here every Friday. If you have any suggestion for a topic you'd like me to cover, just leave a comment on any of my videos and I'll take it under consideration. Thanks for watching!

What they talk about
AI and Agentic Coding News
Who and what they mention most
Python
27.3%3
Python
18.2%2
Python
18.2%2
2 min read0%
2 min read