Arjan Egges warns Python developers: stop treating data science code as throwaway
The Architecture of Agile Analysis
Many developers fall into the trap of viewing data science scripts as disposable. Since objectives shift as insights emerge, the temptation is to ignore software design. This is a mistake. Arjan Egges argues that proper project structure is precisely what allows for rapid iteration. If your code is a mess, you can't pivot when the data reveals a new direction.

Standardizing the Starting Line
Consistency across projects reduces the cognitive load of context switching. For teams, this isn't just a preference—it's a requirement for collaboration. Using Cookiecutter allows you to instantiate projects from a template like Cookiecutter Data Science, ensuring every experiment begins with the same directory structure and configuration.
Pipeline Power and Library Leverage
Writing custom code for data cleaning often introduces unnecessary bugs. Mature libraries like pandas and scikit-learn offer tested, optimized patterns that actually teach you domain standards. For complex workflows, tools like Taipy provide a backend-to-frontend pipeline that manages scenarios and versioning.
# Installing Taipy for pipeline management
pip install taipy
Decoupling Data and Configuration
Hard-coding constants is the fastest way to break a deployment. Keep your configuration in a single, separate location. Using environment variables via a .env file is the gold standard, as it integrates seamlessly with cloud environments and prevents sensitive database paths from leaking into version control.
The Notebook Exit Strategy
Jupyter notebooks are excellent for exploration but terrible for maintenance. Once a piece of logic is stable, move it into a shared Python package. This transition enables professional tooling—auto-formatters, linters, and most importantly, unit tests.
Robustness Beyond the Chart
Visualizing data isn't a substitute for testing. Subtle bugs might not skew a scatter plot but can lead to catastrophic decision-making errors. Writing unit tests ensures that when you swap a dataset or hand code to a colleague, the underlying logic remains sound. It’s about building a project that functions autonomously, rather than one that requires your constant intervention to survive a deadline.
- Arjan Egges
- 10%· people
- Cookiecutter
- 10%· products
- Cookiecutter Data Science
- 10%· products
- Jupyter notebooks
- 10%· products
- pandas
- 10%· products
- Other topics
- 50%

Most Python Projects Fail Because of This Structure
WatchArjanCodes // 14:49
On this channel, I post videos about programming and software design to help you take your coding skills to the next level. I'm an entrepreneur and a university lecturer in computer science, with more than 20 years of experience in software development and design. If you're a software developer and you want to improve your development skills, and learn more about programming in general, make sure to subscribe for helpful videos. I post a video here every Friday. If you have any suggestion for a topic you'd like me to cover, just leave a comment on any of my videos and I'll take it under consideration. Thanks for watching!