Cucumber – Research, Videos, Insights & Reviews

// AI Engineer

Overview

Software development with AI agents often feels like the Five Monkeys experiment: agents repeat patterns without understanding the purpose behind them. When human context disappears and LLM context windows are compacted, architectural integrity collapses. Michal Cichra of Safe Intelligence argues that we must replace ephemeral prompting with durable, executable guardrails. By combining Behavior-Driven Development (BDD) with Architecture Decision Records (ADRs), teams can force AI agents to respect the "why" behind a codebase, not just the "what."

Prerequisites

To implement these patterns, you should understand Git workflows (hooks and branches), basic CI/CD concepts, and the fundamentals of Behavior-Driven Development. Familiarity with linting and Object-Relational Mapping (ORM) will help you follow the architectural enforcement examples.

Key Libraries & Tools

* Cucumber: An automation tool that supports BDD by executing human-readable specifications.
* Git Hooks: Scripts that run automatically before commits or pushes to enforce quality standards.
* Spec 27: A specialized tool designed for testing AI agents and ensuring consistent behavior.
* Linters: Static analysis tools, used here to restrict module imports and prevent structural violations.

Code Walkthrough

Enforcing architecture requires moving rules out of documentation and into the development loop. Instead of asking an agent not to create N+1 queries, you structurally forbid the behavior with import linting. The example below uses import-linter's `forbidden` contract type, configured in `pyproject.toml`:

```toml
# pyproject.toml
[tool.importlinter]
root_package = "my_app"

[[tool.importlinter.contracts]]
name = "Templates cannot access the database"
type = "forbidden"
source_modules = ["my_app.templates"]
forbidden_modules = ["my_app.db", "my_app.models"]
```

With this contract in place, `lint-imports` fails whenever a module under `my_app.templates` imports `my_app.db` or `my_app.models`. Run in CI, that failure breaks the build, so an AI agent that tries to query an ORM object directly from rendering code is stopped before the change lands.
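To make the mechanism concrete, here is a minimal sketch of the kind of check a forbidden contract performs, written with Python's standard `ast` module. It is not import-linter's implementation: the function names `imported_names` and `forbidden_imports` are hypothetical, and the sketch only inspects direct, absolute imports in a single source string, whereas import-linter builds an import graph for the whole package.

```python
from __future__ import annotations

import ast


def _is_within(name: str, package: str) -> bool:
    """True if `name` is `package` itself or lives underneath it."""
    return name == package or name.startswith(package + ".")


def imported_names(source: str) -> list[str]:
    """Collect the dotted names a piece of Python source imports.

    Hypothetical helper: relative imports (``from . import x``) are skipped.
    """
    names: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import my_app.db.session as s` -> "my_app.db.session"
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # `from my_app import models` -> "my_app.models"
            names.extend(f"{node.module}.{alias.name}" for alias in node.names)
    return names


def forbidden_imports(source: str, forbidden: list[str]) -> list[str]:
    """Return every imported name that falls inside a forbidden package."""
    return [
        name
        for name in imported_names(source)
        if any(_is_within(name, package) for package in forbidden)
    ]


# A template module that reaches straight into the ORM layer.
TEMPLATE_SOURCE = """
import json
from my_app.models import User

def render(user_id):
    return json.dumps({"name": User.objects.get(id=user_id).name})
"""

violations = forbidden_imports(TEMPLATE_SOURCE, ["my_app.db", "my_app.models"])
print(violations)  # ['my_app.models.User']
```

The sketch flags `from my_app.models import User` but ignores `import json`, which is exactly the distinction the contract draws. In a real project you would let `lint-imports` do this work.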
The agent receives the rejection, looks up the relevant ADR, and learns that database access must happen in the service layer to prevent performance bottlenecks.

To bridge the gap between product specs and code, Cucumber scenarios serve as the source of truth:

```gherkin
Feature: User Login

  Scenario: Successful login with valid credentials
    Given the user is on the login page
    When they enter "dev_harper" and "secure_password"
    And click the login button
    Then they should see the dashboard
```

These scenarios are executable: each step is bound to step-definition code that drives the application, so the AI cannot hallucinate a pass. The scenario either runs green against the real code or fails, and the agent has to satisfy the human-readable requirements to make it pass.

Syntax Notes

* **Gherkin Syntax**: Uses `Given/When/Then` patterns to create a shared language between product owners, engineers, and AI agents.
* **Forbidden Contracts**: A linting rule that forbids a set of source modules from importing a set of other modules, preventing architectural leakage.

Practical Examples

Real-world applications include preventing N+1 queries by blocking database imports in templates, or ensuring UI consistency by requiring agents to use only components from the Design System. For example, a linter can enforce that only one `PrimaryButton` component exists per view, preventing an agent from cluttering a page with competing calls to action.

Tips & Gotchas

* **Avoid Discussion**: Use automated tools for formatting and style questions such as tabs versus spaces. If it can be linted, it shouldn't be a conversation.
* **The Feedback Loop**: Agents are most effective when they receive immediate feedback through Git hooks. If an agent's commit is rejected, it must fix the problem before the code ever reaches a human reviewer.
* **Context Management**: Long sessions may compact the context window, so keep your core ADRs and BDD specs small and referenceable, allowing the agent to reload them when needed.
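The single-`PrimaryButton` rule from Practical Examples is the kind of check that belongs in the Git-hook feedback loop. Below is a minimal sketch of such a rule in Python. It is hypothetical: it assumes views are JSX-style markup files in which the component appears as a `<PrimaryButton` tag. The names `count_primary_buttons` and `check_views` are illustrative and do not come from any real design-system linter. A production rule would parse the markup rather than rely on a regular expression.

```python
from __future__ import annotations

import re

# Assumption: views are JSX-style markup. This pattern matches an opening
# <PrimaryButton tag, but not </PrimaryButton> or a longer name such as
# <PrimaryButtonGroup (the \b requires a non-word character after the name).
PRIMARY_BUTTON_TAG = re.compile(r"<PrimaryButton\b")


def count_primary_buttons(view_source: str) -> int:
    """Count opening <PrimaryButton> tags in one view's markup."""
    return len(PRIMARY_BUTTON_TAG.findall(view_source))


def check_views(views: dict[str, str], limit: int = 1) -> list[str]:
    """Return one error message per view that exceeds `limit` primary buttons."""
    errors: list[str] = []
    for path, source in sorted(views.items()):
        count = count_primary_buttons(source)
        if count > limit:
            errors.append(f"{path}: {count} PrimaryButton components (max {limit})")
    return errors


# Hypothetical views: checkout has two competing calls to action.
checkout = '<Page><PrimaryButton label="Pay" /><PrimaryButton label="Buy" /></Page>'
profile = "<Page><PrimaryButton>Save</PrimaryButton><PrimaryButtonGroup /></Page>"

for error in check_views({"views/checkout.tsx": checkout, "views/profile.tsx": profile}):
    print(error)  # views/checkout.tsx: 2 PrimaryButton components (max 1)
```

A pre-commit hook or CI step could call `check_views` on the changed view files and exit non-zero when the list is non-empty. That rejects the agent's commit before a human reviewer ever sees it.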

Jun 3, 2026

Michal Cichra revives Cucumber to block AI from writing broken code