Curl – Research, Videos, Insights & Reviews

// AI Engineer

The shift from syntax to situation

Software engineering is hitting a turning point where the actual lines of code matter less than the context provided to the machines writing them. Patrick Debois, a pioneer in the DevOps movement, suggests that we are entering an era of "vibe coding." In this new world, developers spend more time prompting and steering AI agents than manually typing syntax. If context drives the output, then context essentially becomes the new source code.

This transition requires a fundamental change in how we view our instructions. We are no longer just writing one-off prompts; we are building "skills": reusable workflows that allow an agent to understand a project's ecosystem, package managers, and specific architectural needs. When we treat context as a first-class citizen, we need a disciplined way to manage it, moving away from ad hoc hacks toward a formal Context Development Lifecycle (CDLC).

Architecting the Context Development Lifecycle

To manage this new asset, Debois introduces a four-phase infinity loop: Generate, Evaluate, Distribute, and Observe. The lifecycle begins with **Generation**, which goes far beyond simple prompting. It includes managing reusable instructions like `agent.md` files and pulling in live documentation from libraries to prevent model hallucination. By syncing documentation and repository data, developers ensure the agent has the exact specifications needed for the current version of a framework.

Following generation is the **Evaluation** phase. Changing a few lines in a prompt can have massive, unpredictable downstream effects. Debois argues that we must apply traditional engineering rigor to these instructions. This involves linting for syntax, using "Grammarly-like" tools to check if the context is verbose enough for an LLM to understand, and running automated Evals to judge if the generated code meets specific company criteria, such as mandatory API prefixes or security standards.
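As a rough illustration of the kind of automated eval described above, the sketch below checks whether generated code keeps every route under a mandatory API prefix. The `/api/v1/` prefix, the `@app.get(...)`-style route decorators, and the function name `check_api_prefix` are assumptions made for this example, not details from Debois's talk.

```python
import re

# Hypothetical company rule: every route must start with this prefix.
REQUIRED_PREFIX = "/api/v1/"

# Matches the path argument in decorators such as @app.get("/users").
ROUTE_PATTERN = re.compile(
    r'@\w+\.(?:get|post|put|patch|delete)\(\s*["\']([^"\']+)["\']'
)


def check_api_prefix(generated_code: str, prefix: str = REQUIRED_PREFIX) -> list[str]:
    """Return the route paths in the generated code that lack the required prefix."""
    routes = ROUTE_PATTERN.findall(generated_code)
    return [route for route in routes if not route.startswith(prefix)]


# Stand-in for code produced by an agent (illustrative only).
sample_output = '''
@app.get("/api/v1/users")
def list_users(): ...

@app.post("/orders")
def create_order(): ...
'''

violations = check_api_prefix(sample_output)
print(violations)  # ['/orders']
```

A deterministic check like this could sit alongside an LLM-based judge: cheap rules catch the clear violations, leaving the judge to assess criteria that are harder to express as a pattern.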
Testing in a nondeterministic world

Testing context is inherently different from testing traditional code because LLMs are nondeterministic. A test that passes once might fail a minute later. To handle this, Debois suggests moving away from binary pass/fail results toward "error budgets." You might run a test five times and require a 100% success rate for critical security rules, while allowing more flexibility for stylistic preferences.

Advanced evaluation involves using an LLM-as-a-judge paired with sandboxed execution. Instead of just looking at the code the agent produced, you actually run it. By binding a judge to a tool like Curl in a secure sandbox, you can verify that an agent's output actually functions as intended in a real-world environment. This creates a feedback loop where the context is optimized based on concrete execution data.

Distribution and the dependency hell of prompts

Once context is generated and tested, it must be shared. While checking markdown files into a repo is a start, enterprise-scale development requires more. We are seeing the rise of context registries and marketplaces where "skills" can be packaged and versioned like libraries. However, Debois warns that 99.9% of publicly available skills are currently "crap," lacking the quality standards necessary for production.

As teams begin to install these context packages, a new form of "dependency hell" emerges. A frontend context package might conflict with a global architectural package. Managing these versions, and ensuring they are scanned for security threats like prompt injections, is becoming a critical task. Tools like Snyk are already beginning to scan context for credential leaks or third-party vulnerabilities, mirroring the security practices built around the software bill of materials (SBOM).

Closing the loop with production observability

The final stage, **Observation**, turns the entire process into a flywheel.
By analyzing agent logs, teams can identify recurring failures where the agent says, "I'm missing this piece of information." These gaps signal exactly where new context needs to be created. This feedback isn't limited to the development phase; it extends into production. If code generated by an agent fails in the wild, that failure should automatically trigger the creation of a new test case and a refinement of the original context. This organizational loop ensures that a fix made by one developer improves the agent's performance for the entire company.
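One way to picture the observation step is a small script that tallies what agents report as missing. This is only a sketch: the log format (one plain-text line per agent message), the `MISSING_CONTEXT:` marker, and the function name `find_context_gaps` are all hypothetical, and real agent logs will look different.

```python
from collections import Counter

# Assumed marker that an agent writes when it lacks information.
GAP_MARKER = "MISSING_CONTEXT:"


def find_context_gaps(log_lines: list[str], min_count: int = 2) -> list[tuple[str, int]]:
    """Count reported gaps and return those seen at least min_count times,
    most frequent first."""
    counts = Counter(
        line.split(GAP_MARKER, 1)[1].strip().lower()
        for line in log_lines
        if GAP_MARKER in line
    )
    return [(topic, n) for topic, n in counts.most_common() if n >= min_count]


# Made-up log lines for illustration.
agent_logs = [
    "2026-05-01 agent=a1 MISSING_CONTEXT: database migration tool",
    "2026-05-01 agent=a2 generated 3 files",
    "2026-05-02 agent=a3 MISSING_CONTEXT: Database migration tool",
    "2026-05-02 agent=a1 MISSING_CONTEXT: logging conventions",
]

print(find_context_gaps(agent_logs))  # [('database migration tool', 2)]
```

Gaps that recur across agents and days are the strongest candidates for new shared context, since one addition there benefits every developer using the agent.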

May 3, 2026

Debois says context is the new code for AI agents