The performance gap narrows for AI coding assistants

When Cursor released Composer 2, the development community's reaction was largely lukewarm. It felt like an iterative step rather than a breakthrough. The recent launch of Composer 2.5, however, demands a reassessment. Based on head-to-head testing against established heavyweights, this model isn't just a minor patch; it's a high-velocity contender that challenges the dominance of Claude 3.5 Sonnet and GPT-4.

Speed benchmarks leave competitors behind

In a live comparison against Claude Code and Kimi, the most immediate differentiator is raw execution speed. While other models exhibit a noticeable "thinking" lag of several seconds, Composer 2.5 starts reading files and generating code almost instantly. It processes complex directory structures and multi-file edits in seconds, often completing entire tasks before competitors have finished their initial planning phase. For developers working in high-pressure environments, this lower latency translates directly into a maintained flow state.

Solving the N+1 query problem through deep analysis

Quality metrics show a significant leap in reasoning, particularly when documentation is sparse. In a benchmark built around a niche package with poor documentation, Composer 2.5 identified and resolved an N+1 query issue that caused Composer 2 to fail repeatedly. By digging into the vendor source code, the model achieved zero errors across five automated test runs, placing it on par with top-tier models like Claude 3 Opus.

Verdict: A localized powerhouse on steroids

Composer 2.5 is a "steroid-boosted" version of its underlying architecture, likely benefiting from Cursor's recent partnership with xAI for additional compute. While it showed a minor regression in specific frameworks like Filament, its overall utility and aggressive pricing make it the current efficiency king.
For those who found previous versions "average," the 2.5 update is the version that finally earns its place in a professional workflow.
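For readers unfamiliar with the N+1 pattern mentioned in the Composer 2.5 benchmark, here is a minimal, framework-free PHP sketch. It counts queries against a hypothetical in-memory `FakeDb` class (not part of any library): loading each post's comments inside a loop issues one query per post, while fetching all comments once and grouping them in memory issues a single query. In Eloquent, the equivalent fix is eager loading, e.g. `Post::with('comments')->get()`. The query that loads the posts themselves (the "+1") is not simulated here.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory table that counts every query it receives.
final class FakeDb
{
    public int $queries = 0;

    /** @param list<array{post_id: int, body: string}> $rows */
    public function __construct(private array $rows) {}

    /** @return list<array{post_id: int, body: string}> */
    public function select(callable $where): array
    {
        $this->queries++;
        return array_values(array_filter($this->rows, $where));
    }
}

// The posts are assumed to be loaded already (the "+1" query, not simulated).
$posts = [['id' => 1], ['id' => 2], ['id' => 3]];
$comments = new FakeDb([
    ['post_id' => 1, 'body' => 'a'],
    ['post_id' => 2, 'body' => 'b'],
    ['post_id' => 2, 'body' => 'c'],
]);

// N+1 pattern: one comments query per post inside the loop.
$lazy = [];
foreach ($posts as $post) {
    $lazy[$post['id']] = count($comments->select(fn (array $c): bool => $c['post_id'] === $post['id']));
}
$lazyQueries = $comments->queries;

// Eager pattern: one query for all posts' comments, then group in memory.
$comments->queries = 0;
$ids = array_column($posts, 'id');
$eager = array_fill_keys($ids, 0);
foreach ($comments->select(fn (array $c): bool => in_array($c['post_id'], $ids, true)) as $c) {
    $eager[$c['post_id']]++;
}
$eagerQueries = $comments->queries;

printf("lazy: %d queries, eager: %d query\n", $lazyQueries, $eagerQueries);
// lazy: 3 queries, eager: 1 query
```

Both approaches produce the same per-post counts; only the number of round trips differs, and that gap grows linearly with the number of posts.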
AI Coding Daily (3 mentions) presents mixed feedback, noting that Claude 3.5 Sonnet completes tasks faster but sometimes delivers skeletal results, as seen in "I Tested New GLM-5 vs Opus and Sonnet. Wow."
- May 20, 2026
- May 12, 2026
- Apr 21, 2026
- Mar 24, 2026
- Mar 15, 2026
The Quest for Automatic Refactoring

Maintaining clean code remains one of the most taxing aspects of software development. Anthropic recently introduced a dedicated `simplify` command for Claude Code, aiming to bridge the gap between functional logic and elegant architecture. The feature doesn't just tweak syntax; it evaluates code quality, reuse, and efficiency through a multi-agent workflow. Standard LLM output often prioritizes immediate functionality; this command attempts to mimic the second pass a human developer takes to polish a draft.

Multi-Agent Architecture in Action

Under the hood, `simplify` runs three specialized review agents (Reuse, Quality, and Efficiency) in parallel. These agents use Claude 3.5 Sonnet for the heavy lifting of code analysis, then report back to a main Claude 3 Opus agent for final synthesis. In a Laravel project using Livewire, this produced six specific architectural improvements, ranging from extracting shared form traits to converting repetitive HTML into reusable Blade components.

Performance and Economic Realities

Efficiency comes at a cost, in both time and tokens. Simplifying a relatively small set of files took over eight minutes. More significantly, a single session consumed roughly 5% of the total token limit on a high-tier $100 monthly plan, which raises questions about running such deep-thinking agents frequently. The suggestions, like replacing raw strings with model constants, are objectively better for maintainability, but the overhead makes this a tool for final polish rather than continuous development.

Strategic Refactoring vs. Procedural Hack

A common critique, shared by developers like Corey, is that if the model can write better code, it should do so on the first attempt. However, the iterative nature of this tool mirrors the human development cycle.
We rarely write the most optimized version of a feature while simultaneously solving the core business logic. By separating the "build" phase from the "simplify" phase, Claude Code ensures that the refactoring logic doesn't interfere with the initial generation of working code.
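The "replace raw strings with model constants" suggestion is easy to picture in plain PHP. The sketch below uses a hypothetical `Order` class with status constants (not code from the tested project). The maintainability gain is that a typo in a raw string such as `'piad'` silently compares unequal, whereas a typo in a constant name such as `Order::STATUS_PIAD` throws an `Error` at runtime and is flagged earlier by IDEs and static analysers.

```php
<?php
declare(strict_types=1);

// Before: the status is a raw string repeated at every call site.
function isShippableRaw(array $order): bool
{
    return $order['status'] === 'paid';
}

// After: a hypothetical model exposes its allowed statuses as constants.
final class Order
{
    public const STATUS_PENDING = 'pending';
    public const STATUS_PAID = 'paid';
    public const STATUS_SHIPPED = 'shipped';

    public function __construct(public string $status = self::STATUS_PENDING) {}

    public function isShippable(): bool
    {
        return $this->status === self::STATUS_PAID;
    }

    public function markShipped(): void
    {
        $this->status = self::STATUS_SHIPPED;
    }
}

$order = new Order(Order::STATUS_PAID);
var_dump($order->isShippable()); // bool(true)
$order->markShipped();
var_dump($order->isShippable()); // bool(false)
```

Behavior is identical before and after; the refactor only centralizes the allowed values, which is exactly the kind of change a "simplify" pass makes once the feature already works.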
Mar 1, 2026

The Autonomous Agent Tsunami Hits the Beach

Jerry Murdock, the visionary co-founder of Insight Partners, views the current artificial intelligence wave not as a steady rising tide, but as a massive tsunami. For years, the water has been receding, pulling back to sea while the industry watched from the shore with a mix of curiosity and complacency. That period of observation is over. Murdock argues that the real danger of a tsunami isn't when it's out at sea; it's when it hits the beach. We are currently in the messy, violent transition where the "pre-peak" waves are beginning to dismantle established software structures.

While the general public focuses on chatbots, Murdock identifies autonomous agents as the specific force that will redefine the next decade of enterprise value. These are not merely digital assistants; they are probabilistic entities capable of writing code, making purchasing decisions, and executing complex workflows without human intervention. This shift represents a transition from software as a tool used by humans to software as an employee that operates on behalf of the organization. Companies that fail to move to higher ground by becoming AI-native risk being swept away by a "Sassacre": a systematic devaluation of traditional Software-as-a-Service (SaaS) models that rely on seat-based pricing and human-centric interfaces.

Why Cursor and Legacy SaaS Face Instant Obsolescence

The velocity of this disruption is perhaps best illustrated by the sudden vulnerability of yesterday's darlings. Murdock points to Cursor, a company currently valued in the tens of billions, as an example of a product that many AI-native founders already consider obsolete. While Cursor is a sophisticated tool for developers, the next generation of startups, such as E2B and Lotus AI, are utilizing autonomous agents to write the code itself, effectively bypassing the need for human-augmented coding environments.
This isn't just about coding; it's a fundamental challenge to the "System of Record." Historically, companies like Salesforce derived their value from being the immutable source of truth for customer data. However, if autonomous agents begin to bypass these platforms, or if new agents create their own decentralized systems of record, the massive market caps of legacy players could evaporate. Murdock compares Salesforce to Mount Everest: it won't melt overnight, but its value is directly tied to the health of the ecosystem built on top of it. As those smaller, integrated companies are disrupted by agents, the mountain itself begins to lose its stature. The bolt-on AI strategy, where legacy firms simply add a chatbot layer to their existing stack, is a defensive maneuver that Murdock suggests will rarely result in "gold medal" performance.

The Migration from Nvidia to Custom Silicon

One of the most provocative claims Murdock makes involves the eventual decline of Nvidia's dominance in the compute market. While Jensen Huang currently sits atop the world's most valuable hardware empire, the rise of open-source models like Llama 3 and DeepSeek is paving the way for ASIC chips (Application-Specific Integrated Circuits). As autonomous agents become more specialized, they will require chips tuned for specific workloads rather than general-purpose GPUs. Murdock suggests that the orchestration layer of the future will triage workflows: expensive, high-reasoning tasks might go to Claude 3.5 Sonnet, while routine operations will run on cheap, local ASICs. This shift is already visible in the strategies of major tech players; Meta has notably pushed back against complete reliance on Nvidia, betting instead on custom silicon to gain an edge in efficiency. Even Nvidia's acquisition of Groq (not to be confused with Elon Musk's Grok) signals its awareness that memory-on-chip capabilities and ASIC support are the next battlegrounds for CUDA viability.
Parallels to the Dot-Com Bust of 2000

To understand the current market volatility, Murdock looks back to March 2000. He recalls the era when tech stocks dropped 40% in a single quarter, followed by a multi-year "malaise" that was eventually finalized by the tragic events of 9/11. The core issue in 2000 was a lack of infrastructure; the world wasn't ready for commerce on dial-up. Today, the infrastructure is here, but the speed of change is creating a similar environment of "cautious sidelines" investing.

Public markets are reacting with extreme sensitivity to AI updates. When Anthropic releases a security feature, established players like CrowdStrike see their stock prices swing wildly. Murdock doesn't see this as simple panic; he sees it as a rational pause by investors who realize they don't have enough information to pick winners in a world where the application stack is being eaten by the model layer. The "Sassacre" isn't just a catchy term; it's a recognition that the metrics we used to value companies (revenue growth and margins) have become transient in the face of agent-driven automation.

The Labor Market and the Rise of UBI

The most significant implication of autonomous agents is their impact on the white-collar labor force. Murdock predicts that the first jobs to disappear won't be the ones currently held by senior staff, but the "next in line" roles: junior developers, executive assistants, and marketing coordinators. Because agents don't require sick leave, don't feel entitled, and can work 24/7 at the speed of compute, the incentive for small and medium businesses to replace human input with agent orchestration is overwhelming. This shift will move beyond the boardroom and into the halls of government. Murdock boldly predicts that Universal Basic Income (UBI) or a "minimum viable income" will become a central ballot question in the next two and a half years.
No political administration can preside over a 15% unemployment rate caused by technological displacement without offering a radical policy response. The transition will be painful, potentially leading to a migration of workers out of expensive urban hubs back to rural areas where they can utilize technology to manage land or pursue a higher quality of life supported by government grants.

Surviving the Edge

Reflecting on thirty years of venture capital, Murdock emphasizes that the best investors are not those who avoid failure, but those who learn from it. He recounts the early days of Insight Partners, where he and co-founder Jeff Horing were frequently rejected by LPs. Their survival through the 2000 crash and the subsequent building of a $90 billion platform was a product of persistence and intuition.

For the next generation of founders and VCs, Murdock's advice is clear: embrace the agent. The era of the billion-dollar single-person company is no longer a fantasy; it is a mathematical probability in an environment where one human can orchestrate a fleet of autonomous employees. The goal isn't just to build a product; it's to find a problem so significant that only an agent-native solution can solve it. The tsunami is here. You can either learn to surf it or be buried by it.
Feb 28, 2026

The New Model on the Block

Google recently launched Gemini 3.1 Pro within its Antigravity IDE, promising a significant leap in developer productivity. To see if the hype holds water, I put the model through a rigorous gauntlet: seven Laravel projects requiring complex API CRUD generation. While the integration feels seamless on the surface, the actual developer experience reveals a model still finding its footing in a competitive market.

Performance and Latency Issues

Speed defines the modern coding workflow. Unfortunately, Gemini 3.1 Pro lags behind. In side-by-side testing against Claude 3.5 Sonnet, Google's offering took six minutes to complete a task that Anthropic models finished in three. The model frequently pauses to calculate small details, launching internal help tools like "PHP design help" just to scaffold basic models. This suggests a lack of deep, native training on modern PHP frameworks.

The Testing Gap and Agent Intelligence

One glaring omission in the initial output was the lack of automated tests. While Gemini 3.1 Pro successfully generated models, factories, and controllers, it ignored the crucial step of verification. However, the model showed a flash of brilliance when prompted about this failure. It recognized its own "skills" via Laravel Boost and proactively corrected the mistake, eventually delivering 53 passing tests. This ability to discover and activate tools mid-stream is a clear positive, even if it requires manual intervention.

Reliability and Quota Hurdles

The Antigravity IDE experience remains plagued by stability issues. Random crashes and "terminated due to error" messages interrupted the workflow multiple times. Worse, the free tier quota is incredibly opaque. After only nine minutes of work on a Livewire project, the system cut off access entirely. Unlike the clear usage metrics provided by OpenAI, Google leaves developers guessing about how much "intelligence" they actually have left.
Final Verdict: Catching Up

Gemini 3.1 Pro is currently a secondary choice for heavy-duty Laravel development. It feels like a product in a "catching up" phase rather than a market leader. While the Gemini CLI shows promise for future MCP support, the current speed and reliability gaps make it hard to recommend over the more polished offerings from Anthropic.
Feb 20, 2026

Overview: The Shift to Agentic Development

In the current software development landscape, we are moving beyond simple Large Language Model (LLM) wrappers toward sophisticated, autonomous entities known as AI agents. Unlike traditional chatbots that merely respond to prompts, these agents can use tools, access external data, and make decisions to execute complex business workflows. Redberry, a veteran Laravel partner, has formalized this process through LarAgent, an open-source tool designed to bring agentic capabilities directly into the PHP ecosystem. This approach matters because it lets developers automate non-deterministic tasks (decisions that can't be hard-coded with simple if/else logic) while staying within a framework they already know and trust.

Prerequisites

To effectively build agentic systems with the tools discussed, you should have a solid grasp of the following:

* **Modern PHP & Laravel**: Proficiency in service providers, configuration management, and the Laravel ecosystem.
* **LLM Fundamentals**: Understanding of system prompts, temperature settings, and the difference between deterministic and non-deterministic outputs.
* **API Integration**: Experience connecting with third-party services, as agents rely heavily on tool-calling to interact with the world.
* **Vector Databases & RAG**: A basic understanding of Retrieval Augmented Generation (RAG) for providing agents with custom context.

Key Libraries & Tools

* **LarAgent**: An open-source package that provides the primitives for building agents in Laravel, including instruction management and tool-calling orchestration.
* **Laravel AI SDK**: A first-party toolset from the Laravel team focused on standardizing AI interactions across different providers.
* **MCP Client for Laravel**: A specialized package allowing Laravel applications to connect to Model Context Protocol (MCP) servers, giving agents access to a large catalog of pre-built tools.
* **Model Agnostic Layers**: Architectural patterns that allow switching between providers like OpenAI, Anthropic, or local models via configuration.

The Anatomy of an AI Agent Sprint

Building an agent isn't a linear coding task; it's a process of experimentation. A typical five-week proof of concept (PoC) focuses on time-boxing the non-deterministic nature of the project.

Week 1: Discovery and Mapping

Before writing code, you must map the business process. The goal is to identify which parts are deterministic (best handled by standard code) and which require an agent. If a decision can be expressed as rule-based logic, implement it that way; AI is reserved for the gaps where rules fail.

Weeks 2-3: The First Prototype

Using LarAgent, developers define the agent's instructions and the tools it can access. A "tool" in this context is often a PHP class or a specific API endpoint the agent can trigger.

```php
// Defining a basic agent in LarAgent
$agent = LarAgent::make('SupportBot')
    ->instructions('Assist users with order tracking.')
    ->tools([
        OrderTrackingTool::class,
        InventoryCheckTool::class,
    ]);
```

During this phase, you establish a benchmark data set: a collection of inputs and expected outcomes used to measure the agent's performance.

Weeks 4-5: Iteration and Accuracy

Initial success rates for agents often hover around 60-70%. The final weeks involve refining prompts, adjusting the orchestration of multiple agents, and tweaking tool definitions to push accuracy toward a production-ready 98%. This often involves "human-in-the-loop" design, ensuring a person reviews critical agent decisions.

Syntax Notes & Orchestration Patterns

One notable pattern in agentic development is the move away from a single, massive agent toward **multi-agent orchestration**. Instead of asking one agent to "manage an entire warehouse," you might have a "Receiver Agent," a "Stock Agent," and a "Dispatcher Agent." In LarAgent, this is handled through configuration-level model selection.
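The Receiver/Stock/Dispatcher split can be pictured as a pipeline in which each specialist reads and extends a shared state. The sketch below is plain PHP, not LarAgent's API: each "agent" is a hypothetical closure standing in for an LLM call with its own narrow instructions and tools, and the state keys (`sku`, `qty`, `inventory`) are invented for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical specialists. In a real system each would wrap a model call
// with narrow instructions and a small toolset; closures keep this runnable.
$agents = [
    // Receiver Agent: normalises the incoming request.
    'receiver' => function (array $state): array {
        $state['sku'] = strtoupper(trim($state['sku']));
        return $state;
    },
    // Stock Agent: checks availability against current inventory.
    'stock' => function (array $state): array {
        $state['in_stock'] = ($state['inventory'][$state['sku']] ?? 0) >= $state['qty'];
        return $state;
    },
    // Dispatcher Agent: decides the next action from the previous results.
    'dispatcher' => function (array $state): array {
        $state['action'] = $state['in_stock'] ? 'dispatch' : 'backorder';
        return $state;
    },
];

/** Runs the agents in order, passing each one's output to the next. */
function orchestrate(array $agents, array $state): array
{
    foreach ($agents as $name => $agent) {
        $state = $agent($state);
        $state['trace'][] = $name; // audit trail for human review
    }
    return $state;
}

$result = orchestrate($agents, [
    'sku' => ' ab-1 ',
    'qty' => 2,
    'inventory' => ['AB-1' => 5],
]);
echo $result['action'], PHP_EOL; // dispatch
```

Because each step has a small, well-defined job, each one can be assigned its own model and temperature, which is what configuration-level model selection controls.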
Because different models excel at different tasks, you might use a smaller, faster model for simple categorization and a larger model for complex reasoning.

```php
// Configuration-based model selection
'agents' => [
    'categorizer' => [
        'model' => 'gpt-4o-mini',
        'temperature' => 0,
    ],
    'analyzer' => [
        'model' => 'claude-3-5-sonnet',
        'temperature' => 0.5,
    ],
]
```

Practical Examples

* **Automated Test Case Generation**: Agents can scan project requirements and draft comprehensive test suites, which human developers then verify and approve.
* **Legacy System Interfacing**: Using agents to interpret data from legacy systems that lack modern APIs, acting as a conversational or structured bridge between old and new tech.
* **Regulated Industry Workflows**: In finance or healthcare, agents can pre-process documents and flag anomalies, significantly reducing manual labor while keeping a human as the final authority.

Tips & Gotchas

* **Avoid Tool Overload**: Exposing too many tools (more than 10) can overwhelm the LLM, leading to "hallucinations" or incorrect tool selection. Keep the agent's toolkit focused.
* **Deterministic First**: Never use AI for something that can be solved with a simple database query or a standard function. It is more expensive and less reliable.
* **Benchmark Early**: You cannot improve what you cannot measure. Build your test data set in week one so you have a baseline for every iteration.
* **Legacy Blockers**: When integrating with ancient systems, expect blockers. Discovery should prioritize credential and API access to avoid stalling the sprint.
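The "Deterministic First" and "Benchmark Early" tips combine naturally. The sketch below is a minimal plain-PHP illustration, not LarAgent functionality: rule-based checks run first, and a hypothetical `classifyWithAgent()` stub (standing in for an LLM-backed agent) only handles what the rules cannot. A small benchmark of inputs and expected outcomes then reports accuracy and failures. The categories, regex rule, and test cases are invented for illustration.

```php
<?php
declare(strict_types=1);

// Deterministic first: return a label when a rule applies, null otherwise.
function classifyByRules(string $message): ?string
{
    if (preg_match('/\border\s+#?\d+/i', $message) === 1) {
        return 'order_tracking';
    }
    if (stripos($message, 'refund') !== false) {
        return 'refund';
    }
    return null;
}

// Hypothetical stand-in for an LLM-backed agent; only reached when rules fail.
function classifyWithAgent(string $message): string
{
    return 'general';
}

function classify(string $message): string
{
    return classifyByRules($message) ?? classifyWithAgent($message);
}

/**
 * Scores a classifier against inputs with expected outcomes.
 *
 * @param list<array{input: string, expected: string}> $cases
 * @return array{accuracy: float, failures: list<string>}
 */
function runBenchmark(array $cases, callable $system): array
{
    $passed = 0;
    $failures = [];
    foreach ($cases as $case) {
        $actual = $system($case['input']);
        if ($actual === $case['expected']) {
            $passed++;
        } else {
            $failures[] = "{$case['input']}: expected {$case['expected']}, got {$actual}";
        }
    }
    return ['accuracy' => (float) $passed / max(1, count($cases)), 'failures' => $failures];
}

$cases = [
    ['input' => 'Where is order #1042?', 'expected' => 'order_tracking'],
    ['input' => 'I want a refund', 'expected' => 'refund'],
    ['input' => 'Do you ship to Norway?', 'expected' => 'general'],
    ['input' => 'My parcel arrived damaged', 'expected' => 'complaint'],
];

$report = runBenchmark($cases, 'classify');
printf("accuracy: %.0f%%\n", $report['accuracy'] * 100); // accuracy: 75%
```

Rerunning the same cases after every prompt, tool, or model change gives the baseline the sprint needs, and the failure list shows where the next iteration should focus.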
Feb 6, 2026

Overview: The Context Gap in AI Development

AI agents have changed how we write code, but they often struggle with the nuances of specific frameworks. Standard models like Claude 3.5 Sonnet or GPT-4o possess vast general knowledge but lack the hyper-specific context of your local Laravel project. This leads to hallucinations, outdated syntax, or suggestions that conflict with your application's architecture.

Laravel Boost solves this by acting as a bridge. It injects project-specific metadata, documentation, and "skills" directly into your AI agent's reasoning loop. Instead of manually feeding documentation to a chat window, Boost automates context delivery. Version 2.0 shifts from a monolithic guideline approach to a modular, "skills-first" architecture. This reduces context bloat, saves on token costs, and makes the AI more accurate by providing only the information it needs at that moment.

Prerequisites

To follow this guide and implement Boost 2.0, you should be comfortable with the following:

* **PHP 8.2+:** Boost 2.0 has officially dropped support for PHP 8.1.
* **Laravel 11 or 12:** Older versions like Laravel 10 are supported only by legacy versions of Boost (v1.x).
* **Composer:** Basic knowledge of managing PHP dependencies.
* **AI Coding Agents:** Familiarity with tools like Cursor, Claude Code, GitHub Copilot, or Junie.

Key Libraries & Tools

* **Laravel Boost:** The core CLI tool and package that manages AI context and skills.
* **Laravel MCP:** A package for building Model Context Protocol servers, allowing AI agents to interact with your app's internal state (routes, database schemas, etc.).
* **Remotion:** A React-based framework for programmatic video creation, often used as a demonstration of complex AI skill integration.
* **Prism:** A Laravel package for working with LLMs, used to demonstrate how documentation can be bundled directly into vendor folders for AI consumption.
Code Walkthrough: Installing and Configuring Boost 2.0

Setting up Boost 2.0 is a methodical process. It begins with the Laravel installer and moves into a randomized, aesthetically pleasing configuration CLI.

1. Installation

First, ensure your Laravel installer is up to date to access the built-in Boost prompts during new project creation. If you are adding it to an existing project, use Composer:

```bash
composer require laravel/boost --dev
```

2. Initialization

Run the install command to start the interactive configuration.

```bash
php artisan boost:install
```

This command triggers a CLI interface featuring randomized gradients, a touch of "developer joy" added by Pushpak Chhajed. You will be prompted to select which features to configure: AI Guidelines, Agent Skills, or the MCP server.

3. Selecting Your AI Agent

Boost 2.0 simplifies agent selection. Instead of choosing both an IDE and an agent, you now choose the specific agentic tool you use daily, such as Claude Code or Cursor. Boost will then automatically determine the correct file paths for these tools.

4. Automated Skill Syncing

To ensure your AI context stays updated as your project evolves, add the update command to your `composer.json` file:

```json
"scripts": {
    "post-update-cmd": [
        "@php artisan boost:update"
    ]
}
```

This ensures that every time you update your dependencies, Boost re-scans your `composer.json` and syncs the relevant skills for packages like Inertia, Tailwind CSS, or Livewire.

Deep Dive into Skills vs. Guidelines

Understanding the distinction between these two features is critical for a clean development workflow.

Guidelines: The Global Rules

Guidelines are persistent. They contain high-level rules that the AI should *always* know. For example, if you always use Pest for testing or strictly follow an Action-based architecture, these belong in your guidelines.
However, shoving every package's documentation into a guideline leads to "context fatigue," where the AI becomes overwhelmed and starts to hallucinate.

Skills: The On-Demand Context

Skills are modular Markdown files. They aren't loaded into the AI's context until they are needed. Each skill has a name and a description in its front matter. When you ask the AI to "build a new UI component with Tailwind," the agent sees the keyword "Tailwind," checks its available skills, and activates the Tailwind CSS skill. This keeps the prompt lean and the output precise.

Syntax Notes: Custom Skill Creation

Creating a custom skill lets you automate highly specific tasks, like generating pull request descriptions or adhering to internal API versioning standards. Skills rely on a specific Markdown front matter format.

```markdown
---
name: my-custom-skill
description: Use this skill when generating API endpoints or PR descriptions.
---

# My Custom Skill Rules

- Always use the `App\Actions` namespace for business logic.
- Ensure all API responses are wrapped in a standard `JsonResource`.
- Pull Request descriptions must include a 'Breaking Changes' section.
```

When you save this in a local `.boost/skills` directory and run `php artisan boost:update`, Boost replicates the file into the hidden configuration folders of your chosen AI agents (e.g., `.cursor/rules` or `.claude/skills`).

Practical Examples

Automating Pull Requests

You can create a skill that teaches an agent how to use the GitHub CLI. By invoking the skill with a slash command (e.g., `/create-pr`), the AI can analyze your staged changes, write a formatted description, and run the CLI command to open the PR.

Package-Specific Intelligence

If you build a project using Filament, you don't want the AI thinking about Filament when you are just debugging a console command. With a Filament skill, the AI only accesses those specific layout and component rules when you are actively working on the admin panel.
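To make the on-demand idea concrete, the sketch below parses the `name` and `description` front matter of skill files and activates the skills whose description shares a keyword with the prompt. This is only an approximation under stated assumptions: real agents let the model judge relevance from the description, the keyword-overlap rule and stopword list are invented here, and the three skills are hypothetical examples rather than Boost's bundled ones. The front-matter parser handles only simple `key: value` lines, not full YAML.

```php
<?php
declare(strict_types=1);

/** Extracts simple `key: value` pairs from a Markdown file's front matter. */
function frontMatter(string $markdown): array
{
    if (preg_match('/\A---\R(.*?)\R---/s', $markdown, $m) !== 1) {
        return [];
    }
    $meta = [];
    foreach (preg_split('/\R/', $m[1]) as $line) {
        if (preg_match('/^([\w-]+):\s*(.+)$/', $line, $kv) === 1) {
            $meta[$kv[1]] = trim($kv[2]);
        }
    }
    return $meta;
}

/** Lower-cased words of 3+ letters, minus common filler words. */
function keywords(string $text): array
{
    $stop = ['the', 'and', 'for', 'new', 'use', 'when', 'with', 'this', 'build', 'skill'];
    preg_match_all('/[a-z]{3,}/', strtolower($text), $m);
    return array_values(array_diff(array_unique($m[0]), $stop));
}

/** Names of skills whose description shares at least one keyword with the prompt. */
function activeSkills(string $prompt, array $skillFiles): array
{
    $wanted = keywords($prompt);
    $active = [];
    foreach ($skillFiles as $file) {
        $meta = frontMatter($file);
        if (isset($meta['name'], $meta['description'])
            && array_intersect($wanted, keywords($meta['description'])) !== []) {
            $active[] = $meta['name'];
        }
    }
    return $active;
}

$skills = [
    "---\nname: tailwind-css\ndescription: Use when styling UI with Tailwind CSS utility classes.\n---\n# Tailwind rules",
    "---\nname: pest-testing\ndescription: Use when writing or running Pest tests.\n---\n# Pest rules",
    "---\nname: filament\ndescription: Use when building Filament admin panels and resources.\n---\n# Filament rules",
];

echo implode(', ', activeSkills('Build a new UI component with Tailwind', $skills)), PHP_EOL; // tailwind-css
```

Only the matched skill's body would then be loaded into the prompt; the Pest and Filament rules stay out of the context until a prompt mentions them.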
Tips & Gotchas

* **Git Management:** Never commit the auto-generated agent folders (like `.cursor/rules`) to your repository. These are local mirrors. Only commit the `.boost` folder and your `boost.json` file. This allows your teammates to run `boost:install` and get the exact same AI behavior on their machines.
* **Hallucination Prevention:** If your AI starts ignoring your project structure, check your guideline length. If it exceeds 500 lines, move package-specific rules into individual skills.
* **Legacy Projects:** Do not attempt to use Boost 2.0 on Laravel 10 projects. The dependency tree for the new MCP features and skills requires the modern internals found in Laravel 11 and up.
* **Manual Invocation:** If an agent fails to auto-detect a skill, you can usually force it by using a slash command in the chat interface. Most modern agents support `/` to list and select active skills.
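The 500-line threshold from the "Hallucination Prevention" tip is easy to check automatically. The sketch below is a hypothetical helper, not a Boost command; which files count as guidelines in your project is an assumption you would adjust, so the usage example simply writes two temporary files.

```php
<?php
declare(strict_types=1);

/**
 * Returns [path => line count] for guideline files longer than $maxLines,
 * i.e. candidates for moving package-specific rules into skills.
 *
 * @param list<string> $paths
 * @return array<string, int>
 */
function oversizedGuidelines(array $paths, int $maxLines = 500): array
{
    $flagged = [];
    foreach ($paths as $path) {
        $lines = file($path);
        if ($lines === false) {
            continue; // unreadable file: skip rather than fail the whole check
        }
        if (count($lines) > $maxLines) {
            $flagged[$path] = count($lines);
        }
    }
    return $flagged;
}

// Usage with two throwaway files: 40 lines (fine) and 620 lines (too long).
$short = tempnam(sys_get_temp_dir(), 'guide');
$long = tempnam(sys_get_temp_dir(), 'guide');
file_put_contents($short, str_repeat("- Use Pest for tests.\n", 40));
file_put_contents($long, str_repeat("- Use Pest for tests.\n", 620));

$report = oversizedGuidelines([$short, $long]);
foreach ($report as $path => $count) {
    echo "{$path}: {$count} lines, consider moving rules into skills", PHP_EOL;
}

unlink($short);
unlink($long);
```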
Jan 30, 2026

The Digital Renaissance of Open Source

For years, a silent frustration plagued the technological world: the recurring disappointment of Chinese open-source models that shimmered on benchmarks but crumbled under the weight of real-world complexity. We call this phenomenon **benchmaxing**. It involves optimizing models specifically for testing datasets while ignoring the messy, organic logic required for human interaction. Kimi K2.5, the latest release from Moonshot AI, suggests we have reached a turning point where the artifact finally matches the promise.

The Agent Swarm Architecture

One cannot discuss Kimi K2.5 without examining its most provocative feature: the **Agent Swarm**. While traditional Large Language Models (LLMs) operate as a single, linear intelligence, this model can deploy up to 100 sub-agents in parallel. This decentralized approach mimics a workshop of specialized artisans rather than a lone scholar. The parallelization yields a 4.5x speed increase for complex tool calls, allowing the system to verify its own logic across multiple threads simultaneously. It is a structural evolution that reflects the complex, multi-layered societies of our own history.

Synthesis of Vision and Code

The most grueling trial for any modern model remains its ability to translate visual stimuli into functional logic. In tests involving a high-fidelity website recording, Kimi K2.5 attempted to recreate a complex front-end experience from video alone. While it missed the subtle 'smoke' cursor effects, it successfully replicated the core layout, interactive 'eye' elements, and brand essence. This capability extends beyond mere imitation; it suggests an internal understanding of how visual components map to underlying structural code. In single-shot coding tests, the model even built a functional Melvor Idle-style game (complete with inventory systems and experience tracking) from a single prompt.
Analysis of the Global Hierarchy

When we look at the market share by token usage, Google and Anthropic still hold the high ground. However, the emotional intelligence scores tell a different story. Kimi K2.5 recently seized the number one spot on the EQ Bench, surpassing GPT-4o and Gemini 1.5 Pro. It indicates that the model excels at creative writing and abstract nuances, areas where open-source models historically struggled. While it remains a newcomer in token market share, its performance suggests a looming disruption to the established Western dominance.

Final Verdict

Kimi K2.5 is a rare specimen that justifies the surrounding fervor. Its combination of swarm agentics and vision-to-code synthesis makes it a formidable tool for developers and creative thinkers alike. While the gap between high-res reality and model output still exists, the distance has closed significantly. It is no longer a matter of if open-source will catch up, but rather when the established giants will have to defend their territory.
Jan 29, 2026

Overview of Structural Code Review

Software development often suffers from a gap between "working code" and "complete features." Claude Code allows you to bridge this gap by implementing custom slash commands and specialized agents. Instead of generic chat interactions, you can create a dedicated **Structural Completeness Reviewer**. This setup acts as a final guardian against technical debt by auditing dead code, change completeness, and cross-layer integration. It ensures that when you add a field to a model, you haven't forgotten the database index, the UI filter, or the data seeder.

Prerequisites and Tools

To follow this guide, you should have Claude Code installed and a basic understanding of repository structures. Key tools include:

* **Claude Code CLI**: The primary environment for executing commands.
* **Claude Models**: Specifically Claude 3.5 Sonnet or Claude 3 Opus.
* **Markdown**: Used for defining agent instructions and command logic.

Creating Your Slash Command

You can bootstrap a command by simply asking the AI. For example, prompt: "Create a slash command called `/are-we-done` that calls the agent `structural_completeness_reviewer`." You have two choices for scope: **Global** (available across all projects) or **Local** (contained within the current project's `.claude/commands` directory).

Once created, open the generated `.md` file in your IDE. You can manually refine the logic by copying raw configurations from community repositories. A standard command structure typically includes the trigger name and the specific agent it should invoke.

Building the Specialist Agent

An agent is defined by its system prompt. Create a new folder named `agents` and a markdown file for your reviewer. The magic lies in the instructions. Rather than focusing on "code style," instruct the agent to act as a **Technical Lead**.
```markdown
Role: Structural Completeness Reviewer

Focus on:
- Dead code detection
- Dependency audit
- Feature parity across layers (e.g., Model vs. UI)
```

Practical Application and Token Usage

When you run `/are-we-done`, the agent analyzes uncommitted changes. In a real-world test on a quiz project, the agent correctly identified that while tags were added to questions, the corresponding database indexes and admin filters were missing. While these deep reviews consume more tokens (sometimes increasing session usage by several percentage points), the cost is negligible compared to the long-term price of accumulated technical debt.
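The "feature parity across layers" check that the reviewer performs can also be approximated deterministically. In this sketch the per-layer field inventory is hard-coded and hypothetical (a real reviewer would derive it from the diff, migrations, admin resources, and seeders); the function reports, for each changed field, which layers never mention it, mirroring the quiz example where tags lacked an index and an admin filter.

```php
<?php
declare(strict_types=1);

/**
 * For each changed field, lists the layers that do not handle it yet.
 *
 * @param list<string> $changedFields
 * @param array<string, list<string>> $layers layer name => fields it handles
 * @return array<string, list<string>> field => layers missing it
 */
function auditChangedFields(array $changedFields, array $layers): array
{
    $report = [];
    foreach ($changedFields as $field) {
        $missing = array_keys(array_filter(
            $layers,
            fn (array $fields): bool => !in_array($field, $fields, true)
        ));
        if ($missing !== []) {
            $report[$field] = $missing;
        }
    }
    return $report;
}

// Hypothetical inventory after adding `tags` to a Question model.
$layers = [
    'model'         => ['title', 'difficulty', 'tags'],
    'migration'     => ['title', 'difficulty', 'tags'],
    'indexes'       => ['difficulty'],
    'admin_filters' => ['difficulty'],
    'seeder'        => ['title', 'difficulty', 'tags'],
];

foreach (auditChangedFields(['tags'], $layers) as $field => $missing) {
    echo "{$field} is missing from: " . implode(', ', $missing), PHP_EOL;
}
// tags is missing from: indexes, admin_filters
```

A script like this can only check what it is told about; the value of the agent is in building that inventory from the code itself, but the underlying comparison is the same.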
Jan 22, 2026

Overview

AI coding agents are shifting from simple autocomplete helpers to sophisticated architectural assistants. This transition demands a new set of workflows that prioritize context over raw syntax. For Laravel developers, this means moving beyond basic copilot functionality and embracing tools that understand the framework's specific conventions. By utilizing Laravel Boost and high-level agents like Cursor, Claude Code, and Codex CLI, developers can automate the repetitive scaffolding of CRUD operations, validation logic, and API resources while maintaining strict control over the code quality.

Prerequisites

To follow this guide effectively, you should possess a baseline understanding of the following:

* **PHP & Laravel**: Familiarity with Eloquent models, migrations, and API resource structures.
* **Terminal Proficiency**: Ability to run composer commands and navigate CLI interfaces.
* **Git Basics**: Understanding of branching and commits, as AI-generated code should always be tracked for easy rollback.
* **Node/NPM**: Required for installing various CLI-based agents.

Key Libraries & Tools

* **Laravel Boost**: A specialized package that generates `.mdc` and `.md` guideline files to ensure AI models follow modern Laravel conventions.
* **Cursor**: A fork of VS Code that integrates AI deep into the editor's UI for "tab-tab-tab" workflows.
* **Claude Code**: An agent from Anthropic that operates entirely within the terminal, focusing on agentic task completion.
* **Codex CLI**: OpenAI's command-line interface powered by GPT-4o (and later versions) for high-accuracy code generation.
* **Laravel Idea**: A powerful plugin for PhpStorm that provides deep framework integration.

Solving the Context Problem with Laravel Boost

The primary failure point for AI is "stale knowledge." Models trained on Laravel 11 might hallucinate syntax when working in a Laravel 12 environment.
Laravel Boost solves this by injecting your project's specific context into the AI's prompts. When you run the installation command, the package scans your `composer.json` to detect exactly which versions of Livewire, Tailwind, or Pest you are using. It then generates guideline files for your IDE or agent of choice. This keeps the AI from suggesting patterns your team has moved away from, such as raw `DB::table()` queries when you prefer Eloquent models.

```bash
composer require laravel/boost --dev
php artisan boost:install
```

Code Walkthrough: Generating a CRUD API

When using an agent like Cursor, the most efficient path combines manual scaffolding with AI refinement. Instead of asking the AI to build everything from scratch, start with the core model and migration.

1. Scaffolding the Core

Run the standard Artisan command so the foundation is deterministic.

```bash
php artisan make:model Post -m
```

2. Defining the Migration with AI Autocomplete

Open the migration file and let the AI suggest fields. As you press `Tab` to accept suggestions, the AI picks up common Laravel patterns based on the model name, such as a `user_id` foreign key and a `string` title field.

3. Agentic Resource Generation

Open the Agent window (`Cmd+I`) and provide a high-context prompt. Specifying Form Requests is critical to avoid bloated controllers.

```markdown
Generate a CRUD API for the Post model.
- Use API Resources for the response.
- Place validation in separate Form Request classes.
- Ensure the controller is in the API namespace.
```

4. Refining the Resource

If the generated `PostResource` exposes fields you don't want in the response, such as timestamps, you can use Claude Code to refine it without leaving the terminal. Inside the Claude Code CLI, enter:

```text
In @app/Http/Resources/PostResource.php, remove the created_at and updated_at fields from the return array.
```

Syntax Notes

* **Slash Commands**: Agents like Claude Code use commands such as `/usage` to monitor token limits or `/clear` to reset the context window.
* **Markdown Guidelines**: Most agents look for an instruction file such as `.cursorrules` or `CLAUDE.md`. These are standard Markdown files that dictate coding style, such as "Use Pest for testing" or "Prefer constructor injection."
* **MCP (Model Context Protocol)**: Some tools use MCP to let the AI search documentation or run local commands directly.

Practical Examples

* **Test-Driven Scaffolding**: Use Codex CLI to generate both the controller and a corresponding Pest test suite. The agent runs the tests automatically and fixes the code until they pass.
* **Plan Mode Execution**: For complex features like a multi-step checkout, enter "Plan Mode." This lets you verify the AI's architectural decisions (e.g., service classes vs. jobs) before any files are modified.

Tips & Gotchas

* **Vibe Coding vs. Precision**: Avoid long-running chat sessions. As the conversation grows, "context pollution" increases, leading to hallucinations and higher token costs. Use the `/new` command or open a new chat window for every distinct task.
* **Pricing Horror Stories**: Cursor pricing can be volatile if you use expensive models like Claude 3.5 Sonnet for small tasks. Monitor your dashboard frequently. For minor refactors, switch to cheaper models like Grok Code or Composer 1.
* **Git Integration**: Always commit your work before triggering an agent. Cursor's "Undo" button only reverts the most recent block of changes, so a Git rollback is the only reliable way to recover when an AI has accidentally modified 20 different files.
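The version-detection step described in the Laravel Boost section can be illustrated with a minimal sketch. It reads a `composer.lock` file, whose top-level `packages` and `packages-dev` arrays are standard Composer output. It extracts each package's major version and maps it to a guideline name. The package table and the `<topic>/<major>` naming scheme are hypothetical, and this is not Boost's actual implementation.

```python
"""Hypothetical sketch: pick version-specific guidelines from composer.lock.

Assumptions: the GUIDELINE_PACKAGES table and the "<topic>/<major>" naming
scheme are illustrative only; they are not Laravel Boost's real internals.
"""
import json
import re
from typing import Optional

# Composer package name -> guideline topic (illustrative subset).
GUIDELINE_PACKAGES = {
    "laravel/framework": "laravel",
    "livewire/livewire": "livewire",
    "pestphp/pest": "pest",
    "filament/filament": "filament",
}


def major_version(version: str) -> Optional[int]:
    """Extract the major version from strings like 'v3.5.2' or '12.1.0'."""
    match = re.match(r"v?(\d+)", version)
    return int(match.group(1)) if match else None


def select_guidelines(lock_text: str) -> list[str]:
    """Return guideline identifiers for known packages found in the lock file."""
    lock = json.loads(lock_text)
    installed = lock.get("packages", []) + lock.get("packages-dev", [])
    selected = []
    for package in installed:
        topic = GUIDELINE_PACKAGES.get(package.get("name", ""))
        major = major_version(package.get("version", ""))
        # Skip unknown packages and non-numeric versions such as "dev-main".
        if topic and major is not None:
            selected.append(f"{topic}/{major}")
    return sorted(selected)


if __name__ == "__main__":
    sample_lock = json.dumps({
        "packages": [
            {"name": "laravel/framework", "version": "v12.3.0"},
            {"name": "livewire/livewire", "version": "v3.6.1"},
            {"name": "guzzlehttp/guzzle", "version": "7.9.2"},
        ],
        "packages-dev": [{"name": "pestphp/pest", "version": "v3.7.4"}],
    })
    print(select_guidelines(sample_lock))  # ['laravel/12', 'livewire/3', 'pest/3']
```

Keying on the major version is the important choice here. Livewire 2 and Livewire 3 differ enough that mixing their guidance would reintroduce the stale-knowledge problem.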
Nov 20, 2025

The New Frontier of AI-Native Development

The relationship between developers and their code is undergoing a fundamental transformation. We are moving past the era of simple auto-completion into a world where AI agents act as full-fledged pair programmers. Ashley Hindle, who leads the AI initiatives at Laravel, describes this shift not as a replacement of the developer's craft but as an expansion of their capabilities.

The challenge is that while Large Language Models (LLMs) are becoming increasingly sophisticated, they often lack specific, up-to-date context about a framework's evolving ecosystem. They might know PHP but not the breaking changes in the latest version of Pest or the architectural nuances of a Filament project.

This is where Laravel Boost enters the scene. It is not an LLM itself; it is a bridge. It is a Composer package that injects guidelines, tools, and version-specific documentation directly into the AI agent's context. This closes the "hallucination gap" that opens when an AI relies on stale training data. The goal is simple: make the AI agent a more competent contributor by giving it the same reference materials a human developer would use. This moves development from "vibe coding" (relying on the AI's best guess) to a more predictable, high-quality workflow grounded in the actual state of the codebase and the framework.

The Architecture of Context: Ingestion and Vector Search

To understand how Boost works, look at the ingestion pipeline that powers its documentation search. Unlike static documentation, information fed to an AI agent needs to be formatted for retrieval. Ashley Hindle explains that the team uses Laravel Cloud to host an API that serves as the central nervous system for documentation. The pipeline downloads Markdown files through the GitHub API and processes them with a recursive text splitter.
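A recursive text splitter, in the general sense, tries the coarsest separator first (blank lines between paragraphs). It falls back to finer ones (single newlines, then spaces) only for pieces that are still too long, and packs neighbouring pieces together up to a size budget. The sketch below is a generic, simplified version of that idea, not the Boost pipeline's code. The separator order, the character-based budget, and the absence of chunk overlap are all assumptions.

```python
"""Generic sketch of recursive text splitting (not Boost's actual pipeline).

Assumptions: chunk size is measured in characters, and the separator order
(paragraphs, lines, words) is a common default rather than a known setting.
"""

SEPARATORS = ("\n\n", "\n", " ")


def recursive_split(text: str, max_chars: int, separators: tuple[str, ...] = SEPARATORS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring coarse breaks."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut at the character budget.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # Keep packing pieces into the current chunk.
            continue
        if current:
            chunks.append(current)
        if len(piece) <= max_chars:
            current = piece
        else:
            # This piece alone is too big: recurse with the next, finer separator.
            chunks.extend(recursive_split(piece, max_chars, finer))
            current = ""
    if current:
        chunks.append(current)
    return [c.strip() for c in chunks if c.strip()]
```

For example, `recursive_split(page, 1_000)` on a Markdown page string yields chunks of at most 1,000 characters. Each chunk is a whole paragraph where possible and is ready to be embedded and stored.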
Chunking matters because an AI cannot ingest a 50-page manual in one go and still pinpoint a specific method signature. The chunks are then vectorized using OpenAI embedding models and stored in PostgreSQL via pgvector.

Interestingly, the team does not rely solely on vector search. They use a hybrid approach that adds Postgres full-text search backed by GIN indexes. This dual-layer strategy captures both semantic meaning (through embeddings) and exact syntax or keyword matches (through full-text search). For a developer, this means that when the AI searches for a specific Inertia.js helper, it finds the documentation snippet relevant to their version rather than a generic or outdated example.

Mastering the Model Context Protocol (MCP)

A core technical pillar of Boost is the Model Context Protocol (MCP). Think of MCP as a standardized way for an AI agent to "talk" to a server and use its features. Ashley Hindle uses a physical analogy: if the AI is the brain, MCP provides the hands. It lets the agent ask, "What are you capable of?" and receive a list of tools, such as searching documentation, scanning a `composer.lock` file, or checking Tailwind CSS configuration.

The strength of Boost's MCP implementation lies in its invisibility. When a developer installs Boost, it auto-detects installed IDEs and agents such as Cursor, Claude Code, or PhpStorm and configures the MCP server automatically. The AI agent then decides when to call these tools based on the user's prompt. If you ask the AI to write a test, it sees the `search_docs` tool in its inventory and notices that Pest is installed. It then retrieves the latest Pest documentation before writing a single line of code. Because the agent chooses these tools on its own, guided by the tool descriptions Boost provides, the developer doesn't have to manually prompt it to "look at the docs."

Guidelines vs. Tools: The Art of Nudging

There is a subtle but critical distinction between giving an AI a tool and giving it a guideline. A tool is a functional capability; a guideline is a set of behavioral rules. Ashley Hindle discovered during development that tools alone weren't enough: an AI might have access to documentation but still write code in an old style. Boost therefore provides specific guidelines, often delivered through `CLAUDE.md` or other agent-specific instruction files, that "nudge" the AI toward modern conventions.

These guidelines are generated dynamically from the project's dependencies. If a project uses Livewire, Boost includes Livewire guidelines; if it uses React, it swaps them. This prevents context bloat and keeps the AI from being distracted by irrelevant rules.

Furthermore, Boost is designed to respect a codebase's existing conventions. Guidelines often tell the AI to look at sibling controllers or existing patterns first. This ensures the AI doesn't just write "perfect" Laravel code, but code that actually fits the project it is working in. The team is currently working on an override system that lets developers supply their own custom Blade files for guidelines, so team-specific standards take precedence over the defaults.

The Economics of Tokens and Efficiency

A common concern with AI-assisted development is cost and token usage. Adding documentation and guidelines to every request sounds expensive. However, Ashley Hindle argues that Boost often pays for itself. The guidelines might add roughly 2,000 tokens to a request, a small fraction of the 200,000-token context windows of modern models like Claude 3.5 Sonnet. In exchange, they significantly reduce the number of failed attempts. When an AI has the correct context, it is far more likely to get the code right on the first try.
Without Boost, a developer might go through five or six back-and-forth prompts to correct the AI's hallucinations, consuming far more tokens in the long run. Additionally, many providers now support prompt caching. Because the Boost guidelines stay the same across a session, they are frequently cached at the API level. With some providers, that cuts the price of those cached tokens by about 90%. The efficiency isn't just financial; it's temporal. The developer stays in a flow state because they aren't constantly acting as a human debugger for the AI's mistakes.

Future Horizons: Benchmarks and Package Integration

The roadmap for Laravel Boost is ambitious. One of the most significant upcoming projects is "Boost Benchmarks." Ashley Hindle is building a comprehensive suite of projects and evaluations to move beyond "gut feel" testing. This will let the team measure whether one version of Boost is, for example, 20% more accurate at fixing bugs in Filament than the previous version. It will also provide data on which LLMs (Claude, GPT-4o, or Gemini) perform best on specific Laravel tasks.

Another major shift is the move toward a package-contributed guideline system. The Laravel team cannot write and maintain guidelines for every package in the ecosystem. The goal is an API that lets package creators such as Spatie ship their own Boost-compatible guidelines in their repositories. When a developer runs `php artisan boost:install`, the system will detect these third-party packages and automatically pull in the author-approved AI instructions. This decentralization will help the entire PHP ecosystem become AI-native, with every package providing the context agents need to use it effectively. As context windows continue to expand toward millions of tokens, the bottleneck will no longer be how much the AI can remember, but how accurately we can feed it the truth.
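The token argument in this article can be checked with back-of-the-envelope arithmetic. The sketch below takes three figures from the text: about 2,000 guideline tokens, five or six correction rounds without Boost, and roughly a 90% discount on cached tokens. The per-request prompt and response sizes are invented. Each attempt is treated as an independent request, although real chat sessions also resend earlier turns, and cache-write surcharges are ignored.

```python
"""Back-of-the-envelope token comparison (illustrative numbers only).

From the article: ~2,000 guideline tokens, 5-6 correction rounds without
Boost, and ~90% off cached tokens. BASE_PROMPT and RESPONSE are assumed.
"""

GUIDELINE_TOKENS = 2_000
CACHE_DISCOUNT = 0.90   # cached guideline tokens billed at 10% of the normal rate
BASE_PROMPT = 3_000     # assumed: task description plus relevant code per request
RESPONSE = 1_500        # assumed: generated code per attempt


def session_cost(attempts: int, with_guidelines: bool, cached: bool) -> float:
    """Billed-token equivalent for a session that needs `attempts` requests."""
    total = 0.0
    for attempt in range(attempts):
        total += BASE_PROMPT + RESPONSE
        if with_guidelines:
            # The first request writes the cache; later ones read it at a discount.
            rate = 1 - CACHE_DISCOUNT if (cached and attempt > 0) else 1.0
            total += GUIDELINE_TOKENS * rate
    return total


if __name__ == "__main__":
    without_boost = session_cost(attempts=6, with_guidelines=False, cached=False)
    with_boost = session_cost(attempts=1, with_guidelines=True, cached=True)
    print(f"without Boost: {without_boost:,.0f} tokens")  # without Boost: 27,000 tokens
    print(f"with Boost:    {with_boost:,.0f} tokens")     # with Boost:    6,500 tokens
```

Under these assumptions the guidelines pay for themselves once they save a single retry. One retry (4,500 tokens here) costs more than the 2,000-token guideline overhead.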
Aug 30, 2025

The software development industry is navigating a chaotic transition into the AI age. We see a flood of new models from OpenAI, Anthropic, and Google, each claiming to be industry-leading. For developers, the challenge isn't just using these tools but understanding which ones actually work. We have moved past the era of simple chat interfaces and entered a phase of "vibe coding." Andrej Karpathy coined the term for the idea that we can build entire products simply by managing the "vibe" of the AI's output. While the hype is intoxicating, professional engineering requires moving beyond vibes to structured, high-leverage workflows.

Decoding the Benchmarks

To choose the right tool, you must understand how these models are measured. We have moved beyond the HumanEval era. HumanEval was the gold standard in 2021, but modern models score so high on its 164 Python tasks that it no longer differentiates quality. Today, we look to more rigorous tests like SWE-bench, which uses real-world issues from open-source Python projects on GitHub. When Claude Sonnet 4 reaches a roughly 73% success rate on SWE-bench Verified, it isn't just completing a toy function; it is producing working patches for complex, multi-file issues.

Another critical metric is the Aider Polyglot benchmark, which evaluates how well models handle localized edits across multiple languages, including Go and Rust. Its leaderboard also tracks cost, giving a practical view of which models are actually viable for daily production use.

The Vibe Coding Paradox

Andrej Karpathy sparked a firestorm with the concept of vibe coding: accepting all AI suggestions and letting the model drive the entire development process. This trend sits at the peak of inflated expectations on the Gartner Hype Cycle. History repeats itself here; the Agile Manifesto faced similar cynicism in 2001, when critics called it an attempt to undermine engineering discipline. The reality is that AI is a chainsaw.
It is incredibly powerful but has jagged edges. If you operate it without a leash, you risk shipping vulnerabilities and "software burrows": unstable patches held together by digital magic. The goal isn't to let the AI take the wheel entirely but to keep human control over these high-powered agents.

Shifting Mental Gears: Ask, Edit, and Agent

Effective AI pair programming requires shifting between distinct modes:

* **Ask Mode** serves as your conversational debugger, with read-only access for answering architectural questions.
* **Edit Mode** is for precision surgery; the model sees specific files and generates diffs for localized refactors.
* **Agent Mode** is the most powerful, letting the AI search the repository, run terminal commands, and execute tests until a feature is complete.

Using the wrong mode for a task leads to context window bloat and poor results. For instance, don't use Agent mode for a simple variable rename; use Edit mode to keep the model's focus narrow and surgical.

Advanced Workflows for High-Performance Teams

To truly integrate AI, you must codify your preferences. Use global and project-specific instruction files (like `.cursorrules`) to define your naming conventions and architectural patterns. This removes the need to constantly correct the AI on small stylistic choices.

Furthermore, embrace **Multi-Agent Workflows**. Research suggests that a "Reflection" pattern, where one model writes code and a second model reviews it, can boost accuracy by up to 20%. By feeding the reviewer's critique back to the writer, you create a self-correcting loop that catches bugs before they reach your local environment. This is the difference between "vibing" and professional engineering.
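The writer/reviewer loop can be sketched independently of any particular model. In the sketch below, `write` and `review` are hypothetical callables standing in for two model calls (in practice, requests to two different models). The `Review` type and the `max_rounds` cap are also assumptions, added so the loop always terminates. The toy stand-ins in the `__main__` block only illustrate the control flow.

```python
"""Minimal sketch of a writer/reviewer "Reflection" loop.

Assumptions: `write` and `review` stand in for calls to two different models;
the Review type and max_rounds limit are hypothetical, not any tool's API.
"""
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    approved: bool
    critique: str = ""


def reflect(
    task: str,
    write: Callable[[str, Optional[str]], str],
    review: Callable[[str, str], Review],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Alternate writing and reviewing until approval or max_rounds is reached.

    Returns the last draft and the number of rounds used.
    """
    critique: Optional[str] = None
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = write(task, critique)    # The writer sees the previous critique, if any.
        verdict = review(task, draft)    # A second model checks the draft.
        if verdict.approved:
            return draft, round_no
        critique = verdict.critique      # Feed the critique back into the next draft.
    return draft, max_rounds


if __name__ == "__main__":
    # Toy stand-ins: the "writer" forgets the zero check until the reviewer asks for it.
    def toy_writer(task: str, critique: Optional[str]) -> str:
        body = "return a / b"
        if critique and "zero" in critique:
            body = "if b == 0:\n        raise ValueError('b must be non-zero')\n    " + body
        return f"def divide(a, b):\n    {body}"

    def toy_reviewer(task: str, draft: str) -> Review:
        if "b == 0" not in draft:
            return Review(False, "Handle division by zero.")
        return Review(True)

    code, rounds = reflect("write divide()", toy_writer, toy_reviewer)
    print(rounds)  # 2
    print(code)
```

The round cap is the key design choice. Without it, a reviewer that never approves would keep the loop, and the token bill, running indefinitely.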
Aug 21, 2025