The high cost of synthetic speed

xAI recently released Grok 4.3, and the developer community immediately looked for performance gains. This iteration follows a lineage of so-called fast models, such as Grok Code Fast 1, which initially impressed the market with low latency. However, speed is a dangerous metric when detached from reliability. In a series of standardized benchmarks involving Laravel and Filament admin panels, Grok 4.3 demonstrated an alarming disconnect between its rapid execution and the actual quality of its output.

Fundamental errors in Laravel and PHP

When tasked with building a Laravel API, the model stumbled on basic architectural requirements. It failed to apply required route name prefixes and, more critically, lost crucial request type hints during refactoring. For example, moving a route into a group dropped the `$request` parameter's type hint, an error that breaks functionality immediately upon execution. These are not nuanced architectural disagreements; they are fundamental syntax and logic failures that an experienced developer would expect a modern LLM to handle with ease.

Broken interfaces and inconsistent enums

The struggles continued with Filament. The prompt required the implementation of specific PHP enums using the `HasLabel` and `HasColor` interfaces. Grok 4.3 failed three consecutive attempts, often ignoring the interface requirements entirely or hallucinating string values that deviated from the prompt. One attempt came close, but it was marred by unnecessary "creativity" that broke the automated tests. This inconsistency makes the model impossible to trust in automated workflows.

The verdict on price and performance

The most staggering data point is the cost. Accessed via OpenRouter, the model billed at roughly $0.50 per prompt. That makes it nearly four times as expensive as Kimi, a model that consistently delivered bug-free code in the same tests.
While Grok 4.3 is fast—averaging two minutes per task—it is an expensive luxury that currently yields broken results. For serious development, Claude 3.5 Sonnet and GPT-4o remain the standard-bearers for accuracy and value.
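For context, the route refactoring the benchmark expected is routine Laravel. The sketch below is illustrative, not the benchmark's actual code: the URI, route names, and closure body are assumptions. The key points are the `name('api.')` prefix on the group and the `Request $request` type hint, which must survive the move into the group (this is the hint Grok 4.3 dropped).

```php
<?php

use Illuminate\Http\Request;
use Illuminate\Support\Facades\Route;

// Hypothetical routes/api.php fragment. The group applies a URI prefix
// and a route name prefix; each route keeps its own name suffix.
Route::prefix('v1')->name('api.')->group(function () {
    // The Request type hint on $request is required for Laravel to
    // inject the current request; dropping it during the refactor is
    // exactly the failure described above.
    Route::post('/orders', function (Request $request) {
        return response()->json($request->all());
    })->name('orders.store');
});
```

With the prefixes applied, the route resolves as `api.orders.store`, which is what a test asserting on route names would check.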
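Likewise, the enum task is a small, well-documented Filament pattern. This is a minimal sketch assuming Filament v3 is installed; the enum name, cases, labels, and colors are invented for illustration, not taken from the benchmark prompt. The point is that the enum must declare `implements HasLabel, HasColor` and provide both methods, which is what the model repeatedly skipped.

```php
<?php

use Filament\Support\Contracts\HasColor;
use Filament\Support\Contracts\HasLabel;

// Hypothetical status enum implementing both Filament contracts.
enum OrderStatus: string implements HasLabel, HasColor
{
    case Pending = 'pending';
    case Shipped = 'shipped';

    // HasLabel: the human-readable label Filament shows in tables/forms.
    public function getLabel(): string
    {
        return match ($this) {
            self::Pending => 'Pending',
            self::Shipped => 'Shipped',
        };
    }

    // HasColor: a named Filament color for badges and indicators.
    public function getColor(): string
    {
        return match ($this) {
            self::Pending => 'warning',
            self::Shipped => 'success',
        };
    }
}
```

Omitting the `implements` clause, or returning strings that differ from the prompt's backed values, would break exactly the kind of automated test the article describes.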