Grok 4.3 fails coding stress tests while charging four times more than rivals

AI Coding Daily // 2 min read

The high cost of synthetic speed

xAI recently released Grok 4.3, and the developer community immediately looked for performance gains. This iteration follows a lineage of so-called fast models, such as Grok Code Fast 1, which initially impressed the market with low latency. However, speed is a dangerous metric when detached from reliability. In a series of standardized benchmarks involving Laravel and Filament admin panels, Grok 4.3 demonstrated an alarming disconnect between its rapid execution and the actual quality of its output.

Fundamental errors in Laravel and PHP

When tasked with building a Laravel API, the model stumbled on basic architectural requirements. It failed to apply required route name prefixes and, more critically, lost crucial request type hints during code refactoring. For example, moving a route into a group resulted in the loss of the $request parameter type hint, an error that breaks functionality immediately upon execution. These are not nuanced architectural disagreements; they are fundamental syntax and logic failures that an experienced developer would expect a modern LLM to handle with ease.
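The failure mode is easy to illustrate with a minimal sketch. The route names and URIs below are invented for illustration, not taken from the actual test; the point is that the `Request` type hint on the closure must survive the move into the group, otherwise Laravel no longer injects the request object and the route errors out on its first call:

```php
<?php
// Hypothetical before/after of the refactoring described above.
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Route;

// Before: a standalone route with a typed $request and an explicit name.
Route::get('/orders', function (Request $request) {
    return $request->query('status', 'all');
})->name('orders.index');

// After: moved into a group. Both the name prefix and the Request
// type hint must be preserved — dropping "Request" here is exactly
// the kind of error that breaks on first execution.
Route::prefix('api')->name('api.')->group(function () {
    Route::get('/orders', function (Request $request) {
        return $request->query('status', 'all');
    })->name('orders.index'); // resolves to "api.orders.index"
});
```

A refactor like this changes structure only; the dependency-injected parameter and the fully qualified route name should come out identical on both sides.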

Broken interfaces and inconsistent enums

The struggles continued with Filament. The prompt required the implementation of specific PHP enums using HasLabel and HasColor interfaces. Grok 4.3 failed three consecutive attempts, often ignoring the interface requirements entirely or hallucinating string values that deviated from the prompt. While one attempt was almost successful, it was marred by unnecessary "creativity" that broke automated tests. This inconsistency makes it impossible to trust the model for automated workflows.
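The pattern the prompt asked for is standard Filament fare. Here is a minimal sketch of it; the enum name, cases, and values are invented for illustration, and the two interface stubs stand in for Filament's `Filament\Support\Contracts\HasLabel` and `HasColor` contracts so the snippet runs standalone:

```php
<?php
// Stand-ins for Filament's contracts (real code would import
// Filament\Support\Contracts\HasLabel and HasColor instead).
interface HasLabel { public function getLabel(): ?string; }
interface HasColor { public function getColor(): string|array|null; }

// A backed enum implementing both interfaces — the shape the prompt
// required and the model repeatedly failed to produce.
enum OrderStatus: string implements HasLabel, HasColor
{
    case Pending = 'pending';
    case Shipped = 'shipped';

    public function getLabel(): ?string
    {
        return match ($this) {
            self::Pending => 'Pending',
            self::Shipped => 'Shipped',
        };
    }

    public function getColor(): string|array|null
    {
        return match ($this) {
            self::Pending => 'warning',
            self::Shipped => 'success',
        };
    }
}
```

The backing string values are the part the model reportedly hallucinated; in an automated test they are compared verbatim, so any "creative" deviation fails the run.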


The verdict on price and performance

The most staggering data point is the cost. Accessed via OpenRouter, Grok 4.3 billed at roughly $0.50 per prompt, nearly four times the price of Kimi, a model that consistently delivered bug-free code in the same tests. Grok 4.3 is fast, averaging two minutes per task, but it is an expensive luxury that currently yields broken results. For serious development, Claude 3.5 Sonnet and GPT-4o remain the standard-bearers for accuracy and value.

Source video
I Tested Grok 4.3 vs Other LLMs for Coding: Clear Answer

AI Coding Daily // 7:25

This channel is not for vibe-coders. It's for professional devs who want to use AI as a powerful assistant while still keeping control of their codebase. My name is Povilas Korop, and I'm passionate about coding with AI. So I started this THIRD YouTube channel, in addition to my other ones, Laravel Daily and Filament Daily. You will see a lot of my experiments with AI: I will try new things and share my discoveries along the way.
