
// Cal Newport

The Mythos incident and the collision of AI and state power

Late last week, the intersection of artificial intelligence ethics and raw political power created a tectonic shift in the industry. The Trump administration took the unprecedented step of placing Anthropic's newest models, Claude Mythos and its consumer-facing sibling Fable 5, on an export control list. This move effectively shuttered access to the technology, even for the company’s own foreign national employees. While critics decry the act as capricious and chaotic, a deeper analysis reveals a complex interplay between corporate marketing strategies, national security theater, and a desperate need for a formal regulatory regime.

Anthropic finds itself in a peculiar trap of its own making. Back in April, the company launched a public relations campaign that painted Claude Mythos as a dual-use hazard: a system so proficient at exploiting computer code that it posed a severe threat to national security. By framing their innovation as a digital demon that required careful stewardship, they sought to build a brand centered on safety. However, when they attempted to release Fable 5 with what they claimed were sufficient guardrails, the government called their bluff. If a company tells the state they have built a cyberweapon, they should not be surprised when the state treats them like a weapons manufacturer.

The fallacy of the unbreachable guardrail

The central technical dispute involves the efficacy of AI guardrails. According to former White House AI czar David Sacks, an independent researcher demonstrated that the safety layers on Fable 5 were easily evaded through jailbreaking. This failure highlights a fundamental truth in machine learning: guardrails are often superficial. They typically rely on fine-tuning with reinforcement learning to divert model outputs toward safe responses.

Why jailbreaks are inevitable

Jailbreaking works by bypassing the specific neural patterns activated during safety training.
If a user obfuscates their intent or uses an exceptionally long context window, they can navigate around the 'downhill' logic that leads to a refusal. To date, we have never seen a guardrail that could not be jailbroken. The Trump administration’s insistence that Anthropic 'fix' these inherent architectural weaknesses before release suggests a fundamental misunderstanding of how large language models function, yet it raises a valid ethical question: should we release systems whose only safety mechanism is a lock that any determined actor can pick?

Marketing fear as a corporate moat

We must question whether Claude Mythos actually represents a revolutionary leap in danger or merely an incremental step in capability. Evidence suggests the latter. Independent researchers have shown that smaller, cheaper models can identify the same vulnerabilities Anthropic touted as unique to Mythos. The 'scare campaign' appears to be a calculated marketing strategy. By positioning their models as uniquely dangerous, these companies aim to justify higher token prices and secure a seat at the regulatory table, effectively building a moat that smaller competitors cannot cross.

This strategy has a devastating societal cost. These companies have run a psychological operation on the public for two years, fostering a climate of anxiety and distrust to bolster their own importance. The psychic damage of this constant alarmism likely outweighs any marginal productivity gains AI has provided to date. When the government intervenes to 'call the bluff' of a company claiming to have summoned a demon, it is acting as a necessary, if blunt, instrument of public health.

Towards a transparent licensing regime

The current haphazard approach is unsustainable. We cannot have a system where the Commerce Department acts based on the personal whims of an administration or the influence of Silicon Valley donors. However, the solution is not total deregulation.
We need a mandatory, transparent licensing regime where the burden of proof for safety lies with the developer.

Reframing AI as a consumer product

If a virology lab conducts gain-of-function research and warns of a pandemic, the government restricts that research. AI should be no different. A formal framework would force companies to move away from 'F1 car' models (massive, unpredictable frontier systems designed for headlines) and toward narrow, responsible tools. We need a future where AI is treated like a normal consumer product, beholden to the same safety standards as an automobile or a pharmaceutical. Only then can we move past the era of 'marketing by apocalypse' and toward technology that serves human interests without holding our collective psyche hostage.

Jun 17, 2026

Trump blocks Anthropic model as fear marketing backfires on Silicon Valley