Boris Cherney – Research, Videos, Insights & Reviews

// AI Engineer

The deceptive death of retrieval augmented generation

Social media pundits spent early 2025 declaring the end of Retrieval Augmented Generation (RAG). They argued that long-context windows and agentic file search would make traditional vector databases obsolete. Search volume data tells a different story. Kuba Rogut of turbopuffer notes that search interest in RAG hit a sharp inflection point in mid-2025 and reached all-time highs. RAG isn't dying. It is evolving from a single-call vector lookup into an iterative, multi-step process known as agentic search.

Embeddings act as a form of cached compute

There is a critical distinction between the per-session discovery of tools like Claude Code and the indexed approach of Cursor. When an agent greps through a file system without an index, it burns tokens and time repeating the same discovery steps in every session. Kuba Rogut frames embeddings as "cached compute." Developers pay an upfront cost to parse and embed a codebase once. In return, agents can skip the expensive "grep, read, assess" loop and retrieve the right context in milliseconds rather than minutes.

Quantifying the semantic search advantage

Cursor's results suggest that this indexed approach pays off. Its internal benchmarks showed that adding semantic search to its Composer model increased answer accuracy by 24%. In real-world A/B testing, Cursor also observed a 2.6% increase in code retention within large codebases. That A/B figure may look modest at first glance. However, it reflects semantic search's effect on only a fraction of total queries, averaged across all of them. The takeaway is that vector-based retrieval remains the stronger tool when context is hard to find.

Staged retrieval is the trillion-token solution

As models move toward handling massive context windows, the need for efficient filtering actually grows.
Kuba Rogut cites Jeff Dean of Google, who argues that models need staged retrieval even with a trillion-token window. You don't need a trillion tokens at once; you need the right million. Modern agentic search addresses this by giving agents a toolkit of BM25 full-text search, regex, and vector filtering. Agents call these tools iteratively to narrow a noisy corpus down to the context that actually matters.
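The "cached compute" framing from the embeddings section can be illustrated with a minimal sketch. It assumes an in-memory list of code chunks and a hypothetical `embed` function standing in for a real embedding model. Here `embed` produces a lexical bag-of-words vector, so it is not actually semantic. What the sketch shows is the cost structure. Each chunk is embedded once at index time. Every later query costs one embedding plus a scan of the cached vectors, instead of a fresh grep, read, assess loop. `ChunkIndex` and its `embed_calls` counter are illustrative names, not Cursor's or turbopuffer's API.

```python
import math
import re
from collections import Counter


# Hypothetical stand-in for a real embedding model: a unit-normalized
# bag-of-words vector stored as a sparse dict (lexical, not semantic).
def embed(text: str) -> dict[str, float]:
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit-normalized, so the dot product is the cosine.
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class ChunkIndex:
    """Embed each chunk once at index time; reuse cached vectors for queries."""

    def __init__(self) -> None:
        self.chunks: list[str] = []
        self.vectors: list[dict[str, float]] = []
        self.embed_calls = 0  # how much "compute" has actually been spent

    def _embed(self, text: str) -> dict[str, float]:
        self.embed_calls += 1
        return embed(text)

    def add(self, chunk: str) -> None:
        # Upfront cost: paid once per chunk, not once per session.
        self.chunks.append(chunk)
        self.vectors.append(self._embed(chunk))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Per-query cost: one embedding plus a scan of cached vectors.
        q = self._embed(query)
        ranked = sorted(
            zip(self.chunks, self.vectors),
            key=lambda pair: cosine(q, pair[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:k]]


index = ChunkIndex()
for chunk in [
    "def parse_config(path): load yaml config file",
    "def retry_request(url): http retry with backoff",
    "class UserRepository: database access for users",
]:
    index.add(chunk)

top = index.search("where is the http retry logic", k=1)
print(top)                # ['def retry_request(url): http retry with backoff']
print(index.embed_calls)  # 4: three chunks indexed once, plus one query
```

A production system would keep the vectors in a vector database and use approximate nearest-neighbor search instead of a linear scan. The economics stay the same, though: the expensive step happens once and is amortized over every later query.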
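The quantifying section argues that the A/B figure reflects semantic search's effect on only a fraction of total queries. That point can be written as a back-of-the-envelope dilution model. This is an illustrative assumption, not Cursor's methodology. It assumes that queries which never invoke semantic search see no change and that effects combine linearly. Let f be the fraction of queries where semantic search changes the outcome:

```latex
\Delta_{\text{overall}} \approx f\,\Delta_{\text{affected}} + (1 - f)\cdot 0
\quad\Longrightarrow\quad
\Delta_{\text{affected}} \approx \frac{\Delta_{\text{overall}}}{f}
```

The source does not report f. As a purely hypothetical example, suppose f were 0.25. The 2.6% aggregate code-retention gain would then correspond to roughly 10.4% (2.6 / 0.25) on the queries that semantic search actually affected.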

3 days ago

Cursor finds semantic search boosts AI coding accuracy by 24 percent