
Cursor Built a Local Search Index That Makes Grep 1,000x Faster for AI Agents

Cursor's Instant Grep uses sparse n-gram indexing to cut regex search from 16.8 seconds to 13 milliseconds.

Oliver Senti, Senior AI Editor
March 24, 2026 · 7 min read

[Image: Abstract visualization of a search index narrowing millions of documents to a small subset through layered filtering]

Cursor shipped a blog post yesterday detailing how they built a local text search index that drops regex search latency from over 15 seconds to around 13 milliseconds. The feature, called Instant Grep, is rolling out with Cursor 2.1 and targets a problem most developers haven't thought about but that AI coding agents hit constantly: searching large codebases with regular expressions is slow when you have to scan every file.

The post was written by Vicent Martí, a systems engineer at Cursor who previously spent roughly a decade at GitHub as a principal systems engineer, then moved to PlanetScale before landing at Cursor. His GitHub profile still reads like a resume for exactly this kind of work. And it shows, because the blog post isn't just an announcement. It is a 5,000-word interactive explainer walking through the full history of text search indexing, from Zobel, Moffat, and Sacks-Davis's 1993 paper on n-gram inverted indexes through Google Code Search, suffix arrays, GitHub's Project Blackbird, and finally Cursor's own implementation.

The actual problem

Most AI coding tools, Cursor included, give their agents access to ripgrep for searching codebases. Ripgrep is fast. Andrew Gallant has spent years making it fast. But ripgrep has a fundamental constraint: it reads every file. For a small project that's fine. For the kind of enterprise monorepos Cursor's paying customers work in, a single rg invocation can take 15 seconds or more. When an AI agent is running multiple searches per task, sometimes in parallel, those seconds compound into minutes of stalling.

The insight, which Martí is careful to point out is not new, is that you don't need to search every file. You need to figure out which files could match, and only search those.

Trigrams, and why they aren't enough

The classic approach goes back to Russ Cox's 2012 blog post about Google Code Search (itself based on the 1993 Zobel paper). You build an inverted index keyed on trigrams, overlapping 3-character chunks of text. When a regex query comes in, you decompose it into trigrams, look up which files contain all of them, and only run the actual regex engine on that subset.
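The filtering step is simple to sketch. Here's a minimal, hypothetical Python version of a trigram inverted index for literal queries — not Cox's actual implementation, which also decomposes arbitrary regexes into trigram conditions:

```python
from collections import defaultdict

def trigrams(text):
    """All overlapping 3-character substrings of the text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_index(files):
    """Inverted index: trigram -> set of file names containing it."""
    index = defaultdict(set)
    for name, content in files.items():
        for t in trigrams(content):
            index[t].add(name)
    return index

def candidates(index, literal):
    """Files containing every trigram of a literal query; only these
    need to be handed to the real regex engine. Returns None when the
    query is too short to filter."""
    ts = trigrams(literal)
    if not ts:
        return None
    return set.intersection(*(index.get(t, set()) for t in ts))

files = {
    "a.py": "def handler(request): pass",
    "b.py": "print('hello')",
}
index = build_index(files)
print(candidates(index, "request"))  # → {'a.py'}
```

The regex engine then runs only on the surviving candidates, which is where the speedup comes from: the index lookup is cheap, and the expensive scan touches a tiny subset of files.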

Why three characters? Bigrams produce too few distinct keys, so each posting list balloons. Quadgrams produce too many keys, bloating the index. Three is the sweet spot, or at least it was in 2012.

At GitHub's scale, plain trigrams broke down. Common trigrams like "for" or "the" matched so many documents that the posting lists became unwieldy. GitHub's internal Project Blackbird team tried augmenting trigrams with bloom filter masks carrying information about the fourth character and about positional offsets, two bytes of bloom filter per posting entry. Martí describes this as getting you to roughly "3.5-grams" of selectivity. It worked, but bloom filters saturate, and once they saturate they match everything, which puts you right back where you started.
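To see why saturation bites, here's a toy sketch (my own illustration, not Blackbird's actual layout) of a two-byte bloom mask with a single hash function: once most of the 16 bits are set, nearly every membership test reports a possible match:

```python
import hashlib

MASK_BITS = 16  # two bytes per posting entry, as described above

def bit(value: str) -> int:
    """Deterministically map a value to one of the 16 mask bits."""
    return hashlib.sha256(value.encode()).digest()[0] % MASK_BITS

def add(mask: int, value: str) -> int:
    """Fold a value into the bloom mask."""
    return mask | (1 << bit(value))

def maybe_contains(mask: int, value: str) -> bool:
    """True means 'possibly present'; False means 'definitely absent'."""
    return bool(mask & (1 << bit(value)))

# Fold several fourth-characters into one mask...
mask = 0
for ch in "abcdefgh":
    mask = add(mask, ch)

# ...then probe values that were never added and count false positives.
hits = sum(maybe_contains(mask, ch) for ch in "ijklmnopqrstuvwxyz")
print(f"{bin(mask).count('1')} of {MASK_BITS} bits set; "
      f"{hits} of 18 absent values still match")
```

Bloom filters never produce false negatives, so every added value always matches; the problem is purely the one-way drift toward matching everything as entries accumulate.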

Sparse n-grams

The technique Cursor actually uses is called sparse n-grams, and it was originally developed at GitHub for the Code Search feature that shipped in 2023. An open-source implementation exists. ClickHouse uses a version of it for their regex operator.

Instead of extracting every consecutive 3-character sequence, you assign a deterministic weight to every pair of characters, then extract substrings where the weights at both endpoints are strictly greater than all the weights inside. The resulting n-grams have variable length. Some are two characters. Some are five. The key property: because the weights are deterministic, you can reconstruct at query time which n-grams would need to exist in the index, and you only need to look up the minimum covering set.
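A naive sketch of that extraction rule, assuming CRC32 pair weights (the next paragraph covers better weight functions). Note that this version emits every span satisfying the endpoint property, including all plain trigrams; the production implementations select a smaller covering subset:

```python
import zlib

def pair_weight(a: str, b: str) -> int:
    """Deterministic weight for an adjacent character pair. CRC32 here;
    an inverse-frequency table over real code is more selective."""
    return zlib.crc32((a + b).encode())

def sparse_ngrams(s: str):
    """Emit variable-length n-grams whose endpoint pair-weights are
    strictly greater than every pair-weight in their interior."""
    # w[i] is the weight of the pair (s[i], s[i+1])
    w = [pair_weight(s[i], s[i + 1]) for i in range(len(s) - 1)]
    grams = []
    for i in range(len(w) - 1):
        for j in range(i + 1, len(w)):
            interior = w[i + 1:j]
            if not interior or (w[i] > max(interior) and w[j] > max(interior)):
                grams.append(s[i:j + 2])
    return grams

print(sparse_ngrams("for i in range"))
```

Because the weight function is deterministic, running the same extraction over a query string reproduces exactly the grams the indexer would have stored, which is what makes the query-time covering-set lookup possible.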

Here's where it gets clever. If you use CRC32 as your weight function, the results are decent. But if you use an inverse frequency table built from a large corpus of open-source code, so that rare character pairs get higher weights, the covering set at query time shrinks further. Rare character pairs anchor longer, more selective n-grams. Common pairs get absorbed into the interior of longer spans.

I'd like to see how this performs on codebases with lots of generated code or minified JavaScript, where the character distribution is nothing like typical source code. The blog post doesn't address that.

Why local?

Cursor chose to run the index on the user's machine rather than on a server. The reasoning is pragmatic: the index only narrows down candidate files, and you still need to run the real regex against those files' contents. Doing that server-side would mean syncing all the files or doing expensive round-trips. Locally, it is trivial.

There's also a freshness argument. Semantic search embeddings can tolerate slightly stale data because a modified file's embedding won't move far in the vector space. But if an agent writes code and then searches for the text it just wrote and doesn't find it, Martí says it "goes into a wild goose chase." That's a nice way of saying it wastes tokens and the user's time.

The index state is keyed off a Git commit, with a mutable layer on top for uncommitted changes. Two files on disk: one with the posting lists, one with a sorted hash-to-offset lookup table. Only the lookup table gets memory-mapped into the editor process. Queries hit the lookup table with a binary search, get an offset, then read the relevant posting list directly from disk.
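That layout maps naturally onto fixed-width records. Here's a hypothetical Python sketch of the lookup side; the record format and hash width are my guesses, not Cursor's actual file format:

```python
import io
import struct

RECORD = struct.Struct("<QQ")  # (n-gram hash, byte offset into postings file)

def build_table(entries):
    """Serialize (hash, offset) pairs as sorted fixed-width records."""
    buf = io.BytesIO()
    for h, off in sorted(entries):
        buf.write(RECORD.pack(h, off))
    return buf.getvalue()

def lookup(table, target_hash):
    """Binary-search the fixed-width records for target_hash.
    Returns the posting-list offset, or None if the hash is absent.
    In the real system `table` would be a memory-mapped file."""
    lo, hi = 0, len(table) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        h, off = RECORD.unpack_from(table, mid * RECORD.size)
        if h < target_hash:
            lo = mid + 1
        elif h > target_hash:
            hi = mid
        else:
            return off
    return None

table = build_table([(42, 0), (7, 128), (99, 256)])
print(lookup(table, 7))   # → 128
print(lookup(table, 8))   # → None
```

Keeping only the small lookup table in memory and reading posting lists on demand is why the editor process stays light even for very large repositories.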

The numbers, such as they are

Cursor claims 16.8 seconds with ripgrep versus 13 milliseconds with Instant Grep for a local search, or 243 milliseconds if you add a network round-trip. That's roughly a 1,300x speedup for the local case. But we don't know what codebase this was measured on, what regex pattern was used, or how representative it is. The blog post includes interactive timelines showing agent workflows on the Chromium codebase and Cursor's own repo, with Instant Grep visibly compressing the search segments. Those look compelling, but they're self-selected examples.

What I'd want to see: a distribution of speedups across different query patterns and repo sizes. The technique should shine on large repos with specific queries and degrade gracefully on small repos where ripgrep is already fast. Whether the index construction cost is amortized well enough for repos that change constantly (say, a monorepo with hundreds of commits a day) is an open question the post doesn't quite answer.

What this means for the agent speed race

Grep latency probably wasn't on most people's list of AI coding bottlenecks. Token generation speed, context window limits, model reasoning quality: those get all the attention. But for agents working on large codebases, search latency is one of the few operations whose cost scales with codebase size rather than with prompt length. Cursor's claim is that removing search latency entirely creates a "qualitative difference" for agent workflows, particularly during bug investigation where the agent might run dozens of searches.

The feature is available in Cursor 2.1 and works with all models. It also powers the sidebar's manual search, including regex and word-boundary matching.

Whether competing editors adopt something similar is worth watching. VS Code already bundles ripgrep internally. The sparse n-gram approach is published, open-source implementations exist, and the concept of local search indexing isn't proprietary. But building it well, integrating it with a Git-aware freshness layer, and making it invisible to users requires the kind of systems engineering effort that most AI coding startups aren't staffed for. Martí's background at GitHub, where a lot of this indexing research originated, is not a coincidence.

Tags: cursor, code search, regex, developer tools, AI coding, ripgrep, n-gram indexing, agentic coding, search optimization
Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.
