A team at Google Research has published a finding so simple it borders on embarrassing: if you send a large language model the same prompt twice in a row, it performs better. No fine-tuning, no clever prompt engineering, no new training data. Just copy, paste, done.
The research paper, authored by Yaniv Leviathan, Matan Kalman, and Yossi Matias, tested prompt repetition across seven models from four different providers (Gemini, GPT, Claude, and DeepSeek) on seven benchmarks. Out of 70 model-benchmark combinations, the technique produced statistically significant accuracy gains in 47 cases. Losses? Zero.
Why does repeating yourself help?
The explanation comes down to how these models read. LLMs process text left to right, one token at a time. Each token can only "see" what came before it, never what comes after. So in a prompt structured as context first, question second, the context tokens are processed before the model has any idea what question is coming. The early parts of your prompt are, in a real sense, half-understood.
Repeating the prompt gives every token a second pass where it can attend to the full input. The researchers frame it as a cheap workaround for causal attention's inherent blind spot, which is a polite way of saying these models have been reading with one eye closed this whole time.
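In code, the technique is nothing more than string duplication before the API call. A minimal sketch (the function name and separator are illustrative choices, not from the paper, which simply sends the input twice):

```python
def repeat_prompt(prompt: str, separator: str = "\n\n") -> str:
    """Return the prompt doubled, so that on the second copy every token
    can attend to the entire input, not just what preceded it.

    The separator is an assumption for readability; the paper's method
    is just the prompt repeated.
    """
    return prompt + separator + prompt


# The rest of the pipeline is unchanged: the doubled string goes to the
# model exactly where the single prompt went before.
doubled = repeat_prompt("Context: <documents>\n\nQuestion: <question>")
```

Because the output format and the number of generated tokens are unaffected, nothing downstream of the call needs to change.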
The numbers, and what to make of them
Most of the gains were modest. On standard benchmarks like ARC Challenge and MMLU-Pro, accuracy ticked up a few percentage points. Where things got dramatic was on tasks that specifically punish ordering effects. On a custom benchmark called NameIndex, where the model has to find a specific name in a list of 50, Gemini 2.0 Flash Lite jumped from 21% to 97% accuracy. That's not a rounding error.
The researchers ran all tests through official APIs in early 2025, which means these results reflect production models, not lab prototypes. And the technique generally adds no meaningful delay: the extra processing happens during the "prefill" stage, which runs in parallel on GPU hardware, so doubling the input doesn't double the wait time. One wrinkle: the paper notes that Anthropic's Claude models showed some latency increase on very long inputs.
"The number of generated tokens does not increase," the paper states, and the output format stays identical. That last part matters for anyone thinking about deployment, because it means prompt repetition is a drop-in change with no downstream plumbing required.
So what's the catch?
If you're already using chain-of-thought reasoning or asking the model to think step by step, prompt repetition mostly does nothing. The paper tested this combination and found 5 wins, 1 loss, and 22 neutral results out of 28. The researchers suggest this makes sense: reasoning models already repeat parts of the user's request during their internal deliberation. They're essentially doing prompt repetition on their own, just less efficiently.
The team behind this paper is the same trio that developed speculative decoding, now a standard technique across the industry for speeding up LLM inference. Leviathan is a Distinguished Researcher at Google Research; Matias is VP of Engineering and Research and heads Google Research overall. These aren't fringe researchers chasing novelty.
Whether prompt repetition becomes standard practice or stays a curiosity depends on something the paper doesn't address: cost. Doubling input length means doubling input token costs on pay-per-token APIs, even if latency stays flat. For a task where you're already sending 4,000 tokens, that's 4,000 extra tokens billed at input rates every single call. The accuracy gains might not justify the spend for most applications, particularly the ones where the improvement is only a few points.
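The back-of-the-envelope arithmetic is easy to run yourself. A sketch with a hypothetical input price (real per-token rates vary by provider and model):

```python
def repetition_overhead(input_tokens: int, calls: int,
                        usd_per_million_input: float) -> float:
    """Extra spend from doubling the input: the same input tokens
    billed a second time on every call. Output tokens are unaffected,
    since prompt repetition does not change the generated output."""
    return input_tokens * calls * usd_per_million_input / 1_000_000


# 4,000-token prompts over one million calls, at an assumed
# $0.50 per million input tokens:
extra = repetition_overhead(4_000, 1_000_000, 0.50)  # $2,000.00 extra
```

Whether $2,000 per million calls buys enough accuracy depends entirely on the task: it is plainly worth it for a NameIndex-style 21%-to-97% jump, much less so for a few points on MMLU-Pro.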
The paper is available on arXiv now. No code to release, because there's nothing to code. The entire method fits in a single sentence: send it twice.