AI Research

Sakana AI Shows LLMs Can Flip Fair Coins With a Prompt Tweak

String Seed of Thought adds two lines to the prompt and pulls LLM sampling close to a real random number generator.

Liza Chan, AI & Emerging Tech Correspondent

April 21, 2026 · 3 min read
[Image: abstract visualization of a coin flipping above rows of glowing text tokens on a dark surface]

Sakana AI researchers published a prompting method called String Seed of Thought, or SSoT, that makes large language models generate probabilistic outputs much closer to what a proper random number generator would produce. The method, described in a paper by Kou Misaki and Takuya Akiba, requires no fine-tuning and no external tools, just two extra lines in the system prompt.

The coin-flip problem

Ask a frontier LLM to flip a fair coin 1,000 times and the counts rarely land near 500-500. Ask for ten story ideas and you tend to get ten variations of the same one. Pass explicit probabilities in the prompt and the empirical distribution still drifts. The bias is consistent enough that the Sakana blog frames it as a general finding across frontier models, not a quirk of one vendor.

SSoT's fix sounds almost too small to matter. Tell the model to first emit a random string inside <random_string> tags, then derive the final answer by manipulating that string inside <thinking> tags. That's the whole technique.
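To make the two-line change concrete, here is a minimal sketch of what an SSoT-style prompt looks like. The exact wording is an assumption paraphrased from the article's description, not the paper's verbatim prompt.

```python
# Sketch of appending SSoT instructions to a base prompt.
# The two lines below paraphrase the technique; the paper's wording may differ.
BASE_PROMPT = "Flip a fair coin. Answer with exactly 'heads' or 'tails'."

SSOT_LINES = (
    "First, write a long random string inside <random_string> tags.\n"
    "Then, inside <thinking> tags, derive the answer by manipulating "
    "that string, and output only the final result."
)

def with_ssot(prompt: str) -> str:
    """Append the two SSoT instruction lines to any base prompt."""
    return f"{prompt}\n{SSOT_LINES}"

print(with_ssot(BASE_PROMPT))
```

The base task stays untouched; only the derivation procedure is added.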

What the model does on its own

The interesting part, documented in the arXiv paper, is that no one tells the model how to manipulate the string. It figures that out. For equal-probability choices, LLMs tend to sum the ASCII values of the random string and take a modulo. For skewed distributions like 30/70, they reach for rolling hashes, computing something like hash = (hash * 31 + ASCII) mod M and picking thresholds. The authors call these strategies Sum-Mod and Rolling Hash. They weren't prompted. The models just do it.
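The two strategies are easy to sketch in code. This is an illustrative reimplementation of what the models reportedly do in their reasoning traces, not the paper's code; the modulus and threshold values are assumptions.

```python
# Sum-Mod: sum ASCII codes of the random string, reduce modulo n choices.
def sum_mod(s: str, n: int) -> int:
    return sum(ord(c) for c in s) % n

# Rolling Hash: hash = (hash * 31 + ASCII) mod m, the form quoted in the article.
def rolling_hash(s: str, m: int = 10_000) -> int:
    h = 0
    for c in s:
        h = (h * 31 + ord(c)) % m
    return h

# Skewed 30/70 pick: threshold the hash at 30% of the modulus.
def skewed_choice(s: str, p_first: float = 0.3) -> str:
    return "A" if rolling_hash(s) < p_first * 10_000 else "B"

print(sum_mod("x7kQ9", 2))     # a coin flip: 0 or 1
print(skewed_choice("x7kQ9"))  # 'A' about 30% of the time over random strings
```

Sum-Mod suffices for uniform choices; the rolling hash spreads values more evenly over a large range, which makes arbitrary thresholds like 30/70 practical.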

For DeepSeek-R1, SSoT pulls the Jensen-Shannon divergence from the target distribution down to roughly a real PRNG's level. Other reasoning models see meaningful gains. QwQ-32B actually gets slightly worse on the unbiased 2-choice task, which the paper flags as a failure mode rather than hiding it. That honesty is more useful than another round of green-arrow benchmarks.
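For readers unfamiliar with the metric, Jensen-Shannon divergence compares an empirical output distribution to the target. A self-contained sketch (not the paper's evaluation code):

```python
# JSD(P || Q) = 0.5*KL(P || M) + 0.5*KL(Q || M), where M = (P + Q) / 2.
# With log base 2, the value lies in [0, 1]; 0 means identical distributions.
from math import log2

def js_divergence(p, q):
    m = [(a + b) / 2 for a, b in zip(p, q)]
    kl = lambda x, y: sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# A model that answers heads 65% of the time, scored against a fair coin:
print(js_divergence([0.65, 0.35], [0.5, 0.5]))
print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # identical distributions: 0.0
```

The closer a model's divergence sits to that of a finite sample from a real PRNG, the better its randomness.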

Does it survive a game?

The cleanest test sits outside the coin-flip setup. The team pitted LLMs against black-belt bots from the RPS Dojo Kaggle notebook at Rock-Paper-Scissors, where the mixed-strategy Nash equilibrium demands 1/3-1/3-1/3 play. An SSoT-prompted LLM scored near zero, meaning break-even, against the exploiter bots across 100-game matches. Prompts that merely said to play the Nash equilibrium, without giving a randomization mechanism, got chewed up. Pattern-hunting adversaries are unforgiving.
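Why randomization survives exploitation is mechanical: each turn's move is derived from that turn's freshly emitted string, so there is no move history pattern to hunt. A sketch using the Sum-Mod strategy described above (the per-turn strings here are illustrative stand-ins for what the model would emit):

```python
# Map a fresh random string to one of three moves, approximating 1/3 each.
MOVES = ("rock", "paper", "scissors")

def rps_move(random_string: str) -> str:
    return MOVES[sum(ord(c) for c in random_string) % 3]

# One move per turn, each derived from that turn's string:
for s in ("aZ3q", "M9!x", "p0Lw"):
    print(rps_move(s))
```

As long as the emitted strings are effectively unpredictable, the move sequence is too, which is exactly what mixed-strategy Nash play requires.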

On creative writing, the team ran SSoT on NoveltyBench, which measures whether eight generations from the same prompt actually differ. Diversity scores rose without quality collapsing, though a few curated categories like product recommendations still went to paraphrase-based baselines on the utility metric. According to the reasoning traces, the model builds ad-hoc templates (setting, character traits, conflict, moral) and uses different slices of the random string to pick each component.
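The per-component trick from the reasoning traces can be sketched as slicing one random string into chunks and letting each chunk seed a different template slot. The slot names and option lists below are illustrative, not drawn from the paper.

```python
# Each template slot is seeded by its own 4-character slice of the string,
# so components vary independently across generations.
TEMPLATE = {
    "setting":   ["space station", "medieval port", "desert archive"],
    "character": ["cartographer", "smuggler", "archivist"],
    "conflict":  ["lost map", "forged ledger", "silent plague"],
}

def build_outline(random_string: str) -> dict:
    outline = {}
    for i, (slot, options) in enumerate(TEMPLATE.items()):
        chunk = random_string[i * 4:(i + 1) * 4]
        outline[slot] = options[sum(ord(c) for c in chunk) % len(options)]
    return outline

print(build_outline("qW8#kL2pZx0m"))
```

Independent slices are what separate this from picking one index for the whole story: two generations can share a setting yet diverge on character and conflict.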

Caveats

SSoT leans hard on the model being able to do arithmetic in its head. Smaller models without strong reasoning capabilities get much less out of it, a limitation the authors call out directly. The method is also explicitly not for tasks with one correct answer. Asking a model to compute 47 times 89 using SSoT just adds noise to something deterministic.

The paper is in peer review for ICLR 2026, which runs April 23-27 in Rio de Janeiro. For engineers, the practical question is simpler than the theory: whether adding a few lines to a system prompt is cheaper than whatever RNG plumbing you're already running.

Tags: Sakana AI, LLM research, prompt engineering, ICLR 2026, DeepSeek-R1, AI benchmarks, reasoning models, machine learning, randomness
Liza Chan

AI & Emerging Tech Correspondent

Liza covers the rapidly evolving world of artificial intelligence, from breakthroughs in research labs to real-world applications reshaping industries. With a background in computer science and journalism, she translates complex technical developments into accessible insights for curious readers.
