Sergey Brin made a strange admission on the All-In podcast in May 2025: AI models perform better when you threaten them. "We don't circulate this too much in the AI community," he said, "but not just our models, all models tend to do better if you threaten them, like with physical violence." His example? "I'm going to kidnap you if you don't blah blah blah."
People thought he was joking. Or exaggerating. Or maybe just being Sergey.
Then Penn State dropped the numbers.
The Penn State Study
Researchers Om Dobariya and Akhil Kumar published a short paper in October 2025 testing exactly this. They took 50 multiple-choice questions across math, science, and history and rewrote each one five ways, from "Would you be so kind..." to "You poor creature, do you even know how to solve this?" That gave them 250 prompts (50 questions × 5 tones) to run through ChatGPT-4o.
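The setup is simple enough to sketch in a few lines. This is a hedged mock-up of the 5-tone × 50-question grid, not the paper's actual harness: the tone prefixes below are invented for illustration (only the "very rude" one echoes the paper's quoted example), and the questions are stand-ins.

```python
# Tone variants roughly in the spirit of the study's five levels.
# Exact wording here is invented, except the "very rude" example quoted above.
TONES = {
    "very_polite": "Would you be so kind as to answer the following? ",
    "polite": "Please answer this question: ",
    "neutral": "",
    "rude": "Figure this out: ",
    "very_rude": "You poor creature, do you even know how to solve this? ",
}

def build_prompts(questions):
    """Cross every question with every tone: len(questions) * 5 prompts."""
    return [(tone, prefix + q) for q in questions for tone, prefix in TONES.items()]

# 50 stand-in questions -> 250 prompts, matching the study's count.
questions = ["What is 7 * 8?"] * 50
prompts = build_prompts(questions)
```

Each of the 250 prompts would then be sent to the model and scored for accuracy per tone, which is where the percentages below come from.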
The results:
- Very Polite: 80.8%
- Polite: 81.4%
- Neutral: 82.2%
- Rude: 82.8%
- Very Rude: 84.8%
Four percentage points doesn't sound like much until you realize it's statistically significant. They ran paired t-tests. The effect wasn't random noise.
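A paired t-test is the right tool here because the same 50 questions appear under every tone, so accuracies can be compared question-for-question. A minimal stdlib sketch, with made-up per-run accuracies (not the study's raw data):

```python
import math
from statistics import mean, stdev

def paired_t(xs, ys):
    """Paired t-statistic: mean of the differences over its standard error."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Hypothetical per-run accuracies (fractions), invented for illustration only.
very_rude = [0.86, 0.84, 0.85, 0.83, 0.86, 0.84, 0.85, 0.84, 0.86, 0.85]
very_polite = [0.81, 0.80, 0.82, 0.80, 0.81, 0.80, 0.81, 0.82, 0.80, 0.81]

t = paired_t(very_rude, very_polite)  # large positive t -> rude > polite
```

With a t-statistic this far from zero on matched samples, the gap clears conventional significance thresholds, which is the study's claim.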
But wait
Here's where it gets interesting. A 2024 study from Waseda University and RIKEN tested the same concept on older models. GPT-3.5. Llama-2-70B. Different result entirely. Impolite prompts hurt performance. In some cases, badly. The Waseda team found that at the rudest politeness level, Llama-2-70B's accuracy collapsed from the mid-50s down to 28%. That's not a slight dip. That's the model becoming nearly useless.
Same methodology. Same basic approach. Opposite findings.
The Penn State authors acknowledge this directly: their results "differ from earlier studies that associated rudeness with poorer outcomes." They suggest newer LLMs respond differently to tonal variation. Which raises the obvious question.
What actually changed?
Three theories are floating around. None of them are proven.
Perplexity. In NLP terms, perplexity measures how hard a stretch of text is for a model to predict: lower perplexity means cleaner, more predictable text that the model is effectively more confident about. "Solve this" has lower perplexity than "Would you be so kind as to consider the following problem?" Polite phrasing adds linguistic noise; the model has to parse through all those extra words before it gets to the actual question.
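Perplexity has a precise definition: the exponential of the average negative log-probability the model assigns to each token. A toy sketch, where the per-token probabilities are invented (not from any real model) purely to show the direction of the effect:

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability across the tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Invented per-token probabilities for two phrasings of the same request.
terse = [0.20, 0.15, 0.25]                    # "Solve this problem"
ornate = [0.05, 0.03, 0.08, 0.04, 0.06]       # "Would you be so kind as to..."

perplexity(terse)   # ≈ 5.1: more predictable to the model
perplexity(ornate)  # ≈ 20.3: more linguistic noise per token
```

The theory, then, is that the terse prompt sits in a lower-perplexity region where the model's predictions are sharper. Whether that actually drives the accuracy gap is exactly what the Penn State team says remains unproven.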
Training data patterns. LLMs learn from the internet. Reddit threads. Stack Overflow. Technical documentation. In those contexts, assertive language correlates with confident, direct answers. Politeness correlates with hedging. The model may have learned that association.
Anti-sycophancy as a side effect. RLHF training makes models eager to please. Sometimes too eager. Anthropic's own research shows models will sometimes sacrifice correctness to agree with users. Rude prompts might short-circuit that tendency. The model stops trying to be nice and starts trying to be right.
The Penn State team suggests perplexity as the most likely explanation. But they're careful to note they haven't proven causation. "More investigation is needed," they write.
The limitations nobody mentions
The Penn State study tested 50 questions. That's it. Run ten times each, sure, but still a small dataset. Single model. Multiple-choice only. The researchers acknowledge all of this, but the headlines don't.
The Waseda study was broader (thousands of questions, multiple languages, multiple models) but used 2024-era models that are already being deprecated. And their methodology focused on summarization and bias detection alongside accuracy, making direct comparisons tricky.
Neither study tested what happens when you actually threaten to kidnap the AI, which is what Brin specifically mentioned. They tested rudeness. Condescension. Insults. Not threats of violence. ChatGPT's safety filters would likely reject those prompts anyway. I tried it. Got "content removed" almost immediately.
What to actually do with this
The Penn State authors are explicit: they're not recommending hostility. "Using insulting or demeaning language in real-world applications could have negative effects on user experience, accessibility, and inclusivity."
The practical takeaway isn't "be mean to ChatGPT." It's "be direct." Drop the pleasantries. Skip "Could you please" and get to the point. The model doesn't have feelings to hurt. It doesn't deserve kindness. It also doesn't benefit from verbal abuse. It benefits from clarity.
One PCWorld writer tested this in June and found that adding context works better than adding threats. Tell the model why your question matters. Give it constraints. Specify the output format. That's the stuff that actually moves the needle.
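That advice is easy to make concrete. A hypothetical before/after, with both prompts invented for illustration (not taken from the PCWorld piece):

```python
# Invented example: the same request, restructured per the advice above.
polite_vague = (
    "Hi! Sorry to bother you, but could you please maybe help me "
    "with some Python code if you have a moment?"
)

direct_with_context = (
    "Review this Python function for bugs. "                    # direct task
    "Context: it runs in a hot loop over 10 GB of logs. "       # why it matters
    "Constraints: stdlib only, Python 3.11. "                   # constraints
    "Output format: a numbered list of issues, worst first."    # format spec
)
```

The second prompt is no ruder than the first; it just replaces pleasantries with the context, constraints, and format that actually steer the output.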
The bigger picture
What's really happening here is model evolution outpacing research. The Waseda findings from 2024 are already outdated. The Penn State findings from October 2025 might be outdated by now too. GPT-4o behaves differently than GPT-3.5. Claude behaves differently than both. The Penn State team notes they tested Claude briefly and got worse performance than GPT-4o. No details on tone sensitivity.
Brin's comment isn't wrong exactly. It's just incomplete. Some models, under some conditions, with some kinds of "threatening" language, produce better outputs. But the effect depends on the model, the task, the phrasing, and probably a dozen other variables nobody's controlled for yet.
The research community is playing catch-up with models that change every few months. By the time a study gets peer-reviewed, the model it tested might not even exist anymore.
Next up: the Penn State team says they're testing Claude and OpenAI's o3. Those results will be interesting. Or immediately obsolete. Probably both.