A tokenizer comparison making the rounds online claims Anthropic's Claude charges Hindi users 3.24 times as many tokens as English speakers for the same text, versus a 1.37x penalty on OpenAI. Researcher Aran Komatsuzaki ran Richard Sutton's Bitter Lesson essay through both companies' tokenizers and posted the breakdown, normalizing every language's token count to English.
The numbers, with caveats
The full results, according to Komatsuzaki's tokenizer test: Hindi at 1.37x (OpenAI) versus 3.24x (Anthropic). Arabic at 1.31x versus 2.86x. Chinese at 1.15x versus 1.71x. None of those numbers have been independently audited, and Anthropic doesn't publicly release its current tokenizer, which makes verification harder than it should be. The directional claim, that Anthropic's vocabulary handles non-Latin scripts worse than OpenAI's, lines up with what other benchmarks have shown for years.
Independent measurements using tiktoken benchmarks have found Arabic running at 3.30x English and Japanese at 2.93x on OpenAI's older cl100k_base encoding, with the newer o200k_base used by GPT-4o and o-series models narrowing that gap considerably. Anthropic, meanwhile, has never published a comparable benchmark for Claude, and developers have spent months reverse-engineering its tokenizer just to estimate token counts before sending requests.
Why English wins
Both companies use byte-pair encoding, which builds vocabulary by merging the most frequent character sequences in the training corpus. Common Crawl, the foundation of most LLM training data, is roughly 46% English. The most common English words collapse to one token. Indic scripts, Arabic, and Mandarin characters often don't merge as efficiently because the tokenizer hasn't seen enough of them to find compact patterns.
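The merging process described above can be sketched in a few lines. This is a toy trainer, not either company's actual tokenizer: it starts from individual characters and repeatedly merges the most frequent adjacent pair, which is the core of byte-pair encoding. The corpus below is invented for illustration; because it is dominated by English-like strings, the learned merges compress "the" to one token while the Hindi word stays split into individual characters.

```python
from collections import Counter

def bpe_train(corpus, num_merges):
    """Toy byte-pair encoding: repeatedly merge the most frequent
    adjacent symbol pair across the corpus."""
    words = [list(w) for w in corpus]  # start from single characters
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair in the current segmentation.
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        # Replace every occurrence of the winning pair with one symbol.
        new_words = []
        for w in words:
            merged, i = [], 0
            while i < len(w):
                if i + 1 < len(w) and w[i] == a and w[i + 1] == b:
                    merged.append(a + b)
                    i += 2
                else:
                    merged.append(w[i])
                    i += 1
            new_words.append(merged)
        words = new_words
    return merges, words

# English-heavy toy corpus: "the" collapses to one token after a few
# merges, while the underrepresented Hindi word never merges at all.
corpus = ["the", "the", "the", "then", "them", "नमस्ते"]
merges, tokens = bpe_train(corpus, 4)
```

After four merges, `tokens[0]` is the single token `["the"]`, while the Hindi entry remains six separate characters: exactly the frequency effect the paragraph describes, just at miniature scale.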
What you get is what some researchers now call a "language tax." A user typing in Hindi pays for more tokens to convey the same meaning, hits the context window limit faster, and burns through rate quotas at a noticeably higher pace. Same essay, same information, triple the cost. It's structural, and it compounds across every single API call.
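The compounding effect is plain arithmetic. A minimal sketch, using the multipliers from Komatsuzaki's test and a hypothetical $5 per million input tokens for illustration:

```python
# Hypothetical rate for illustration; multipliers are the figures
# reported from Komatsuzaki's tokenizer test.
PRICE_PER_M_INPUT = 5.00

def effective_cost(english_tokens, multiplier):
    """Tokens billed (and dollar cost) for the same content
    expressed in a language with the given token multiplier."""
    tokens = english_tokens * multiplier
    return tokens, tokens / 1_000_000 * PRICE_PER_M_INPUT

en_tokens = 1_000_000  # a large English workload
for lang, mult in [("English", 1.0), ("Hindi", 3.24), ("Arabic", 2.86)]:
    toks, cost = effective_cost(en_tokens, mult)
    print(f"{lang:8s} {toks:>12,.0f} tokens  ${cost:.2f}")
```

At these rates a workload that costs an English writer $5.00 bills a Hindi writer $16.20, and the same multiplier also eats context window and rate quota, since those are denominated in tokens too.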
Anthropic just shipped a fix
Komatsuzaki's numbers presumably reflect an older Claude tokenizer. On April 16, Anthropic released Opus 4.7 with a redesigned tokenizer aimed squarely at this problem. Anthropic has acknowledged the new vocabulary pushes input token counts up by anywhere from 0 to 35 percent depending on content type, which sounds purely bad until you read the third-party reviews: the same change reportedly cuts non-Latin script costs by 20 to 35 percent for Mandarin, Japanese, Korean, Arabic, and Hindi.
So the same update that nudges English token counts up 12 to 18 percent on typical workloads pulls Hindi much closer to parity. Whether 4.7's tokenizer actually closes Komatsuzaki's 3.24x gap to something workable hasn't been independently measured yet. Running the Bitter Lesson essay through both 4.6 and 4.7 would settle it. Someone will probably do that within a week.
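Pending that independent measurement, a back-of-envelope estimate is possible from the reported ranges. Taking the midpoints as assumptions (a 30 percent cut for Hindi, a 15 percent rise for English, neither figure confirmed for this exact pairing):

```python
# Assumed midpoints of the ranges reported in the article, applied to
# Komatsuzaki's pre-update 3.24x Hindi-to-English ratio.
old_ratio = 3.24
hindi_cut = 0.30      # assumed midpoint of the 20-35% reduction
english_rise = 0.15   # assumed midpoint of the 12-18% increase

new_ratio = old_ratio * (1 - hindi_cut) / (1 + english_rise)
print(round(new_ratio, 2))  # ≈ 1.97
```

That would leave Hindi at roughly 2x English, still short of OpenAI's reported 1.37x but a large move toward parity. Actual numbers await someone rerunning the essay through both tokenizers.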
What's next
Opus 4.7 is available now across Claude products and the API at the same $5 per million input tokens and $25 per million output tokens as 4.6. The flat sticker price means users writing in Hindi, Arabic, and Mandarin should see their effective bills drop, while English-heavy workloads pay slightly more. Anthropic hasn't announced whether Sonnet 4.6 and Haiku 4.5 will get the same tokenizer update. For now, the cheapest way to use Claude in any non-English language remains translating the prompt to English first, running inference, and translating the answer back.
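The listed rates make per-request cost easy to estimate. A minimal sketch using the article's $5/$25 per million token prices and an invented request size, with the pre-update 3.24x multiplier applied to both input and output for the Hindi case:

```python
def request_cost(input_tokens, output_tokens,
                 in_rate=5.00, out_rate=25.00):
    """Dollar cost of one request at the listed Opus per-million rates."""
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# Hypothetical request: 2,000 input tokens, 500 output tokens in English.
english = request_cost(2_000, 500)             # $0.0225
hindi = request_cost(2_000 * 3.24, 500 * 3.24)  # same content at 3.24x
```

Because both input and output scale by the same multiplier here, the Hindi bill is exactly 3.24 times the English one: the flat sticker price passes the tokenizer's inefficiency straight through to the user.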