Google Caps Per-Prompt Quota for Gemini After Subscriber Backlash

Failed requests stop counting, Flash-Lite goes free, and one prompt can no longer drain a five-hour window.

Liza Chan, AI & Emerging Tech Correspondent
May 31, 2026

Google is reworking how its Gemini app burns through usage quota, less than two weeks after switching to a compute-based system at I/O 2026. VP Josh Woodward laid out the changes in an X thread on May 28, framing them as a response to paying customers who said they were hitting limits within a handful of prompts.

What broke

The new system, introduced right after I/O, weighs request complexity, tool usage, and chat length instead of just counting prompts. Reasonable on paper. In practice, subscribers started posting receipts. One user claimed a single avatar video attempt ate his entire five-hour allowance in about four minutes, and the video reportedly failed to even generate. Woodward's public reply at the time was "Yikes, let us take a look!", which is not the sentence you want from the person running a product you pay for.

That specific case traced back to a bug in Omni, Google's world model. One or two videos could wipe a quota. It's now patched, and Ultra subscribers get double the Omni generations as compensation. Whether "double" means much depends on how low the starting number was, which Google hasn't said.

The fixes that actually matter

Two changes do real work here. The first: failed requests stop counting. Woodward noted that roughly 1 in 10 requests fail on system errors, and until now you paid quota for those anyway. "Your quota is used only for successful completions," he wrote. Charging people for your own server hiccups was always going to be hard to defend, so this reads less like generosity and more like fixing something that should never have shipped.

The second: Gemini 3.1 Pro now caps how much quota a single prompt can consume. Big files and multi-step requests were the worst offenders. The prompt still runs in full, but it can't torch a whole session's budget on its own.

Flash-Lite prompts are now free and don't touch your quota at all. A model choice also persists across sessions instead of silently resetting; the system only switches you down to a lighter model when you hit a limit.
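To make the accounting rules concrete, here is a minimal sketch of how the three changes described above (failed requests are free, Flash-Lite is free, and a single Gemini 3.1 Pro prompt is capped) could combine. Google has not published its quota units, cap values, or internal model identifiers, so every name and number below is a hypothetical stand-in, not Google's implementation.

```python
from dataclasses import dataclass

# Assumptions for illustration only: Google has not disclosed these values.
FREE_MODELS = {"flash-lite"}                 # hypothetical model identifier
PER_PROMPT_CAP = {"gemini-3.1-pro": 20.0}    # hypothetical cap, arbitrary units


@dataclass
class QuotaWindow:
    """Tracks usage within one quota window (for example, five hours)."""
    budget: float
    used: float = 0.0

    def charge(self, model: str, raw_cost: float, succeeded: bool) -> float:
        """Return the units actually deducted for a single request."""
        if not succeeded:
            return 0.0   # failed requests no longer count
        if model in FREE_MODELS:
            return 0.0   # Flash-Lite prompts do not touch the quota
        # A capped model deducts at most its cap; other models pay full cost.
        charged = min(raw_cost, PER_PROMPT_CAP.get(model, raw_cost))
        self.used += charged
        return charged

    @property
    def remaining(self) -> float:
        return max(self.budget - self.used, 0.0)


window = QuotaWindow(budget=100.0)
print(window.charge("gemini-3.1-pro", 250.0, succeeded=True))   # capped
print(window.charge("gemini-3.1-pro", 5.0, succeeded=False))    # failed
print(window.charge("flash-lite", 3.0, succeeded=True))         # free model
print(window.remaining)
```

In this sketch a prompt whose raw cost would exceed the whole window still runs, but only the capped amount is deducted, which mirrors the article's description that the prompt "still runs in full" without draining the session.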

What Google still hasn't said

Deep Research and similar heavy tasks are getting more detailed usage breakdowns and notifications, eventually. The current dashboard at gemini.google.com/usage only shows a high-level view. Woodward gave no timeline for the items still in development, which is most of the transparency-related ones.

For context on the stakes: AI Pro runs $19.99 a month, and the two Ultra tiers sit at $99.99 and $199.99. These are the people who complained loudest, and they're the ones Google can least afford to annoy. Pay-as-you-go top-up credits are also coming at some unspecified point, which suggests Google expects people to keep running out.

The bug fixes and the failed-request change are live now. The reporting and notification improvements have no announced date.
