AI Research

OpenRouter's 100 Trillion Token Study Reveals Reasoning Models Now Power Half of Global AI Traffic

The largest empirical analysis of real-world LLM usage shows a fundamental shift from simple chatbots to complex reasoning engines.

Oliver Senti, Senior AI Editor
December 6, 2025 · 6 min read

[Image: Abstract visualization of interconnected AI model infrastructure with glowing data pathways connecting towering structures at dusk]

The AI industry crossed a significant threshold sometime in 2025, and most observers missed the moment it happened. According to a sweeping new analysis from OpenRouter and Andreessen Horowitz, reasoning-optimized models now process more than half of all AI inference traffic, a transformation that occurred in roughly twelve months. The study, which examined over 100 trillion tokens of real-world interactions, offers the most comprehensive empirical picture yet of how developers and organizations actually deploy large language models.

The findings paint a portrait of an industry that has moved decisively beyond the chatbot era. What began with OpenAI's release of o1 in December 2024 has cascaded into a wholesale reorganization of how AI systems operate. Models are no longer simply generating text in response to prompts. They are planning, reasoning through multi-step problems, calling external tools, and refining their outputs iteratively.

The Reasoning Revolution in Numbers

The share of total tokens routed through reasoning-optimized models climbed sharply throughout 2025. What was effectively a negligible slice of usage in the first quarter of the year now exceeds fifty percent. This shift reflects changes on both sides of the market. Higher-capability systems from providers like xAI, Google, and Anthropic expanded what users could expect from stepwise reasoning, while demand increasingly favored models capable of managing task state and supporting agent-style workflows.

The transition carries profound implications for how AI gets built and deployed. The technology is moving from a static chat interface to an active participant in work. The competitive frontier is no longer only about accuracy or benchmarks. It is about orchestration, control, and a model's ability to operate as a reliable agent.

This paradigm, which the researchers term "agentic inference," represents a fundamental departure from the single-turn interactions that defined the first wave of LLM adoption. Users are increasingly building workflows where models act in extended sequences rather than responding to isolated prompts. A typical interaction now involves planning, retrieving context from tools or APIs, revising outputs, and iterating until a task reaches completion.
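The plan, act, revise, iterate cycle described above can be sketched in a few lines. This is a minimal illustration, not any particular provider's API: the `model` and `tools` interfaces here are hypothetical stand-ins for a real LLM client and its integrations.

```python
# Minimal sketch of an agentic inference loop: the model proposes an
# action, a tool executes it, and the result is fed back into context
# until the model signals that the task is complete.

def run_agent(task, model, tools, max_steps=5):
    """Iterate plan -> tool call -> revision until the model finishes."""
    context = [f"Task: {task}"]
    for _ in range(max_steps):
        step = model(context)              # model proposes the next action
        if step["action"] == "finish":
            return step["output"]
        tool_result = tools[step["action"]](step["input"])
        context.append(f"{step['action']} -> {tool_result}")  # feed back
    return None  # step budget exhausted

# Stub model for illustration: look something up, then return it.
def stub_model(context):
    if len(context) == 1:
        return {"action": "lookup", "input": "pi"}
    return {"action": "finish", "output": context[-1].split(" -> ")[1]}

result = run_agent("What is pi?", stub_model, {"lookup": lambda q: "3.14159"})
```

The key structural difference from a single-turn chatbot is the loop: each tool result becomes new context for the next model call.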

Programming Emerges as the Dominant Use Case

Perhaps the most striking finding concerns the concentration of usage around software development. Programming has become the most consistently expanding category across all models, paralleling the rise of LLM-assisted development environments and tool integrations. Programming queries accounted for roughly 11% of total token volume in early 2025 and exceeded 50% by late November.

The competitive dynamics within this category reveal the stakes involved. Anthropic's Claude series has consistently dominated the category, accounting for more than 60% of programming-related spend for most of the observed period. Yet the landscape is shifting. OpenAI expanded its share from roughly 2% to about 8% between July and November, while open source providers including Qwen and Mistral are steadily gaining ground. MiniMax has emerged as a particularly fast-rising entrant in recent weeks.

The average prompt for programming tasks now exceeds 20,000 tokens, dwarfing other categories and reflecting the complexity of modern code-assistance workflows. Developers are feeding entire codebases into models, expecting them to understand context, identify bugs, and generate solutions that integrate seamlessly with existing systems.

Open Source Models Capture a Third of the Market

The study documents a significant redistribution of market share toward open-weight models, which now account for approximately 30% of total token usage. While proprietary models from major North American providers still serve the majority of tokens, open source alternatives have grown steadily throughout the year.

Chinese-developed models have driven much of this expansion. Starting from a negligible base in late 2024, Chinese open source models steadily gained traction, in some weeks reaching nearly 30% of total usage across all models. DeepSeek and Qwen have emerged as the primary vehicles for this growth, iterating at a pace that has consistently pressured Western incumbents.

The open source ecosystem itself has fragmented in revealing ways. A year ago, DeepSeek dominated the space with two models accounting for over half of all open source token usage. By late 2025, the competitive balance had shifted from near-monopoly to a pluralistic mix. No single model exceeds 25% of open source tokens, and market share is now distributed more evenly across five to seven models. Capable new open models can capture significant usage within weeks, indicating low switching friction and a user base eager to experiment.

The Geography of AI Shifts Eastward

The report documents a pronounced geographic rebalancing of AI consumption. North America, while still the single largest region, now accounts for less than half of total spend for most of the observed period. Asia's share of global inference expenditure has more than doubled since early 2025, rising from approximately 13% to 31% in recent weeks.

This shift reflects both the maturation of Asian enterprise adoption and the competitive success of regionally developed models. Singapore alone accounts for over 9% of global token volume, followed by Germany at 7.5% and China at 6%. The emergence of Chinese providers as both model developers and exporters has created a genuinely multipolar landscape where LLMs function as a truly global computational resource.

English still dominates at nearly 83% of all tokens, but Simplified Chinese accounts for almost 5% of global traffic, with Russian, Spanish, and Thai rounding out the top five languages.

What Comes Next: The Proxy Inference Era

The researchers conclude with a forward-looking observation that may preview the industry's next major transition. As models become more specialized and usage patterns more complex, the architecture of AI deployment is evolving toward what might be called proxy inference, where sophisticated routing layers automatically distribute tasks across multiple specialized models.
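In its simplest form, a proxy-inference layer is a classifier in front of a dispatch table. The sketch below is purely illustrative: the model names are invented, and the keyword classifier stands in for the learned routing policies a production system would use.

```python
# Hedged sketch of a "proxy inference" routing layer: classify each
# request, then dispatch it to a model suited to that workload.

ROUTES = {
    "code":      "specialist-coding-model",      # hypothetical slugs
    "reasoning": "specialist-reasoning-model",
    "default":   "general-chat-model",
}

def classify(prompt: str) -> str:
    """Toy keyword classifier; real routers use learned policies."""
    text = prompt.lower()
    if any(kw in text for kw in ("def ", "bug", "stack trace")):
        return "code"
    if any(kw in text for kw in ("prove", "step by step", "plan")):
        return "reasoning"
    return "default"

def route(prompt: str) -> str:
    """Return the model that should serve this prompt."""
    return ROUTES[classify(prompt)]
```

The design point is that the caller never names a model directly; the routing layer owns that decision, which is what lets specialized models be swapped in and out behind a stable interface.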

OpenRouter itself embodies this approach, connecting users to over 300 models from more than 60 providers through a unified interface. The platform's growth from handling roughly 10 trillion tokens annually to over 100 trillion by mid-2025 suggests that model-agnostic infrastructure may become as important as the models themselves.
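What "unified interface" means in practice is that one request shape works across hundreds of models. The sketch below builds an OpenAI-compatible chat-completions payload; the endpoint URL and model slug are assumptions to be checked against OpenRouter's documentation, and the request is constructed but not sent.

```python
import json

# Sketch of addressing a model-agnostic router: the same payload shape
# works regardless of which provider ultimately serves the request.

ENDPOINT = "https://openrouter.ai/api/v1/chat/completions"  # assumed URL

def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload; only the model slug varies."""
    return {
        "model": model,  # e.g. "provider/model-name", set per request
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("provider/model-name", "Summarize this diff.")
body = json.dumps(payload)  # ready to POST with any HTTP client
```

Switching models becomes a one-string change, which is consistent with the study's observation that capable new models can capture usage within weeks.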

The practical implication is clear: enabling tool use is becoming essential for high-value workflows. Models without reliable tool formats risk falling behind in enterprise adoption and orchestration environments. The data also reveals a "Glass Slipper" phenomenon in user retention, where early adopters who find the right model-workload fit become deeply locked in, while later cohorts churn rapidly between alternatives.
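A "reliable tool format" in this sense is a machine-readable schema the model can target when it decides to call a function. The example below follows the widely used OpenAI-style function-calling convention; the tool name and fields are hypothetical, and other providers use different shapes.

```python
# Illustrative tool definition in the OpenAI-style function-calling
# format: a JSON-schema description the model can emit calls against.

search_tool = {
    "type": "function",
    "function": {
        "name": "search_codebase",  # hypothetical tool name
        "description": "Find symbols or text in the repository.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The text or symbol to search for.",
                },
            },
            "required": ["query"],
        },
    },
}
```

Models that emit well-formed calls against schemas like this slot directly into orchestration frameworks; models that do not require brittle output parsing, which is the adoption risk the study flags.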

For developers and organizations navigating this landscape, the message is unmistakable. The era of single-model dependence is ending. What replaces it will be more complex, more distributed, and far more capable than anything that came before.

Tags: AI research, reasoning models, OpenRouter, a16z, DeepSeek, open source AI, code generation, agentic AI, LLM trends, AI industry
Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.


