The price of running an AI system that matches GPT-3.5's performance has plummeted from $20 per million tokens in November 2022 to $0.07 by October 2024. That is a reduction of more than 280-fold in under two years, according to Stanford's 2025 AI Index Report.
This isn't a gentle decline. Epoch AI researchers, tracking prices across six benchmarks and multiple performance thresholds, found prices declining between 9x per year and 900x per year, with a median of 50x per year. The fastest drops started after January 2024, when the median rate jumped from 50x to 200x annually.
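How fast is that in per-year terms? A minimal back-of-the-envelope sketch in Python, using only the prices quoted above (the 23-month window is my reading of November 2022 to October 2024): the GPT-3.5-equivalent threshold annualizes to roughly 19x per year, comfortably inside Epoch's 9x-to-900x range even though it sits below the 50x median.

```python
# Annualizing the GPT-3.5-class price drop described above.
# Prices are the article's figures; the 23-month window is an assumption
# for "November 2022 to October 2024".
old_price = 20.00   # USD per million tokens, November 2022
new_price = 0.07    # USD per million tokens, October 2024
months = 23

fold_reduction = old_price / new_price            # ~286x overall
annual_rate = fold_reduction ** (12 / months)     # ~19x per year, compounded

print(f"Overall reduction: {fold_reduction:.0f}x")
print(f"Implied annual rate: ~{annual_rate:.0f}x per year")
```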
The speed thing
A 50x annual price reduction for equivalent performance is extraordinary even by tech industry standards. For context, Moore's Law predicted roughly 2x improvements every two years. The cost of LLM inference has dropped by a factor of 1,000 in 3 years. That's not a typo.
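To make the Moore's Law comparison concrete, here is the gap as arithmetic; a rough sketch that takes the 1,000x figure at face value and treats Moore's Law as a clean 2x every two years:

```python
# Comparing the 1,000x LLM inference cost drop over three years with what
# a Moore's Law pace (2x every two years) would deliver over the same window.
years = 3
moores_law_gain = 2 ** (years / 2)   # ~2.8x over three years
llm_cost_gain = 1_000                # the article's factor-of-1,000 figure

print(f"Moore's Law over {years} years: ~{moores_law_gain:.1f}x")
print(f"LLM inference over {years} years: {llm_cost_gain}x")
print(f"Roughly {llm_cost_gain / moores_law_gain:.0f}x ahead of the Moore's Law pace")
```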
What's driving this? A combination of factors working simultaneously: smaller models achieving the same benchmark scores as their larger predecessors, hardware becoming more cost-effective at roughly 30% annually, and intense price competition among providers. While AI costs have declined, they remain significant for smaller firms lacking the deep pockets of tech giants. But the trajectory is unmistakable.
OpenAI's own pricing tells the story. The cost of accessing GPT-4 class performance has seen an 83% reduction for output tokens and a staggering 90% drop for input tokens in just 16 months.
DeepSeek broke something
Then there's DeepSeek. The Chinese startup's R1 model, released in January 2025, delivers performance comparable to OpenAI's o1 reasoning model at roughly 95% lower cost. While OpenAI charges $60 per million tokens for its flagship reasoning model, DeepSeek's pricing runs roughly $0.55 to $2.19 per million tokens.
DeepSeek's inference offerings have sent shockwaves through the AI industry by undercutting incumbent providers' prices by an order of magnitude. What costs tens of thousands monthly on proprietary APIs can run for hundreds on DeepSeek.
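To put rough numbers on that claim, here is a sketch using the per-million-token prices quoted above; the 300-million-token monthly workload is a hypothetical chosen for illustration, not usage data from either provider:

```python
# Rough monthly bill comparison at the prices quoted in this article.
# The workload size is an illustrative assumption, not real usage data.
monthly_tokens_millions = 300              # hypothetical: 300M tokens per month

openai_o1_price = 60.00                    # USD per million tokens (article figure)
deepseek_low, deepseek_high = 0.55, 2.19   # USD per million tokens (article figures)

o1_bill = monthly_tokens_millions * openai_o1_price
ds_low = monthly_tokens_millions * deepseek_low
ds_high = monthly_tokens_millions * deepseek_high

print(f"o1-class API: ${o1_bill:,.0f}/month")                  # $18,000
print(f"DeepSeek R1:  ${ds_low:,.0f}-${ds_high:,.0f}/month")   # $165-$657
```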
The technical approach matters here. DeepSeek skipped the expensive supervised fine-tuning that OpenAI relies on, instead dropping models straight into reinforcement learning. The result: chain-of-thought reasoning learned through trial and error, at a fraction of the compute cost. OpenAI spent $700,000 daily in 2023 on infrastructure alone, with 2024 projections nearing $7 billion annually.
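A toy illustration of the difference in training signal may help. This is my own sketch of outcome-based rewards versus imitation of labeled reasoning traces, not DeepSeek's or OpenAI's actual code; real pipelines score token log-probabilities, not strings:

```python
# Two ways to score a model's chain-of-thought response during training.
# Purely illustrative: both functions and the example strings are hypothetical.

def sft_signal(model_output: str, gold_trace: str) -> float:
    """Supervised fine-tuning rewards imitation of a human-written
    reasoning trace (approximated here as an exact string match)."""
    return 1.0 if model_output.strip() == gold_trace.strip() else 0.0

def outcome_reward(model_output: str, correct_answer: str) -> float:
    """Outcome-based RL rewards only the final answer, so any chain of
    thought that reaches it scores, and reasoning styles can emerge
    through trial and error rather than imitation."""
    final_line = model_output.strip().splitlines()[-1]
    return 1.0 if correct_answer in final_line else 0.0

# A sampled response with an unconventional derivation still earns the
# outcome-based reward, but fails strict imitation of the gold trace.
sampled = "Let x = 7 * 6.\nDouble-check: 6 * 7 = 42.\nAnswer: 42"
print(sft_signal(sampled, "7 times 6 is 42.\nAnswer: 42"))  # 0.0
print(outcome_reward(sampled, "42"))                        # 1.0
```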
Whether DeepSeek's approach proves durable remains an open question. But it's already forced responses: OpenAI has cut prices across its GPT-4o line, and Google has pushed its lower-cost Gemini Flash series. The price war is now structural.
Enterprises are actually using this stuff
The timing aligns with a surge in enterprise adoption. ChatGPT now serves more than 800 million users a week, and that consumer scale has become a flywheel, pulling AI into work and professional settings faster.
According to McKinsey's latest survey, 78% of respondents reported AI use by their organizations, up from 55% in 2023. The number reporting generative AI use in at least one business function more than doubled, from 33% to 71%.
But the real action is in departmental spending. Departmental AI spending hit $7.3 billion in 2025, up 4.1x year over year. Coding is the clear standout at $4.0 billion, making it the largest category across the entire application layer.
JPMorgan Chase offers a case study in scale. The bank spends roughly $2 billion annually on AI development, and CEO Jamie Dimon said JPMorgan is saving roughly $2 billion every year from using AI in everything from risk management to customer service. That's break-even on a massive investment, and Dimon called it "the tip of the iceberg."
The bank's LLM Suite platform now reaches 250,000 employees, with half using it daily. In a CNBC demo, LLM Suite produced a credible five-page investment-banking deck for a meeting with Nvidia's leaders in about 30 seconds, work that would previously have taken junior bankers hours.
The catch
None of this comes without trade-offs. The ISG State of Enterprise AI Report found that in 2025, 31% of the use cases studied reached full production, double the share in its 2024 study. But AI is still underdelivering on expectations that it would cut costs and boost productivity.
JPMorgan's consumer banking chief told investors that operations staff would fall by at least 10% over five years as AI scales. That's the flip side of those efficiency gains.
And infrastructure costs remain staggering for the providers. The furious push by AI hyperscalers to build out data centers will need about $1.5 trillion of investment-grade bonds over the next five years, according to JPMorgan strategists. The hyperscalers are on track to spend roughly $350 billion on AI infrastructure this year alone.
What this means
The 280-fold cost reduction changes the math for thousands of potential applications that were previously uneconomical. A startup processing 2 billion tokens monthly that once faced prohibitive costs can now operate for a fraction of that expense.
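The arithmetic checks out against the article's own price points; a quick sketch, where the 2-billion-token workload is the hypothetical from the paragraph above:

```python
# What the 280-fold drop means for a hypothetical startup processing
# 2 billion tokens per month, using the article's price figures.
monthly_tokens_millions = 2_000   # 2 billion tokens, in millions

price_nov_2022 = 20.00            # USD per million tokens, GPT-3.5-class
price_oct_2024 = 0.07             # USD per million tokens, equivalent capability

print(f"Nov 2022 bill: ${monthly_tokens_millions * price_nov_2022:,.0f}/month")  # $40,000
print(f"Oct 2024 bill: ${monthly_tokens_millions * price_oct_2024:,.0f}/month")  # $140
```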
Code became AI's first true 'killer use case' as models reached economically meaningful performance, with Anthropic's Claude 3.5 Sonnet triggering the category's initial breakout in mid-2024. Fifty percent of developers now use AI coding tools daily.
The question isn't whether AI costs will continue falling. The question is whether the applications will scale fast enough to justify the infrastructure buildout. To drive a 10% return on modeled AI investments through 2030 would require roughly $650 billion of annual revenue into perpetuity, according to JPMorgan's analysis.
Twenty-three percent of respondents report that their organizations are scaling an agentic AI system somewhere in the enterprise, and another 39% are experimenting. That's early innings for systems that can execute multi-step workflows autonomously.
The next benchmark to watch: whether the price collapse translates into the enterprise EBIT improvements that remain elusive for most organizations. The infrastructure is cheaper. The talent pool is growing. The use cases are maturing.
The gap between what's technically possible and what's economically viable just got a lot smaller.




