
AI Coding Tools May Cost More Than They Save, Research Shows

Productivity gains from AI tools mask a growing maintenance crisis, according to new research analyzing millions of lines of code.

Oliver Senti, Senior AI Editor
December 25, 2025

A team of researchers at the University of Texas, Dartmouth College, and the University of New Mexico has published findings in MIT Sloan Management Review suggesting that the rush to adopt AI coding tools is creating a technical debt crisis in enterprise software. The August 2025 analysis draws on interviews with developers in insurance, fintech, defense, and other industries, combined with GitClear data covering 211 million lines of code.

The productivity numbers everyone cites

The case for AI coding tools rests on two widely quoted studies. GitHub's research found developers using Copilot completed a coding task 55% faster, finishing in 71 minutes versus 161 minutes for the control group. McKinsey's study reported developers completing tasks up to twice as fast with generative AI assistance.

Both figures come with asterisks. GitHub's experiment involved 95 developers implementing an HTTP server in JavaScript, a clean greenfield task with no legacy dependencies. McKinsey's developers worked on standardized tasks like refactoring code into microservices and documenting code functionality, again in controlled conditions. Neither study measured what happens when AI-generated code lands in a 20-year-old codebase held together with duct tape and tribal knowledge.

The MIT Sloan researchers found that distinction matters enormously.

What the code quality data actually shows

GitClear's analysis of code changes from 2020 through 2024 paints a different picture than the productivity headlines suggest. The percentage of code associated with refactoring dropped from 25% of changed lines in 2021 to less than 10% in 2024, while copy-pasted code rose from 8.3% to 12.3%.

The duplication problem is accelerating. GitClear found an 8-fold increase in code blocks with five or more duplicated lines during 2024. That same year marked a threshold: copy-pasted lines exceeded moved lines for the first time, a metric GitClear uses to track refactoring activity. Developers are increasingly accepting AI suggestions wholesale rather than consolidating functions into reusable modules.
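GitClear doesn't publish its detection code, but the underlying metric (identical blocks of five or more lines appearing in more than one place) can be sketched with a simple sliding-window pass over a file's lines. The function and the sample code below are illustrative, not GitClear's actual methodology:

```python
from collections import defaultdict

def duplicated_blocks(lines, window=5):
    # Group every 5-line window by its whitespace-stripped content.
    # GitClear's real analysis is more involved; this is a sketch.
    seen = defaultdict(list)
    for i in range(len(lines) - window + 1):
        block = tuple(line.strip() for line in lines[i:i + window])
        if any(block):  # ignore all-blank windows
            seen[block].append(i)
    # Keep only blocks that occur at more than one location.
    return {b: locs for b, locs in seen.items() if len(locs) > 1}

# Two functions with an identical five-line body, the kind of
# copy-paste pattern the GitClear data says is now more common.
sample = """
def total(items):
    s = 0
    for x in items:
        if x is not None:
            s += x
    return s

def total_prices(items):
    s = 0
    for x in items:
        if x is not None:
            s += x
    return s
""".splitlines()

dupes = duplicated_blocks(sample)
for block, locations in dupes.items():
    print(f"{len(block)}-line block repeated at lines {locations}")
# prints: 5-line block repeated at lines [2, 9]
```

A refactoring pass would collapse the two functions into one; accepting AI suggestions wholesale leaves both in the codebase, which is exactly what the rising copy-paste percentage measures.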

API evangelist Kin Lane put it bluntly in commenting on the data: "I don't think I have ever seen so much technical debt being created in such a short period of time during my 35-year career in technology."

Google's own research corroborates the concern. The 2024 DORA report found that as AI adoption increased, delivery stability decreased by an estimated 7.2% for every 25% increase in AI usage. Developers reported feeling more productive while their actual delivery metrics worsened, a finding the DORA team called "a bit of a WTF" in their analysis.

The junior developer problem

One finding from the MIT Sloan research deserves more attention than it's getting. A developer at a Fortune 50 tech company's AI infrastructure division told the researchers: "[With AI] a junior engineer can write as fast as a senior engineer, but they don't have the cognitive sense of what they're doing… or what problems they're causing… or even if it's a good idea to do what they're doing."

This isn't just a training issue. AI coding tools excel at producing syntactically correct code that passes basic tests but creates integration nightmares. They can't see how a new function fits into an existing architecture, whether it duplicates logic elsewhere in the codebase, or whether it introduces dependencies that will cause problems during the next upgrade cycle.
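A contrived Python example makes that failure mode concrete. Both function names and the normalization rules below are hypothetical; the point is that an AI suggestion can pass its basic test while quietly diverging from a helper that already exists elsewhere in the codebase:

```python
# Hypothetical legacy helper the rest of the codebase already uses.
def normalize_email(addr: str) -> str:
    """Canonical form used everywhere: trim, lowercase the whole address."""
    return addr.strip().lower()

# An AI-suggested duplicate, written without seeing the helper above.
def clean_email(addr: str) -> str:
    local, _, domain = addr.strip().partition("@")
    return f"{local}@{domain.lower()}"  # lowercases only the domain

# The basic test the suggestion was checked against passes...
assert clean_email("user@Example.COM") == "user@example.com"

# ...but the two functions disagree on inputs that test never covered.
print(normalize_email("User@Example.com"))  # user@example.com
print(clean_email("User@Example.com"))      # User@example.com
```

Syntactically the new function is fine, and no single test catches the problem; it only surfaces later, when records normalized by the two paths stop matching each other.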

The researchers note that highly skilled developers are better equipped to recognize architectural flaws and mitigate technical debt before it spreads. But the economic logic of AI tools pushes in the opposite direction: if junior developers can produce code at senior velocity, why hire seniors?

We've seen this movie before

Technical debt has a way of compounding until it triggers spectacular failures. The Y2K crisis, caused by developers in the 1960s and 1970s saving memory by storing two-digit years, cost an estimated $300 billion globally to remediate. At the time those shortcuts seemed sensible. Storage was expensive. The year 2000 was decades away.

Southwest Airlines' 2022 holiday meltdown offers a more recent case study. The airline's crew scheduling system, developed decades earlier and patched repeatedly, couldn't handle a cascading failure triggered by winter weather. More than 16,900 flights were canceled, stranding over 2 million passengers. The total cost exceeded $750 million, including a $140 million civil penalty from the Department of Transportation.

Southwest's technical debt accumulated gradually. The number of full-time tech workers at the airline declined by 27% from 2018 to 2021 while overall employment fell just 6%. The Southwest Airlines Pilots Association warned in November 2022, one month before the meltdown, that the company was "one IT router failure away from a complete meltdown."

The MIT Sloan researchers argue AI-generated code represents borrowing at a higher interest rate than traditional shortcuts. An engineer at a major AI company told them: "AI can't see what your code base is like, so it can't adhere to the way things have been done."

What organizations are supposed to do about it

The MIT Sloan paper offers the usual prescriptions: establish clear guidelines, prioritize technical debt management, train developers to use AI responsibly. None of this is wrong. None of it addresses the fundamental incentive problem.

McKinsey's research found developers using AI tools were more than twice as likely to report happiness and fulfillment. Google's DORA survey showed productivity gains alongside the stability declines. The individual developer experience is genuinely better, even as the systemic outcomes worsen.

As one analysis of the GitClear data noted: "Even when managers focus on more substantive productivity metrics, like 'tickets solved' or 'commits without a security vulnerability,' AI can juice these metrics by duplicating large swaths of code in each commit."

The researchers recommend that code reviews evolve beyond evaluating code quality to coaching junior developers in responsible AI use. Senior developers, in this model, become guardrails against both immediate code problems and the erosion of foundational skills in the next generation.

Whether organizations will actually invest in that mentorship infrastructure, given the pressure to capture AI productivity gains, remains the open question.

Tags: artificial intelligence, software development, technical debt, GitHub Copilot, code quality, enterprise software, developer productivity, legacy systems
Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.
