Scientists using ChatGPT and similar AI tools are flooding preprint servers with new papers, according to research published December 18 in the journal Science. The productivity gains are real, sometimes exceeding 50%. But here's the uncomfortable finding buried in the data: journals are accepting these AI-polished papers at lower rates than human-written ones with similar linguistic complexity.
The numbers tell two stories
Cornell researchers analyzed more than 2 million papers posted to arXiv, bioRxiv, and the Social Science Research Network between January 2018 and June 2024. They built a detector to flag likely AI-assisted manuscripts by comparing writing patterns against text generated by GPT-3.5.
On arXiv, scientists who appeared to adopt LLMs posted roughly one-third more papers; on bioRxiv and SSRN, the increases exceeded 50%. These aren't small-sample artifacts. The pattern held across the physical sciences, life sciences, and social sciences.
Non-native English speakers saw the most dramatic shifts. Researchers at Asian institutions whom the detector flagged as likely LLM users posted between 43% and 89.3% more papers, depending on the preprint platform. That's a substantial range, and the researchers don't fully explain the variance. But the direction is clear: the language barrier that has long disadvantaged researchers outside English-speaking countries is eroding.
"There's a big shift in our current ecosystem that warrants a very serious look," said Yian Yin, an assistant professor of information science at Cornell, "especially for those who make decisions about what science we should support and fund."
The quality problem nobody wants to talk about
For decades, peer reviewers used writing quality as a proxy for research quality. Complex, clearly written papers with sophisticated vocabulary tended to report better science. The correlation wasn't perfect, but it was consistent enough to be useful.
That heuristic is breaking down.
Papers flagged as likely AI-written that scored high on writing-complexity measures were less likely to be accepted by journals than human-written papers with similar scores. The smooth prose no longer signals substance. Reviewers appear to be catching on, or perhaps they're picking up on something else entirely. The researchers don't claim to know why journals are rejecting these papers, only that they are.
"For peer reviewers and journal editors, and the community more broadly who create, consume, and apply this work, this represents a major issue," the study authors wrote. That's putting it mildly. If publication counts become decoupled from research quality, how do tenure committees evaluate candidates? How do funding agencies decide where to invest?
One potential upside
The study found that AI-powered search tools, specifically Bing Chat, helped researchers discover more diverse literature to cite. Traditional search tends to surface older, heavily cited papers. LLM-powered search returned newer work and more books.
"People using LLMs are connecting to more diverse knowledge, which might be driving more creative ideas," said Keigo Kusumegi, a doctoral student and the study's first author. He plans to investigate whether AI use correlates with more interdisciplinary research. That's an optimistic read on the data, though the connection between citation diversity and actual innovation remains unproven.
What happens next
The Cornell team acknowledges their findings are observational. Correlation, causation, the usual caveats. They want to run controlled experiments where some researchers get assigned to use LLMs and others don't.
Yin is organizing a symposium on the Cornell campus scheduled for March 3-5, 2026, to examine how generative AI is reshaping research. The timing feels about right. By then, the tools will have evolved again, and the productivity numbers will probably look even more dramatic.
The question nobody in the study asks directly: if AI-assisted papers are easier to write but harder to publish, does the productivity gain actually matter? Preprints aren't the finish line. Journal acceptance is. And if reviewers are already developing antibodies to AI polish, the advantage may prove temporary.