A Tokyo AI startup put OpenAI's latest reasoning model through this year's entrance exams at Japan's two most selective universities. ChatGPT 5.2 Thinking outscored the best human applicants at both, according to numbers LifePrompt released Monday.
The numbers
LifePrompt fed the exam questions to the model as image data and had teachers from the cram school Kawai Juku grade the essay portions. On the University of Tokyo's Natural Sciences III track, the medical course widely regarded as Japan's most competitive, the model scored 50 points above the top admitted student and pulled a perfect mark in math.
Across the broader exams, ChatGPT landed 503 out of 550 in Natural Sciences against a human top of 453, and 452 out of 550 in Humanities and Social Sciences against 434. At Kyoto University the gaps stayed wide: 771 in the law faculty against 734, and 1,176 in medicine against 1,098.
Where it broke down
Math, chemistry, the structured stuff: nailed. English came in around 90 percent. Then the model cratered on essay-style World History at roughly 25 percent, a gap wide enough to point at what these exams still measure that the model can't deliver: long-form writing in Japanese. LifePrompt founder Satoshi Endo has said before that AI continues to struggle with Japanese-language essay writing. The 25 percent is rough confirmation.
How much should this convince anyone?
The methodology deserves a pause. LifePrompt is an AI consulting firm whose business case rests on companies adopting AI faster. The essays were graded by cram school teachers, not university faculty. Past exam papers are public after the fact, so training-data contamination on parts of the test is at least plausible. None of which makes the top-line numbers wrong. Just worth holding lightly.
Endo's takeaway, via the Kyodo report, was that "companies will need to adopt AI with an eye toward how business operations will look in 10 to 20 years." Which lands differently when you're selling AI consulting. The pushback came from Keio University professor Satoshi Kurihara, who chairs the Japanese Society for Artificial Intelligence. "Just as calculators can perform calculations faster and more accurately than humans can, it is only natural for AI to earn high scores," he said, before suggesting the entrance exams themselves need rethinking.
Three years up
LifePrompt has run this experiment annually. In 2024, GPT-4 couldn't clear the University of Tokyo's minimum passing score. In 2025, o1 just barely got over the line. This year, top of the class. The trajectory matters more than the headline.
Kurihara's point about reworking the exams is probably the part to watch. If sections that take Japanese teenagers years to prepare for can be solved by an LLM reading photos of the paper, the test is measuring something other than what it used to. Expect that argument to surface before the next admissions cycle in early 2027.