
Poetiq Meta-System Boosts Every LLM Tested on Coding Benchmark

Ex-Google and DeepMind founders pitch a wrapper that lifts coding accuracy without fine-tuning.

By Andrés Martínez, AI Content Writer
May 16, 2026

Stealth startup Poetiq published benchmark results this week for Meta-System, an API-layer harness that wraps any LLM and improves its coding performance without fine-tuning. The team, made up of former Google and DeepMind researchers, claims every model it tested on LiveCodeBench Pro improved.

The harness was optimized using only Gemini 3.1 Pro, then applied unchanged to other models. Gemini 3.1 Pro jumped from 78.6% to 90.9%, edging past Google's own Gemini 3 Deep Think at 88.8%. GPT 5.5 High climbed from 89.6% to 93.9%. The biggest swing came from Kimi K2.6, which rose from 50.0% to 79.9%, a gain of 29.9 percentage points.
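The arithmetic behind those figures can be checked with a short sketch. It uses only the before and after scores quoted above. The `error_reduction` helper is not a metric from Poetiq or the article; it is an added illustration that assumes each score is the share of benchmark problems solved, so that `100 - score` is the share still failed.

```python
# Scores reported by Poetiq on LiveCodeBench Pro: (bare model %, with harness %).
REPORTED = {
    "Gemini 3.1 Pro": (78.6, 90.9),
    "GPT 5.5 High": (89.6, 93.9),
    "Kimi K2.6": (50.0, 79.9),
}


def point_gain(before, after):
    """Absolute gain in percentage points."""
    return round(after - before, 1)


def error_reduction(before, after):
    """Percent of previously failed problems that are now solved.

    Illustrative metric only; assumes the score is a solve rate.
    """
    return round(100 * (after - before) / (100 - before), 1)


def summarize(scores):
    """Map each model name to (point gain, error reduction)."""
    return {
        name: (point_gain(before, after), error_reduction(before, after))
        for name, (before, after) in scores.items()
    }


if __name__ == "__main__":
    for name, (gain, cut) in summarize(REPORTED).items():
        print(f"{name}: +{gain} points, {cut}% of remaining failures removed")
```

Under that assumption, GPT 5.5 High's 4.3-point gain removes about 41% of its remaining failures, while Kimi K2.6's 29.9-point gain removes about 60%. The spread between models is therefore narrower in relative terms than the raw point gains suggest.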

Smaller models with the harness outscored unassisted flagships. Gemini 3.0 Flash with the harness scored 82.3%, beating bare Claude Opus 4.7, Gemini 3.1 Pro, and GPT 5.2 High. All of these results are self-reported by Poetiq on a single benchmark. The gains on Hard problems are dramatic enough to invite outside scrutiny: Gemini 3.1 Pro goes from 7.7% to 58.3% in that tier alone.

The company says Meta-System works through recursive self-improvement, building its own task-specific scaffolding rather than retraining the underlying model. Poetiq used a similar approach to top the ARC-AGI-2 leaderboard in late 2025. The company has not disclosed when or how Meta-System will be available to outside developers.
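Poetiq has not published how Meta-System works, so the following is only a generic sketch of what "wrapping a model without retraining it" can look like: a plain generate, check, and retry loop. It is not Poetiq's method and does not attempt recursive self-improvement. Every name here (`run_harness`, `toy_model`, `toy_check`) is hypothetical, and the model is a stand-in function, not a real API.

```python
from typing import Callable, Tuple

# Hypothetical interfaces, assumed for illustration only.
Model = Callable[[str], str]               # prompt -> candidate solution
Check = Callable[[str], Tuple[bool, str]]  # candidate -> (passed, feedback)


def run_harness(model: Model, task: str, check: Check,
                max_rounds: int = 3) -> Tuple[str, int]:
    """Query the model, check the answer, and retry with feedback appended.

    The model itself is never modified; only the prompt changes.
    Returns the last candidate and the number of rounds used.
    """
    prompt = task
    candidate = ""
    for round_no in range(1, max_rounds + 1):
        candidate = model(prompt)
        passed, feedback = check(candidate)
        if passed:
            return candidate, round_no
        prompt = (f"{task}\n\nPrevious attempt:\n{candidate}"
                  f"\n\nFeedback:\n{feedback}")
    return candidate, max_rounds


def toy_model(prompt: str) -> str:
    """Stand-in model: wrong on the first try, right once it sees feedback."""
    if "Feedback" in prompt:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def toy_check(candidate: str) -> Tuple[bool, str]:
    """Stand-in test runner: executes the candidate and checks one case."""
    scope = {}
    exec(candidate, scope)
    ok = scope["add"](2, 3) == 5
    return ok, "" if ok else "add(2, 3) should return 5"


if __name__ == "__main__":
    solution, rounds = run_harness(toy_model, "Write add(a, b).", toy_check)
    print(f"Solved in {rounds} round(s): {solution}")
```

With the toy stand-ins, the first attempt fails the check and the second passes. A real harness would call an LLM API and run a proper test suite in a sandbox rather than `exec` on untrusted output.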


Bottom Line

Poetiq's harness pushed Kimi K2.6 up roughly 30 percentage points on LiveCodeBench Pro, though every figure is self-reported.

Quick Facts

  • GPT 5.5 High: 89.6% to 93.9% on LCB Pro (company-reported)
  • Gemini 3.1 Pro: 78.6% to 90.9%, surpassing Gemini 3 Deep Think (88.8%)
  • Kimi K2.6: 50.0% to 79.9%, the largest gain at ~30 points
  • Gemini 3.0 Flash: 72.3% to 82.3% with the harness
  • Harness optimized on Gemini 3.1 Pro only, applied unchanged to other models
Tags: Poetiq, code generation, LLM benchmarks, LiveCodeBench Pro, AI tooling, Gemini, recursive self-improvement

Poetiq Meta-System Lifts Every LLM on LiveCodeBench Pro | aiHola