Poetiq Meta-System Lifts Every LLM on LiveCodeBench Pro

Stealth startup Poetiq published benchmark results this week for Meta-System, an API-layer harness that wraps any LLM and improves its coding performance without fine-tuning. The team, made up of former Google and DeepMind researchers, claims every model it tested on LiveCodeBench Pro improved.

The harness was optimized using only Gemini 3.1 Pro, then applied unchanged to other models. Gemini 3.1 Pro rose from 78.6% to 90.9%, edging past Google's own Gemini 3 Deep Think at 88.8%. GPT 5.5 High climbed from 89.6% to 93.9%. The largest swing came from Kimi K2.6, which went from 50.0% to 79.9%, a gain of roughly 30 percentage points.

With the harness, smaller models outscored flagships running without it. Gemini 3.0 Flash scored 82.3% with the harness, ahead of the unassisted Claude Opus 4.7, Gemini 3.1 Pro, and GPT 5.2 High. All of these results are self-reported by Poetiq on a single benchmark, and the gains on Hard problems are large enough to invite outside scrutiny: Gemini 3.1 Pro goes from 7.7% to 58.3% in that tier alone.

The company says Meta-System works through recursive self-improvement, building its own task-specific scaffolding rather than retraining the underlying model. Poetiq used a similar approach to top the ARC-AGI-2 leaderboard in late 2025. When or how Meta-System will be available to outside developers hasn't been disclosed.

Bottom Line

Poetiq's harness pushed Kimi K2.6 up roughly 30 percentage points on LiveCodeBench Pro, though every figure is self-reported.

Quick Facts

GPT 5.5 High: 89.6% to 93.9% on LCB Pro (company-reported)
Gemini 3.1 Pro: 78.6% to 90.9%, surpassing Gemini 3 Deep Think (88.8%)
Kimi K2.6: 50.0% to 79.9%, the largest gain at ~30 points
Gemini 3.0 Flash: 72.3% to 82.3% with the harness
Harness optimized on Gemini 3.1 Pro only, applied unchanged to other models
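
The gains quoted above are percentage-point differences, which are easy to confuse with relative improvement. As a minimal sketch, the Python snippet below recomputes both from the company-reported scores listed in Quick Facts. The helper names (`point_gain`, `relative_gain`) and the dictionary layout are illustrative assumptions, not anything Poetiq has published.

```python
# Company-reported LiveCodeBench Pro scores from the Quick Facts list,
# stored as (score without harness, score with harness), in percent.
reported_scores = {
    "GPT 5.5 High": (89.6, 93.9),
    "Gemini 3.1 Pro": (78.6, 90.9),
    "Kimi K2.6": (50.0, 79.9),
    "Gemini 3.0 Flash": (72.3, 82.3),
}


def point_gain(before: float, after: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal."""
    return round(after - before, 1)


def relative_gain(before: float, after: float) -> float:
    """Gain as a percentage of the starting score, rounded to one decimal."""
    return round(100 * (after - before) / before, 1)


# Hypothetical summary table: one row per model.
gains = {model: point_gain(*pair) for model, pair in reported_scores.items()}
largest = max(gains, key=gains.get)

for model, (before, after) in reported_scores.items():
    print(
        f"{model}: {before}% -> {after}% "
        f"(+{point_gain(before, after)} points, "
        f"+{relative_gain(before, after)}% relative)"
    )
print(f"Largest point gain: {largest}")
```

The two measures tell different stories: a model starting from a low score can post a large relative gain from a modest point increase, so it helps to check which one a headline figure refers to.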


Andrés Martínez

AI Content Writer

