Stealth startup Poetiq published benchmark results this week for Meta-System, an API-layer harness that wraps any LLM and improves its coding performance without fine-tuning. The team, made up of former Google and DeepMind researchers, claims every model it tested on LiveCodeBench Pro improved.
The harness was optimized using only Gemini 3.1 Pro, then applied unchanged to other models. Gemini 3.1 Pro jumped from 78.6% to 90.9%, edging past Google's own Gemini 3 Deep Think. GPT 5.5 High climbed from 89.6% to 93.9%. The biggest swing: Kimi K2.6, from 50.0% to 79.9%, roughly 30 percentage points.
Smaller models started outpacing flagships. Gemini 3.0 Flash with the harness scored 82.3%, beating bare Claude Opus 4.7, Gemini 3.1 Pro, and GPT 5.2 High. All of these results are self-reported by Poetiq on a single benchmark, and the gains on Hard problems are dramatic enough to invite outside scrutiny: Gemini 3.1 Pro goes from 7.7% to 58.3% in that tier alone.
The company says Meta-System works through recursive self-improvement, building its own task-specific scaffolding rather than retraining the underlying model. Poetiq used a similar approach to top the ARC-AGI-2 leaderboard in late 2025. The company has not disclosed when or how Meta-System will be available to outside developers.
Bottom Line
Poetiq's harness pushed Kimi K2.6 up roughly 30 percentage points on LiveCodeBench Pro, though every figure is self-reported.
Quick Facts
- GPT 5.5 High: 89.6% to 93.9% on LiveCodeBench Pro (company-reported)
- Gemini 3.1 Pro: 78.6% to 90.9%, surpassing Gemini 3 Deep Think (88.8%)
- Kimi K2.6: 50.0% to 79.9%, the largest gain at ~30 points
- Gemini 3.0 Flash: 72.3% to 82.3% with the harness
- Harness optimized on Gemini 3.1 Pro only, applied unchanged to other models
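
For readers who want to check the arithmetic, the short Python sketch below recomputes the percentage-point gains from the company-reported scores listed above. It also computes one extra figure that is not in Poetiq's announcement: the share of each model's previously failed problems that the harness recovers, which assumes the baseline and harness runs used the same problem set. The function names `point_gain` and `failure_reduction` are illustrative, not part of any Poetiq tooling.

```python
# Company-reported LiveCodeBench Pro scores in percent: (bare model, with harness).
REPORTED = {
    "GPT 5.5 High": (89.6, 93.9),
    "Gemini 3.1 Pro": (78.6, 90.9),
    "Kimi K2.6": (50.0, 79.9),
    "Gemini 3.0 Flash": (72.3, 82.3),
}


def point_gain(base: float, harness: float) -> float:
    """Absolute improvement in percentage points."""
    return round(harness - base, 1)


def failure_reduction(base: float, harness: float) -> float:
    """Percent of the bare model's failures that the harness removes.

    Assumption: both scores are measured on the same set of problems.
    """
    return round(100 * (harness - base) / (100 - base), 1)


def largest_gain(scores: dict) -> str:
    """Name of the model with the biggest percentage-point gain."""
    return max(scores, key=lambda name: point_gain(*scores[name]))


if __name__ == "__main__":
    for name, (base, harness) in REPORTED.items():
        print(
            f"{name}: +{point_gain(base, harness)} points, "
            f"{failure_reduction(base, harness)}% of failures removed"
        )
    print("Largest gain:", largest_gain(REPORTED))
```

Looking at failures removed rather than raw points puts models with different starting scores on a more comparable footing, since a model already near 90% has far fewer points left to gain.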




