Miami startup Subquadratic came out of stealth Tuesday with $29 million in seed funding and SubQ, a model the company says runs roughly 52 times faster than FlashAttention at one million tokens and supports a 12 million token context window. CTO Alex Whedon framed it in a launch post on X as the first frontier model to break the quadratic attention barrier.
Subquadratic's pitch is aggressive. At 12M tokens, the company announcement claims its architecture cuts attention compute by nearly 1,000x compared with other frontier models. At one million tokens, it says SubQ costs roughly a fifth of what Claude Opus 4.7 or GPT-5.5 charge for comparable workloads. On RULER 128K, the company puts the cost gap at about 300x.
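Those headline ratios can at least be sanity-checked with back-of-envelope arithmetic (ours, not the company's): dense attention over n tokens performs roughly n² token-pair comparisons, so a ~1,000x compute cut at 12M tokens implies each token effectively attends to about n/1,000 positions.

```python
# Back-of-envelope check of the claimed ~1,000x attention-compute cut.
# All numbers here come from the announcement; the arithmetic is ours.
n = 12_000_000                        # claimed context window, in tokens
dense_pairs = n * n                   # dense attention: every token vs. every token
sparse_pairs = dense_pairs // 1_000   # the claimed ~1,000x reduction
per_token = sparse_pairs // n         # positions each token effectively attends to
print(per_token)                      # 12000
```

In other words, a 1,000x cut at that scale still leaves each token comparing against roughly 12,000 others, which is plausible for a learned sparse scheme but says nothing about whether quality holds up.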
The architectural claim
The technique is called Subquadratic Sparse Attention, or SSA. The premise: most token-to-token comparisons in standard attention are wasted compute, so let the model learn which positions actually matter and compute attention only over those. Selection is content-dependent, not based on the fixed positional patterns of older sparse-attention work.
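Subquadratic has not published how SSA picks positions, but the general shape of content-dependent sparse attention can be sketched: each query keeps only its top-k highest-scoring keys and runs softmax attention over that subset. The NumPy toy below is illustrative only; its candidate scoring is still dense (O(n²)), so it does not itself beat the quadratic barrier, which is exactly where real subquadratic schemes need a cheap learned selector.

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Toy content-dependent sparse attention: each query attends
    only to its k highest-scoring keys. (The full score matrix below
    is still O(n^2); production systems replace it with a cheap
    selector or block-level routing.)"""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                     # (n, n) similarity scores
    idx = np.argpartition(scores, -k, axis=-1)[:, -k:]  # top-k key indices per query
    out = np.zeros_like(V)
    for i in range(n):
        s = scores[i, idx[i]]
        w = np.exp(s - s.max())
        w /= w.sum()                                   # softmax over the k kept keys
        out[i] = w @ V[idx[i]]
    return out

rng = np.random.default_rng(0)
n, d, k = 64, 16, 8
Q, K, V = rng.standard_normal((3, n, d))
out = topk_sparse_attention(Q, K, V, k)
print(out.shape)  # (64, 16)
```

With k equal to n, the routine reduces exactly to dense attention; the engineering question Subquadratic is claiming to have answered is how small k can get, and how cheaply the selection can run, before quality collapses.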
If that sounds familiar, it should. Mamba, RWKV, Longformer, DeepSeek Sparse Attention, Kimi Linear. A decade of attempts to escape O(n²) attention. None has displaced dense attention at the frontier.
About those weights
Within hours of launch, AI engineer Will Depue posted that SubQ was "almost surely a sparse attention finetune of Kimi or DeepSeek." Whedon then confirmed it. The company is "using weights from open-source models as a starting point, as a function of our funding and maturity as a company," he wrote on X. Depue followed up arguing the O(n) scaling claims and the reported speedups "don't seem to line up."
That admission complicates the framing. A sparse-attention layer grafted onto somebody else's pretraining run is a legitimate engineering contribution. It is not the same thing as a ground-up redesign of how attention works, which is roughly how the company's launch coverage described it.
The benchmarks, with caveats
The published numbers look strong on paper. 95% on RULER 128K. 81.8% on SWE-Bench Verified, edging Opus 4.6. 92.1% on needle-in-a-haystack at 12M tokens. On MRCR v2 at one million tokens, though, SubQ scores 65.9%, behind GPT-5.5's 74%. According to coverage in The New Stack, each model was run only once due to inference cost, and Whedon himself described SubQ as "way smaller than the big labs."
One run per benchmark on a model the team admits is sub-frontier sized. That is not a fraud claim. It is a reason to wait for independent reproduction before declaring a breakthrough.
We've been here before
The company that should be on every reader's mind is Magic.dev. In August 2024, Magic announced LTM-2-mini, a 100M-token context model with claimed 1,000x efficiency gains, and went on to raise more than $500 million on the strength of those numbers. Nearly two years later, there is no public evidence that LTM-2-mini is in production use outside Magic. VentureBeat draws the parallel directly, and Subquadratic's reported $500 million valuation on a seed round, with no public weights and no peer-reviewed paper, looks like the same movie restarting.
What's next
Subquadratic is taking access requests for three products in private beta: an API exposing the full 12M-token window, SubQ Code (a CLI agent that loads whole repos into context), and SubQ Search, free during beta. A 50-million-token context window is targeted for Q4. A full technical report has not been released, and the model weights are not open. Until that report lands and outside labs can rerun the benchmarks, AI commentator Dan McAteer's read in his own X post is hard to argue with: "either the biggest breakthrough since the Transformer... or it's AI Theranos."