LLMs & Foundation Models

DeepSeek's FlashMLA Updates Reveal MODEL1, Hinting at New Architecture Beyond V3

Code analysis shows MODEL1 as an independent branch, not a V3.2 derivative, with support for Nvidia's upcoming Blackwell chips.

Andrés Martínez, AI Content Writer
January 21, 2026 · 4 min read
[Image: abstract visualization of diverging AI model architectures as parallel light streams]

Developers digging through DeepSeek's FlashMLA repository on January 20 found something unexpected: a model identifier called "MODEL1" appearing 28 times across 114 files, sitting alongside "V32" as a separate entity. The timing is hard to ignore. January 20 marked exactly one year since DeepSeek dropped R1, the reasoning model that briefly wiped $593 billion from Nvidia's market cap.

What the code actually shows

The FlashMLA codebase is DeepSeek's library of optimized attention kernels for inference. It powers V3 and V3.2-Exp. When MODEL1 appeared in the commits, it wasn't tagged as a variant of V32. The code treats them as parallel architectures.

Specific differences stand out in the logic paths. KV cache layout diverges between the two. Sparsity handling follows different rules. FP8 decoding takes a separate path. These aren't the kinds of changes you'd see in a minor version bump. They're architectural.
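The structural pattern being described can be pictured with a short sketch. All names below are illustrative, not taken from the actual FlashMLA source; the point is that two models sit as peer entries rather than one deriving from the other:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KernelConfig:
    """Hypothetical per-architecture inference settings."""
    kv_cache_layout: str   # how key/value tensors are arranged in memory
    sparsity_rule: str     # which attention positions may be skipped
    fp8_decode_path: str   # which low-precision decode kernel to run

# Two parallel entries, not a base config plus an override: this is the
# structural relationship the commits suggest between "V32" and "MODEL1".
CONFIGS = {
    "V32": KernelConfig("paged", "topk_indexer", "fp8_legacy"),
    "MODEL1": KernelConfig("unified_512", "value_position_aware", "fp8_v2"),
}

def select_config(model_id: str) -> KernelConfig:
    # A minor version bump would inherit its parent's config;
    # a distinct architecture gets its own first-class entry.
    return CONFIGS[model_id]
```

A derivative release would typically share most of these fields with its parent; fully independent values across all three paths is what reads as "architectural" rather than incremental.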

The Reddit LocalLLaMA community spotted additional details: MODEL1 includes compatibility with Nvidia's SM100 architecture (Blackwell), not just the current Hopper chips. There's also mention of "Value Vector Position Awareness" and a return to a unified 512-dimension standard.

Engram integration looks likely

Two weeks ago, DeepSeek published a paper on Engram, a conditional memory module that fundamentally changes how transformers handle knowledge retrieval. Instead of forcing the model to reconstruct common patterns through expensive computation, Engram provides O(1) lookups for static knowledge.

The practical impact: when a model encounters "Diana, Princess of Wales," it doesn't have to burn multiple attention layers figuring out what that phrase means. It just looks it up. DeepSeek's testing showed their Engram-27B model improving Needle-in-a-Haystack accuracy from 84.2% to 97%.
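At a high level, the mechanism resembles a hash-table lookup grafted onto the forward pass. Here is a toy sketch; the entity table, dimensions, and mixing rule are all illustrative assumptions, not the actual Engram implementation:

```python
import numpy as np

DIM = 64
rng = np.random.default_rng(0)

# Static knowledge table: precomputed vectors for common entities.
# A real system would build this offline; here we fake two entries.
memory = {
    "Diana, Princess of Wales": rng.standard_normal(DIM),
    "Eiffel Tower": rng.standard_normal(DIM),
}

def retrieve(phrase: str, hidden: np.ndarray) -> np.ndarray:
    """O(1) conditional lookup: if the phrase is a known entity, blend
    its stored vector into the hidden state instead of reconstructing
    that knowledge through several attention layers."""
    entry = memory.get(phrase)           # dict lookup: average O(1)
    if entry is None:
        return hidden                    # unknown phrase: fall through
    return 0.5 * hidden + 0.5 * entry    # illustrative mixing rule

h = rng.standard_normal(DIM)
out = retrieve("Diana, Princess of Wales", h)
```

The key property is the asymmetry: known phrases cost a constant-time lookup, while everything else falls through to the normal compute path unchanged.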

The Engram paper was co-authored by founder Liang Wenfeng. When the CEO puts his name on research, it usually ends up in production.

Whether MODEL1 incorporates Engram remains unconfirmed. But the timing is suggestive. DeepSeek has a pattern of publishing foundational research shortly before major model releases. They did the same thing with R1.

The naming question

If MODEL1 represents a new architecture rather than a V3 derivative, the obvious question is what they'll call it. V4 is the speculation, and it fits DeepSeek's convention. But "MODEL1" as an internal codename could mean anything.

What's clear is that this isn't V3.3 or V3.2.1. The code structure treats MODEL1 and V32 as distinct branches with their own inference paths. You don't build separate GPU architecture support for a minor update.

February release window

DeepSeek has a tradition of dropping major announcements around Lunar New Year. V3 and R1 both came during the Spring Festival window last year. February 17 is the 2026 Lunar New Year. Industry observers have been pointing to mid-February for weeks.

The company hasn't confirmed anything. DeepSeek tends toward operational silence punctuated by sudden releases. But the constellation of signals is getting dense: Engram research published, FlashMLA updates with MODEL1 references, R1's first anniversary, and the approaching holiday window.

Internal benchmarks reportedly show whatever comes next outperforming Claude and GPT on coding tasks, particularly with long context prompts. That's an unverified claim from DeepSeek employees, worth treating with appropriate skepticism until independent testing happens.

What this means for the market

The R1 release last January triggered genuine market panic. A Chinese lab matching frontier capabilities at 1/20th the training cost challenged assumptions about compute moats and export control effectiveness. If V4 delivers on the Engram architecture's promise of efficient long-context handling, it could compress the gap further.

DeepSeek has been releasing models at a pace that makes Western labs uncomfortable. V3 in December 2024, R1 in January 2025, R1-0528 in May, V3.1 in August, V3.2 in December. Each one slightly closer to or matching the frontier. The company reportedly trained V3 for under $6 million, and the subsequent run from V3-Base to R1 for just $294,000.

Claude Opus 4.5 currently leads SWE-bench Verified at 80.9%. That's the benchmark V4 would need to beat for coding dominance claims to stick. V3.2 already demonstrated gold-medal performance on the 2025 IOI and ICPC World Finals without targeted training, so DeepSeek has the foundation.

Hardware requirements remain the wildcard. If V4 follows V3's mixture-of-experts architecture (671B total parameters, 37B active), it'll still demand serious VRAM. Quantized versions running on dual RTX 4090s or the new 5090 are the hope for local deployment enthusiasts. Whether that's realistic depends on what architectural changes MODEL1 actually contains.
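Back-of-the-envelope weight-memory arithmetic shows why, assuming V4 keeps V3's parameter counts (an assumption; MODEL1's actual sizes are unknown):

```python
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, ignoring the
    KV cache and activation overhead."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

TOTAL, ACTIVE = 671, 37  # V3's MoE split: total vs per-token active params

for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_gb(TOTAL, bits):7.1f} GB total weights, "
          f"{weight_gb(ACTIVE, bits):5.1f} GB active per token")
```

Even at 4-bit quantization, holding all 671B weights takes roughly 335 GB, far beyond the 48 GB of a dual-RTX-4090 rig; only the ~37B active slice is exercised per token, which is why aggressive quantization plus CPU/GPU offloading is the local-deployment hope rather than a straightforward fit.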

DeepSeek hasn't commented on any of this. The company's communications strategy consists of publishing papers and pushing code. The FlashMLA commits are the message.

Tags: DeepSeek, V4, MODEL1, FlashMLA, Engram, AI models, open source AI, Chinese AI
Andrés Martínez
AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.
