Arc Institute dropped its first virtual cell model on June 23rd. They're calling it State, and the pitch is straightforward: show it cells treated with a drug, and it predicts how completely different cells would respond to that same drug. Even if nobody ever ran that experiment.
This is the thing that wasn't supposed to work yet.
Why this matters more than the last ten bio-AI announcements
Every foundation model in biology has hit the same wall. Train it on one disease and it only works on that disease. Want to predict a new drug's effect? Retrain. New tissue type? Retrain. Different patient population? You get it.
State supposedly sidesteps this by learning from groups of cells instead of processing them individually. The preprint describes a bidirectional transformer that uses attention across entire cell populations, which lets it capture patterns from the full distribution rather than treating each cell as an isolated data point.
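To make the idea concrete, here is a toy sketch of attention across a cell population: each cell's representation is updated by attending to every other cell in the batch, with no causal mask (hence "bidirectional"). All names, layer sizes, and dimensions here are illustrative assumptions, not taken from the State preprint or codebase.

```python
import torch
import torch.nn as nn

class PopulationAttention(nn.Module):
    """Toy sketch: self-attention across a set of cells, so each cell's
    representation is informed by the whole population. Sizes are
    invented for illustration, not from the State preprint."""

    def __init__(self, n_genes: int = 32, d_model: int = 16, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(n_genes, d_model)  # per-cell expression -> embedding
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out = nn.Linear(d_model, n_genes)    # project back to expression space

    def forward(self, cells: torch.Tensor) -> torch.Tensor:
        # cells: (batch, n_cells, n_genes) -- a population per sample,
        # not one cell processed in isolation.
        h = self.embed(cells)
        h, _ = self.attn(h, h, h)  # every cell attends to every other cell
        return self.out(h)

pop = torch.randn(2, 64, 32)  # 2 populations of 64 cells, 32 genes each
model = PopulationAttention()
pred = model(pop)
print(pred.shape)  # torch.Size([2, 64, 32])
```

The contrast with per-cell models is the attention step: each output depends on the full distribution of cells in the sample, which is what lets population-level patterns enter the representation.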
The analogy to language models is obvious and probably intentional: words get meaning from context, cells get meaning from other cells. Whether that analogy holds up to scrutiny is another question.
The numbers
State was trained on observational data from 167 million cells plus perturbational data from over 100 million cells across 70 cell lines. That's a lot. The perturbational data comes from the Tahoe-100M dataset, which Tahoe Therapeutics open-sourced earlier this year as part of Arc's Virtual Cell Atlas.
The benchmark claims: 50% improvement in distinguishing perturbation effects on Tahoe-100M, and double the accuracy in identifying differentially expressed genes compared to existing models. Arc also says State is the first model to consistently beat simple linear baselines, which, if true, is actually more interesting than the flashier numbers. Linear models have been embarrassingly competitive in this space.
The comparisons are against their own previous work and existing computational approaches, not against wet-lab validation across a broad range of conditions. That part comes later, presumably.
How it actually works
Two interlocking modules. The State Embedding model converts transcriptome data into a vector space where similar cell types cluster together. The State Transition model then predicts how those embeddings shift when you apply a perturbation.
So you feed it a starting transcriptome plus a perturbation (drug, genetic mutation, whatever), and it outputs predicted changes in RNA expression. The model operates in embedding space rather than raw gene expression counts, which Arc says makes it more robust to technical noise.
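The embed-then-shift structure can be sketched in a few lines: one network maps transcriptomes into an embedding space, and a second predicts how an embedding moves under a perturbation encoding. The class names, residual formulation, and dimensions below are my assumptions for illustration, not the State implementation.

```python
import torch
import torch.nn as nn

class CellEmbedder(nn.Module):
    """Maps raw transcriptome vectors into a lower-dimensional embedding
    space (stand-in for the State Embedding model; sizes are invented)."""
    def __init__(self, n_genes: int = 2000, d_emb: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_genes, 256), nn.ReLU(), nn.Linear(256, d_emb))

    def forward(self, x):
        return self.net(x)

class TransitionModel(nn.Module):
    """Predicts the shift an embedding undergoes given a perturbation
    code (stand-in for the State Transition model)."""
    def __init__(self, d_emb: int = 64, d_pert: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_emb + d_pert, 128), nn.ReLU(), nn.Linear(128, d_emb))

    def forward(self, z, pert):
        # Residual form: predicted state = baseline embedding + shift
        return z + self.net(torch.cat([z, pert], dim=-1))

embedder, transition = CellEmbedder(), TransitionModel()
baseline = torch.randn(8, 2000)  # 8 cells' expression profiles
pert = torch.randn(8, 16)        # perturbation encoding (illustrative)
z0 = embedder(baseline)          # baseline embeddings
z1 = transition(z0, pert)        # predicted post-perturbation embeddings
print(z1.shape)  # torch.Size([8, 64])
```

Working in embedding space means the transition model never sees raw counts directly, which is the property Arc credits for robustness to technical noise.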
The code is on GitHub, licensed for noncommercial use.
What's missing
I couldn't find validation against real wet-lab experiments for the zero-shot predictions. The benchmarks compare computational approaches, not predictions versus actual biological outcomes in truly novel contexts. That's the hard part.
The preprint focuses on single-cell RNA sequencing data. Arc's Hani Goodarzi has been clear that this is the starting point because it's the only unbiased single-cell data they can generate at scale. Proteomics, epigenetics, spatial transcriptomics, all of that comes later. Maybe.
And there's the standard question with any foundation model in biology: does transfer actually work the way they're claiming? The field has been burned before by models that looked great on held-out test sets and fell apart on genuinely novel conditions.
The bigger picture
Arc is framing this as their "GPT-1 moment" for cell biology. The comparison is ambitious but not crazy. They've been building toward this: Evo 2 for genomics earlier this year, the Virtual Cell Atlas infrastructure, the scBaseCount data curation system. State slots into that stack.
Patrick Hsu, Arc's co-founder, has been talking about using perturbational data to capture causality rather than just correlation. The idea is that observational data tells you genes A and B are related, but perturbational data (actually knocking out a gene or applying a drug) tells you which one causes what. State was explicitly designed to leverage this.
The drug discovery implications are obvious. Screen computationally before spending millions on wet-lab experiments. Predict off-target effects. Model patient-specific responses. All the usual promises.
What happens next
Arc is positioning State as the first in a series. As training data grows, so does accuracy; at least, that's the claim. They demonstrated scaling laws for DNA language modeling last year with Evo, so there's precedent.
The Virtual Cell Challenge, which Arc launched in June, is providing external validation. Over 5,000 people registered and more than 1,200 teams submitted results. The initial findings from that challenge were mixed: perturbation prediction models aren't yet consistently outperforming naive baselines across all metrics.
So we're somewhere between "this is a genuine breakthrough" and "this is a well-executed incremental step." The zero-shot generalization claim is the key thing to watch. If State actually predicts responses in tissue types and perturbation combinations it was never trained on, that changes how drug discovery works. If it doesn't, it's a very good perturbation model with better marketing than its predecessors.
The preprint is up. The code is available. Now we wait for people to break it.