The AI Alliance, a non-profit coalition of more than 200 organizations, has launched Project Tapestry, a platform meant to let institutions and nations jointly train open foundation models without ever pooling their raw data. The group announced it in early April, with Turing Award winner Yann LeCun signing on as chief science advisor.
The pitch is straightforward enough. Partners train on their own machines, against their own data, and the only thing that leaves the building is updated model weights. The sensitive stuff stays home.
How it actually works
Each partner runs a node. Local data sits at that node and gets used to compute weight updates, which then travel to a central core for aggregation. The corpus itself never ships. The Alliance calls the architecture N+1: one shared base model at the center, sovereign nodes around it, weights flowing in and improved models flowing back out.
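For readers who want the loop in concrete terms, here is a minimal sketch of the pattern described above, assuming a sample-weighted average of node weights in the style of federated averaging. Tapestry's actual aggregation rule is not specified in the announcement, and every name here (`Node`, `local_update`, `aggregate`, `train`) is hypothetical. A toy two-parameter linear model stands in for the base model.

```python
from dataclasses import dataclass
from typing import List, Tuple

Weights = List[float]  # toy model: [slope, intercept]


@dataclass
class Node:
    """Hypothetical partner node. Its (x, y) pairs never leave this object."""
    name: str
    data: List[Tuple[float, float]]

    def local_update(self, weights: Weights, lr: float = 0.05,
                     steps: int = 20) -> Tuple[Weights, int]:
        """Run a few gradient steps on y ~= w*x + b using only local data."""
        w, b = weights
        n = len(self.data)
        for _ in range(steps):
            grad_w = sum(2 * (w * x + b - y) * x for x, y in self.data) / n
            grad_b = sum(2 * (w * x + b - y) for x, y in self.data) / n
            w -= lr * grad_w
            b -= lr * grad_b
        # Only the updated weights and a sample count are returned.
        return [w, b], n


def aggregate(updates: List[Tuple[Weights, int]]) -> Weights:
    """Central core: sample-weighted average of the node weights (assumed rule)."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(wts[i] * n for wts, n in updates) / total for i in range(dim)]


def train(nodes: List[Node], rounds: int = 100) -> Weights:
    """Each round: send shared weights out, collect updates, average them."""
    weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [node.local_update(weights) for node in nodes]
        weights = aggregate(updates)
    return weights


# Two nodes hold disjoint samples of the same underlying line y = 2x + 1.
nodes = [
    Node("node_a", [(0.0, 1.0), (1.0, 3.0)]),
    Node("node_b", [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]),
]
final_weights = train(nodes)
```

In this toy setup the shared weights settle near a slope of 2 and an intercept of 1 even though neither node sees the other's samples. The hard part at frontier scale, which this sketch ignores entirely, is the communication and scheduling cost of doing the same thing with billions of parameters across institutions.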
The architecture is the part worth taking seriously. Federated learning has been around for years, mostly in settings like phone keyboards and hospital records, and the open question has always been whether you can do it at frontier scale without the coordination cost eating the benefit. Tapestry's own materials lean on recent work suggesting globally federated training can match synchronous baselines. Demonstrating that at scale is Phase 0, which is happening now, with the distributed training framework being built in the open on GitHub.
What LeCun is promising
LeCun's framing is that AI is becoming common infrastructure, and infrastructure controlled by a handful of companies in a handful of regions is a problem. "Most of the world downloads the result," reads the project page, paraphrasing his point. "Almost no one shapes the process." It is a clean line, and also the kind of thing that is easier to say than to engineer.
The promise to contributors is concrete: anyone who helps train the base model gets access to it and the right to build their own derivative they fully own. That ownership hook is what the Alliance is betting will pull in governments, national HPC centers, and universities sitting on multilingual datasets nobody else has.
The timeline, and where it gets vague
Here is the roadmap as published. Phase 0 and Phase 1, running through mid-2026, cover a training data catalog and initial commitments, with a version-one training platform targeted for September 2026. A first base model, trained small and from scratch, is slated for the end of 2026 alongside the first sovereign derivatives. Early deployment with industry and government use cases lands in the first half of 2027.
Then comes Phase 4, "frontier-scale effort," pinned to summer 2027 and beyond. The roadmap is honest about this one in a way press summaries tend to smooth over: it is contingent on compute, data, and capital thresholds. So the claim circulating that Tapestry will ship something rivaling proprietary state-of-the-art models by summer 2027 is reading the optimistic gloss, not the document. The document says they will attempt it if the money and the GPUs show up.
Which is the whole ballgame. A consortium of more than 200 organizations can catalog datasets and run multi-node experiments without much trouble. Assembling enough committed compute to train a genuinely frontier model, federated across borders and institutions with different incentives, is a coordination problem nobody has solved yet. The science being ready, as the Alliance puts it, is not the same as the logistics being ready.
The next checkpoint is the version-one platform in September. If the multi-node aggregation framework works and real partners commit compute by then, the rest of the roadmap stops being a wishlist.




