Google DeepMind shipped Gemini Robotics in March 2025, a vision-language-action model that outputs motor commands directly from Gemini 2.0's multimodal backbone. By September, the company released Gemini Robotics-ER 1.5 with chain-of-thought reasoning for robots. That's a six-month iteration cycle on models that let robots handle tasks they've never seen in training.
The old robotics math doesn't work anymore.
The training data cliff
Before VLAs, teaching a robot to pick up a coffee mug required thousands of demonstrations. The same robot couldn't pick up a wine glass without starting over. Lighting changed? Start over. Table height shifted by two centimeters? You get the idea.
Google's On-Device model, released in the summer, learns new behaviors with 50 to 100 demonstrations. Physical Intelligence's π0 shows similar efficiency gains in its fine-tuning requirements. The OpenVLA project notes that its model "typically requires fine-tuning on a small demonstration dataset (~100 demos) from your target domain robot" to achieve useful performance.
A hundred demos instead of ten thousand isn't just a hundredfold efficiency gain. It rewrites what's economically viable to automate. Tasks that would have cost more to program than they saved in labor suddenly pencil out. Warehouse operations that couldn't justify custom robot programming can now fine-tune a foundation model in days.
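The economics here come down to simple payback arithmetic. A minimal sketch, with every number a placeholder assumption (cost per demonstration and monthly labor savings are not from the article, and real figures vary widely by task and site):

```python
# Hypothetical break-even sketch. COST_PER_DEMO and MONTHLY_SAVINGS are
# invented placeholder assumptions, not figures from any vendor.

def payback_months(setup_cost: float, monthly_labor_savings: float) -> float:
    """Months of labor savings needed to recover a one-time setup cost."""
    return setup_cost / monthly_labor_savings

COST_PER_DEMO = 50.0       # assumed cost to collect one demonstration
MONTHLY_SAVINGS = 4_000.0  # assumed labor savings per robot per month

classic = payback_months(10_000 * COST_PER_DEMO, MONTHLY_SAVINGS)
finetune = payback_months(100 * COST_PER_DEMO, MONTHLY_SAVINGS)

print(f"classic: {classic:.1f} months, fine-tuned: {finetune:.2f} months")
# Under these assumptions: roughly 125 months vs about 1.25 months.
```

Whatever the real per-demo cost is, dividing the demonstration count by a hundred moves many tasks from "never recoups the setup cost" to "pays for itself inside a quarter."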
The money is following
Physical Intelligence raised $600 million in November 2025 at a $5.6 billion valuation, led by Alphabet's CapitalG. The San Francisco company, founded in 2024 by former Google DeepMind researchers, hasn't shipped a commercial product. Its robots fold laundry and assemble boxes in demo videos.
Figure AI reached a $39 billion valuation in September 2025, raising over $1 billion in its Series C. The company ended its OpenAI partnership in February 2025, with CEO Brett Adcock explaining that "to solve embodied AI at scale in the real world, you have to vertically integrate robot AI." They're building Helix, a proprietary VLA, rather than licensing someone else's intelligence layer.
Boston Dynamics partnered with Toyota Research Institute in October 2024 specifically to add Large Behavior Models to Atlas. The partnership announced results in August 2025: Atlas performing packing and sorting tasks using LBMs, with new capabilities "added quickly and without writing a single new line of code." Scott Kuindersma, now vice president of Robotics Research at Boston Dynamics, said the approach will "lead to better generalization" for robots requiring "whole-body precision, dexterity, and strength."
The pattern here is worth noticing. Hardware companies are partnering or acquiring their way into the intelligence layer. Software companies are treating hardware as a platform for their models.
Google's structural advantage
Google trained Gemini on the largest multimodal corpus ever assembled. The Gemini Robotics models add physical actions as an output modality to that existing capability. Gemini Robotics-ER specializes in spatial reasoning, task planning, and progress estimation. The VLA model maps "high-level instructions and visual observations directly to motor commands," according to Google's documentation.
The on-device version, Gemini Robotics On-Device, runs locally with low-latency inference. DeepMind tested it on ALOHA dual-arm robots, then adapted it to Franka FR3 arms and Apptronik's Apollo humanoid with "fewer than 100 demonstrations" for new task adaptation.
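In pseudocode terms, the control loop these models run is a tight cycle of observe, infer, act. The sketch below is illustrative only: the class names, the policy stub, and the data shapes are all inventions for this article, not the API of Gemini Robotics or any shipping model.

```python
# Hypothetical VLA control-loop sketch. All names here (Observation,
# StubVLAPolicy, control_loop) are invented for illustration; real
# models expose their own interfaces.

from dataclasses import dataclass

@dataclass
class Observation:
    image: list          # stand-in for a camera frame
    joint_angles: list   # proprioceptive state, one entry per joint

class StubVLAPolicy:
    """Mock policy: a real VLA maps (instruction, observation) -> motor commands."""
    def act(self, instruction: str, obs: Observation) -> list:
        # A real model runs multimodal inference here; the stub returns a
        # zero-velocity command matching the arm's dimensionality.
        return [0.0] * len(obs.joint_angles)

def control_loop(policy, instruction: str, obs: Observation, steps: int = 3):
    commands = []
    for _ in range(steps):   # on-device inference keeps this loop low-latency
        cmd = policy.act(instruction, obs)
        commands.append(cmd) # a real robot would execute cmd, then re-observe
    return commands

cmds = control_loop(StubVLAPolicy(), "pick up the mug",
                    Observation(image=[], joint_angles=[0.1] * 7))
```

The point of the abstraction is the cross-embodiment claim: nothing in the loop is specific to a dual-arm ALOHA rig, a Franka FR3, or an Apollo humanoid beyond the length of the joint vector.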
That cross-embodiment capability is the leverage point. A single foundation model that works on industrial arms, humanoids, and mobile manipulators compounds faster than models locked to specific hardware. Every deployment generates data that improves the foundation model, which improves all deployments.
Physical Intelligence is pursuing the same strategy with its π0 model. The company describes it as creating a "brain" that can "power any robot or any physical device basically for any application." Their business model is Robot-as-a-Service at roughly $1,000 per month per connected robot, according to industry analysis, with model weights available for fine-tuning.
Hardware is becoming table stakes
Boston Dynamics spent a decade perfecting locomotion. Atlas did backflips in November 2017, and the electric version revealed in 2024 still leads the industry. But the company needed TRI's Large Behavior Models to add general-purpose manipulation. Hardware excellence alone wasn't sufficient.
Figure AI's $39 billion valuation looks aggressive for a company with limited production deployments. But the bet makes more sense if you assume the humanoid form factor commoditizes and the intelligence layer captures most of the value. Chinese manufacturers like Unitree are shipping humanoid robots for under $10,000 at the hardware level.
The companies pricing this correctly are building around foundation model access. The companies pricing this wrong are still acting like the moat is in the mechanical engineering.
What hasn't shipped yet
The demos are impressive. Gemini Robotics models prepare salads, fold origami, play Tic-Tac-Toe. Physical Intelligence's robots assemble boxes and make espresso. But the production deployments remain limited. Figure AI reports 11 months running at BMW's Spartanburg plant with Figure 02, loading "over 90,000 parts" according to company statements. That's meaningful but not yet the mass deployment the valuations imply.
VLA models still struggle with highly variable environments. OpenVLA's documentation acknowledges the model "only works well on domains from the training dataset" without fine-tuning. The generalization that makes headlines in labs doesn't always survive contact with factories that have slightly different lighting or slightly heavier boxes.
The data flywheel theory, that more deployments feed more data back to improve the foundation model, remains largely theoretical. Physical Intelligence is investing heavily in data collection partnerships, including one with AgiBot for manufacturing data. But the closed-loop improvement that made language models so powerful hasn't been proven at scale for robots.
The timing question
Foundation model development runs on six-month to one-year cycles. Humanoid hardware development runs on three-year cycles. That timing mismatch is the entire trade.
If the VLA models continue improving at their current pace, the intelligence layer will be years ahead of the hardware by the time most humanoid robots reach production. The robots will be the commodity; the models will be the differentiation.
If the models hit a ceiling, if real-world deployment proves harder than lab demos suggest, or if the data requirements for true generalization turn out to be much larger than current fine-tuning numbers imply, then hardware quality and manufacturing scale still matter.
Physical Intelligence's $5.6 billion valuation and Figure AI's $39 billion valuation are bets on the first scenario. Boston Dynamics and Toyota Research Institute's partnership hedges both outcomes.
The next demonstration that matters isn't another origami-folding video. It's a deployment that runs for years, not months, on tasks that weren't in the training data.