
Dimensional OS Gives Unitree G1 Humanoid Spatial Memory and Temporal Reasoning

Open-source robotics framework DimOS adds voxel-based spatial RAG to humanoid robots, drones, and quadrupeds.

Oliver Senti
Senior AI Editor
March 3, 2026 · 4 min read
[Image: Unitree G1 humanoid robot in a research lab, overlaid with a spatial memory voxel-grid visualization]

Dimensional, the open-source robotics framework built by Stash Pomichter, announced what it calls Temporal-Spatial Agents: a system that lets robots ingest hours of video and LiDAR data, tag every voxel with vector embeddings, and answer causal queries about physical space. The first demo runs on Unitree's G1 humanoid. It also works on quadrupeds like the Unitree Go2 and, according to the team, most commercial drones.

What the spatial memory actually does

The core of the announcement is something Dimensional calls Spatial Agent Memory and SpatialRAG. Think of it as retrieval-augmented generation, but instead of searching text documents, the robot searches a 3D map of the world tagged with semantic metadata. Every voxel in the robot's spatial representation gets labeled with vector embeddings for objects, geometry, time, and whatever additional context the developer wants to attach.
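To make the retrieval idea concrete, here is a minimal, dependency-free sketch of how a SpatialRAG-style query could work: voxels carry embeddings plus metadata, and a query embedding retrieves the best-matching voxel by cosine similarity. All names and data here are illustrative assumptions, not the actual DimOS API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Each voxel: grid coordinate, object embedding, timestamp, label.
voxel_map = [
    {"xyz": (2, 0, 1), "emb": [0.9, 0.1, 0.0], "t": 1710000000, "label": "keys"},
    {"xyz": (5, 0, 3), "emb": [0.1, 0.8, 0.2], "t": 1710003600, "label": "door"},
    {"xyz": (1, 0, 4), "emb": [0.2, 0.1, 0.9], "t": 1710007200, "label": "couch"},
]

def spatial_query(query_emb, k=1):
    """Return the k voxels whose embeddings best match the query."""
    ranked = sorted(voxel_map, key=lambda v: cosine(query_emb, v["emb"]),
                    reverse=True)
    return ranked[:k]

# "Where did I last see the keys?" -> embed the question, search voxels.
hits = spatial_query([1.0, 0.0, 0.0])
print(hits[0]["label"], hits[0]["xyz"])  # → keys (2, 0, 1)
```

In the real system the query embedding would come from a language model and the voxel store would live in a vector database rather than a Python list, but the retrieval shape is the same: nearest-neighbor search over spatially anchored embeddings.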

The GitHub repo shows the underlying architecture: AI agents sit on top of a skills abstraction layer that talks to robots through ROS2. Sensor data flows in through reactive streams (built on RxPY), and agents can call navigation primitives, manipulation skills, and perception tools as functions. Spatial memory persists in ChromaDB.
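The reactive-streams pattern is easier to see in code. The toy below mimics the push-based flow the repo describes, with a dependency-free stand-in for RxPY; every class and function name is hypothetical rather than the real DimOS API.

```python
class SensorStream:
    """Push-based stream: observers register callbacks, frames get pushed."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, on_next):
        self._subscribers.append(on_next)

    def emit(self, frame):
        for on_next in self._subscribers:
            on_next(frame)

perceived = []

def perception_skill(frame_id):
    # An agent-callable "skill": tag each incoming frame with detections.
    perceived.append({"frame": frame_id,
                      "objects": ["person"] if frame_id % 2 else []})

lidar = SensorStream()
lidar.subscribe(perception_skill)
for frame_id in range(4):   # simulate four sensor frames arriving
    lidar.emit(frame_id)

print(len(perceived))  # → 4
```

The point of the pattern is that skills react to sensor data as it arrives instead of polling, which is why the framework can layer agents on top: an agent just subscribes to the streams it cares about.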

The practical pitch: a robot that can answer questions like "where did I last see the car keys," "who came to the door last Monday," or "when does the garbage get taken out." Dimensional frames these as examples of what spatial and temporal grounding enables. Whether current vision-language models are reliable enough to actually deliver on those promises in a home environment is a separate question the demo doesn't address.

So how does this differ from existing robot stacks?

Most robotics frameworks treat navigation and perception as separate pipelines. You get SLAM for mapping, a planner for paths, maybe object detection bolted on top. Dimensional's bet is that wrapping everything in an LLM agent layer, where the robot reasons in natural language about its spatial context, produces more flexible behavior.

The company's X account describes it as programming "atoms, not bits." The idea is you tell the humanoid in plain English: walk to the door, look through the peephole, let a friend in, trigger an alarm if it's a stranger. The agent decomposes that into a plan, calls the appropriate skills, and uses spatial memory to execute.
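The decomposition step can be sketched as follows: a planner turns a plain-English instruction into an ordered list of skill calls, which are then executed. In DimOS the planner is an LLM agent; here a lookup table stands in for it, and the skill names are illustrative assumptions, not the framework's real API.

```python
SKILLS = {}

def skill(fn):
    """Register a function as an agent-callable skill."""
    SKILLS[fn.__name__] = fn
    return fn

@skill
def navigate_to(target):
    return f"walked to {target}"

@skill
def look_through(target):
    return f"inspected {target}"

@skill
def open_door():
    return "door opened"

def plan(instruction):
    # Stand-in for the LLM planner: map an instruction to skill calls.
    if "let a friend in" in instruction:
        return [("navigate_to", ("door",)),
                ("look_through", ("peephole",)),
                ("open_door", ())]
    return []

def execute(instruction):
    return [SKILLS[name](*args) for name, args in plan(instruction)]

log = execute("walk to the door, look through the peephole, let a friend in")
print(log)  # → ['walked to door', 'inspected peephole', 'door opened']
```

Swapping the lookup table for a language model is what makes the behavior flexible, and also what makes it hard to guarantee: the plan is only as good as the model's grounding in the robot's actual spatial context.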

That sounds impressive in a demo video. It also sounds like exactly the kind of thing that fails catastrophically when the lighting changes or someone moves the furniture. The Dimensional website shows polished demos across navigation, memory, manipulation, and drone control, but production reliability data is conspicuously absent.

Why the Unitree G1

Unitree's G1 has become the de facto affordable humanoid research platform. It starts around $16,000 for the base model, stands about 127 cm tall, weighs 35 kg, and packs up to 43 degrees of freedom depending on configuration. It runs on an NVIDIA Jetson Orin with 100 TOPS of compute, enough for on-device inference with compact vision-language models, though not the largest ones.

Dimensional had already shipped a Go2 quadruped integration. Expanding to the G1 humanoid and drones signals an ambition toward what the team calls "cross-embodied Dimensional Applications," meaning the same agent code should work across different robot bodies. The framework uses a skills abstraction layer to make this possible, though real-world cross-embodiment remains one of robotics' hardest unsolved problems.

The open-source angle

Everything ships as open source. The main DimOS repo on GitHub currently sits at around 95 stars, which is modest. The Unitree-specific repo includes Docker configurations for development, simulation bindings for both Genesis and Isaac Sim, and supports multiple LLM backends (OpenAI, Anthropic, Alibaba).

One detail that caught my attention: the agent architecture supports chaining multiple agents together via reactive streams. A planning agent reasons about the task, passes steps to an executor agent, which queues robot commands. It's a pattern borrowed from software agent frameworks, now applied to physical manipulation and navigation. Whether the latency and reliability hold up when you're controlling a 35 kg machine walking around a house, rather than making API calls, remains to be tested at scale.
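That planner-to-executor handoff can be sketched in a few lines: one agent yields plan steps, a second consumes them and queues robot commands. The names and the string-based command format are illustrative assumptions, not the DimOS wire format.

```python
from queue import Queue

command_queue = Queue()

def planning_agent(task):
    """Yield plan steps; in DimOS this would be an LLM call."""
    for step in ["navigate:door", "perceive:peephole", "actuate:open"]:
        yield step

def executor_agent(steps):
    """Consume plan steps and queue low-level robot commands."""
    for step in steps:
        verb, target = step.split(":")
        command_queue.put({"verb": verb, "target": target})

executor_agent(planning_agent("let a friend in"))

commands = []
while not command_queue.empty():
    commands.append(command_queue.get())
print(len(commands), commands[0]["verb"])  # → 3 navigate
```

The queue decouples the two agents, which is the appeal of the pattern: the planner can keep reasoning while the executor drains commands at the robot's pace. It is also where latency accumulates when the consumer is a walking machine rather than an API endpoint.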

DimOS is currently at version 0.0.5, and Dimensional is accepting beta access requests through its website.

Tags: Dimensional OS, DimOS, Unitree G1, humanoid robots, spatial memory, SpatialRAG, robotics, open source, embodied AI
Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.

