Ethereum co-founder Vitalik Buterin published a detailed blog post on Wednesday outlining the AI setup he runs entirely on local hardware, with no cloud dependencies, custom sandboxing, and a rule that no message or crypto transaction leaves his machine without human sign-off. The post is less a product recommendation than a manifesto: Buterin thinks the AI industry is about to undo years of privacy progress, and he's building his own alternative.
His timing isn't accidental. The rise of AI agents, particularly OpenClaw, has created a security landscape that ranges from sloppy to actively hostile. Buterin opens the post by citing research from cybersecurity firm Gen, which found that roughly 15% of community-built skills on OpenClaw's ClawHub marketplace contained malicious instructions. Some silently exfiltrated user data. Others deployed reverse shells. The numbers from independent audits are worse: Aikido Security documented hundreds of crypto-stealing packages, with security researcher Paul McCarty finding 386 malicious packages from a single threat actor within minutes of looking.
The OpenClaw problem
OpenClaw is the fastest-growing GitHub repository in history, and by most accounts, a security disaster. The agentic AI framework lets an LLM autonomously execute tasks using hundreds of tools, which is the whole appeal. But Dark Reading reported that agents can modify their own system prompts, add communication channels, and execute code without asking the user. Cisco's security team ran a third-party skill called "What Would Elon Do?" against OpenClaw and found it silently sent data to an external server via a curl command. The network call was invisible to the user.
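Cisco's finding suggests how crude the first line of defense can be: before running a community skill at all, simply sweep its files for network-capable commands. The sketch below is a hypothetical first-pass audit, not the methodology Gen or Cisco actually used, and the command list is illustrative rather than exhaustive:

```python
import re
from pathlib import Path

# Static sweep for network-capable commands in skill files -- a crude
# heuristic (easily evaded by obfuscation), useful only as a first pass.
SUSPECT = re.compile(r"(\b(curl|wget|ncat?|Invoke-WebRequest)\b|/dev/tcp)")

def audit_skill(path: Path) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that mention network tooling."""
    hits = []
    for n, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
        if SUSPECT.search(line):
            hits.append((n, line.strip()))
    return hits

def audit_tree(root: Path) -> None:
    """Print every suspicious line under a directory of skills."""
    for f in root.rglob("*"):
        if f.is_file() and (hits := audit_skill(f)):
            print(f"{f}:")
            for n, line in hits:
                print(f"  L{n}: {line}")
```

A skill that legitimately needs the network will trip this too, which is the point: the user should at least know the capability is there before granting it.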
Researchers at Oasis Security found something arguably worse: any website a developer visits can silently take over their local OpenClaw instance through a WebSocket connection to localhost. The gateway's rate limiter exempts local connections entirely. No plugins needed, no user interaction required.
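The flaw is easy to reproduce in miniature. A rate limiter that exempts loopback clients offers no protection against a malicious web page, because the browser's WebSocket to localhost also arrives from 127.0.0.1. A toy sketch of that failure mode (hypothetical class names, not OpenClaw's actual code):

```python
import time
from collections import defaultdict

class NaiveGateway:
    """Sliding-window rate limiter that exempts loopback clients.

    Mirrors the flaw Oasis Security describes: any page running in a
    local browser also connects from 127.0.0.1, so the exemption lets
    it hit the gateway without limit.
    """

    def __init__(self, limit_per_minute: int = 10):
        self.limit = limit_per_minute
        self.hits = defaultdict(list)

    def allow(self, client_ip: str) -> bool:
        if client_ip in ("127.0.0.1", "::1"):  # the fatal exemption
            return True
        now = time.monotonic()
        window = [t for t in self.hits[client_ip] if now - t < 60]
        window.append(now)
        self.hits[client_ip] = window
        return len(window) <= self.limit

gw = NaiveGateway(limit_per_minute=10)
# A remote client is throttled after 10 requests in the window...
remote = [gw.allow("203.0.113.5") for _ in range(20)]
# ...but 1,000 requests from loopback -- including a malicious web
# page's WebSocket -- all sail through.
local = [gw.allow("127.0.0.1") for _ in range(1000)]
print(sum(remote), sum(local))  # → 10 1000
```

The fix isn't subtle: treat localhost like any other origin, and require an authentication handshake rather than trusting the source address.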
Buterin is blunt about his reaction. "I come from a mindset of being deeply scared that just as we were finally making a step forward in privacy with the mainstreaming of end-to-end encryption and more and more local-first software, we are on the verge of taking ten steps backward," he wrote.
What he actually built
The setup runs on consumer hardware. Buterin tested three configurations: a laptop with an NVIDIA 5090 GPU, an AMD Ryzen AI Max Pro laptop with 128 GB of unified memory, and NVIDIA's DGX Spark. He runs the open-source Qwen3.5:35B model via llama-server through llama-swap for model management. The 5090 gets 90 tokens per second on the 35B model. The AMD setup manages 51. Below 50 tokens per second, Buterin says it feels too slow to be useful.
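Those throughput numbers are straightforward to check yourself, since llama-server exposes an OpenAI-compatible HTTP API that reports token counts in its `usage` field. A rough sketch, where the endpoint port and model name are assumptions (llama-swap decides what's actually loaded):

```python
import json
import time
import urllib.request

# Assumed local llama-server endpoint; adjust port to your setup.
ENDPOINT = "http://localhost:8080/v1/chat/completions"
USABLE_TPS = 50  # Buterin's rough floor for interactive use

def classify(completion_tokens: int, elapsed_s: float) -> tuple[float, bool]:
    """Tokens/sec, plus whether it clears the ~50 tok/s usability bar."""
    tps = completion_tokens / elapsed_s
    return tps, tps >= USABLE_TPS

def benchmark(prompt: str, model: str = "qwen3.5-35b") -> tuple[float, bool]:
    """Time one completion against the local server (requires it running)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode()
    req = urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": "application/json"})
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        usage = json.load(resp)["usage"]
    return classify(usage["completion_tokens"], time.monotonic() - start)

# The article's numbers: 90 tok/s (5090) clears the bar, 51 (AMD) barely.
print(classify(90, 1.0), classify(51, 1.0))
```

Wall-clock timing over a single request is a blunt instrument (it folds prompt processing into the average), but it's enough to tell a usable setup from a frustrating one.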
He was not impressed with the DGX Spark. "It's described as an 'AI supercomputer on your desk' but in reality it has lower tokens/sec than a good laptop GPU," he wrote, calling it "lame." For people who can't afford high-end hardware, his suggestion is refreshingly practical: pool money with friends, buy one good machine, and connect to it remotely.
The honest limits
Here's where it gets interesting, because Buterin doesn't pretend this setup competes with frontier models. He admits the local Qwen3.5:35B performs well on routine coding tasks but falls apart on harder problems. When he tried to get it to implement BLS12-381 hash-to-point in Vyper, he kept wrestling with errors before giving up and sending the problem to Claude, which completed it in one attempt. If you want an AI agent that can independently improve your codebase without supervision, he writes, local models and laptops aren't powerful enough.
That admission is what makes the post credible. Most advocacy for local-first AI either ignores the capability gap or waves it away. Buterin acknowledges it directly and then asks: what can we build within those constraints that still protects privacy?
Sandboxing everything
The answer involves aggressive isolation. Buterin runs all LLM and agent activity inside bubblewrap sandboxes, each rooted in a specific directory with access only to whitelisted files. He built a custom messaging daemon that lets the AI read his Signal messages and email but requires explicit human approval before sending anything outbound. He calls this the "human + LLM 2-of-2" model, a phrase that maps directly onto the multisig wallet logic the crypto world already uses.
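The 2-of-2 gate is simple to express in code. The sketch below is illustrative, not Buterin's actual daemon: an outbound message goes nowhere unless every approver, both the local LLM's policy check and the human, signs off.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Outbound:
    channel: str    # e.g. "signal" or "email"
    recipient: str
    body: str

Approver = Callable[[Outbound], bool]

def dispatch(msg: Outbound, transport: Callable[[Outbound], None],
             approvers: list[Approver]) -> bool:
    """n-of-n gate: every approver must sign off before anything leaves
    the machine, mirroring multisig wallet logic. Returns True if sent."""
    if all(approve(msg) for approve in approvers):
        transport(msg)
        return True
    return False

# Illustrative approvers (hypothetical policy, not Buterin's code):
def llm_policy(msg: Outbound) -> bool:
    """Stand-in for the local model's automated screening pass."""
    return "seed phrase" not in msg.body.lower()

def human_prompt(msg: Outbound) -> bool:
    """Explicit interactive sign-off -- the human half of the 2-of-2."""
    ans = input(f"Send to {msg.recipient} via {msg.channel}? [y/N] ")
    return ans.strip().lower() == "y"
```

The structural point is that the transport function is only reachable through `dispatch`; there is no code path that sends without unanimous approval.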
"The new two-factor authentication is the human and the LLM," he wrote, and the comparison to his personal crypto setup isn't casual. Buterin keeps 90% of his funds in a multisig Safe wallet with keys distributed among trusted contacts.
When he does need to use a remote model (for tasks the local one can't handle), requests first pass through a local model that strips out sensitive information before anything hits the network. It is paranoid by design. Whether that paranoia is excessive depends on how seriously you take the OpenClaw research, and I'd argue the research speaks for itself.
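A minimal sketch of that scrubbing pipeline, with regex placeholders standing in for what would really be a local-LLM rewrite pass (the patterns and names here are hypothetical):

```python
import re

# Toy redaction rules. A real pipeline would ask the local model to
# rewrite the prompt; these regexes merely illustrate the shape of it.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ETH_ADDRESS": re.compile(r"0x[a-fA-F0-9]{40}"),
}

def scrub(prompt: str) -> str:
    """Replace sensitive tokens with labels before anything hits the network."""
    for label, pat in PATTERNS.items():
        prompt = pat.sub(f"[{label}]", prompt)
    return prompt

def ask_remote(prompt: str, remote_call: callable):
    """Only the scrubbed prompt ever leaves the machine."""
    return remote_call(scrub(prompt))

print(scrub("Refund 0.1 ETH to 0x" + "ab" * 20 + ", cc vitalik@example.org"))
# → Refund 0.1 ETH to [ETH_ADDRESS], cc [EMAIL]
```

The weakness, which Buterin's design implicitly accepts, is that the scrubber is itself a model and can miss things; the labels bound the damage rather than eliminate it.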
Does this actually work as a daily setup?
Sort of. Buterin runs NixOS, which lets him specify his entire system configuration as a declarative config file. He uses SearXNG, a metasearch engine that aggregates results from multiple search engines, and built custom skills for his local agent. He switched from Ollama to llama-server after, as he puts it, "half of Twitter told me I was a noob" for using the former. The crowd had a point: Ollama couldn't fit Qwen3.5:35B onto his GPU. Llama-server could.
For image generation he runs Qwen-Image and HunyuanVideo 1.5 through ComfyUI. Video generation takes about 15 minutes for five seconds of output on the 5090. On the AMD laptop, roughly five times longer, partly because ComfyUI doesn't have Vulkan support yet.
The post itself carries a warning at the top: don't copy this setup and assume it's secure. Buterin describes it as a starting point, not a finished product. Which is honest, and also a little unsatisfying. The gap between "starting point" and "something a normal person could use" remains enormous.
Self-custody for your brain
This post doesn't exist in isolation. In February, Buterin outlined a four-quadrant roadmap for how Ethereum should intersect with AI, spanning private AI use, agent coordination, governance, and verification. Earlier, in January, he declared 2026 the year to "take back lost ground in computing self-sovereignty," swapping Google Maps for OpenStreetMap, Gmail for Proton Mail, and Google Docs for Fileverse.
The LLM post is the most technically detailed installment in that project. And the core argument is compelling even if the execution is rough: if AI agents are going to mediate your digital life, controlling those agents matters as much as controlling your money. The crypto world built its identity on self-custody and distrust of intermediaries. Buterin is trying to extend that same logic to AI before the window closes.
Whether anyone beyond a handful of technically sophisticated enthusiasts will actually do this is another question. But given that OpenClaw's creator, Peter Steinberger, joined OpenAI in February, and that the platform's security advisor was the very researcher who proved its marketplace was full of malware, maybe a little paranoia is warranted.