Agents

Google and Microsoft Want AI Agents to Stop Guessing What Your Website Does

A new browser API called WebMCP lets sites publish structured tool contracts for AI agents, replacing brittle screen-scraping with direct function calls.

Oliver Senti
Senior AI Editor
February 14, 2026 · 6 min read
[Illustration: a browser window showing a structured tool layer for AI agents beneath the visual web interface]

Google's Chrome team quietly shipped an early preview of WebMCP, a proposed web standard that lets websites tell AI agents exactly what they can do, and how to do it, through a new browser API called navigator.modelContext. Microsoft co-authored the spec. The W3C's Web Machine Learning community group is incubating it. And it's already running in Chrome 146 Canary behind a feature flag.

The pitch is simple: instead of agents burning tokens on screenshots and DOM parsing to figure out where a search bar is, a site registers callable tools (think searchFlights(origin, destination, date)) that the agent can invoke directly. Two APIs, one for HTML forms and one for JavaScript, handle the plumbing.

Whether anyone beyond Google and Microsoft actually adopts it is a different question.

What agents do now (and why it's awful)

Right now, browser-based AI agents navigate websites the way a confused tourist navigates a foreign city. Screenshot-based agents pass images to multimodal models and try to identify clickable elements from pixels. DOM-based agents ingest raw HTML and hope the markup makes sense. Both approaches are slow, expensive, and fragile. A single product search can require dozens of sequential interactions, each one an inference call.

The WebMCP specification tackles this by giving websites a way to publish a "Tool Contract." The site declares what actions are available, with structured input schemas, and agents call those functions rather than fumbling around the page. André Cipriani Bandarra, a staff developer relations engineer at Google, framed it as eliminating ambiguity: agents get told how and where to interact with a site, instead of guessing.

That's the theory. Early benchmarks from the Chrome team claim a 67% reduction in computational overhead and around 98% task accuracy compared to traditional agent-browser interactions. But those are Google's own numbers from an early preview, and I haven't seen independent validation.

The dual-API architecture

WebMCP splits the problem into two pieces. The Declarative API handles standard form-based actions. If your HTML forms are already clean and well-structured, adding tool names and descriptions to existing markup gets you most of the way there. Bandarra suggested sites with good forms are "80% of the way" to agent-readiness, which feels optimistic but not crazy.
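As a rough sketch of what that declarative path could look like, a flight-search form might carry tool metadata directly in its markup. Note the attribute names below are illustrative assumptions on my part, not confirmed spec syntax:

```html
<!-- Hypothetical declarative annotation: the toolname/tooldescription
     attributes are illustrative, not confirmed WebMCP syntax -->
<form action="/search" method="get"
      toolname="searchFlights"
      tooldescription="Search available flights by route and date">
  <input name="origin" type="text" required>
  <input name="destination" type="text" required>
  <input name="date" type="date" required>
  <button type="submit">Search</button>
</form>
```

The appeal is that the form keeps working for human visitors exactly as before; the agent-facing contract is layered on top of markup the site already maintains.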

The Imperative API is where things get more interesting. It handles complex, dynamic interactions through JavaScript, letting developers register richer tool schemas through registerTool(). Conceptually it's similar to tool definitions you'd send to the OpenAI or Anthropic API endpoints, but running entirely client-side in the browser tab.

navigator.modelContext.registerTool({
  name: 'searchFlights',
  description: 'Search available flights by route and date',
  inputSchema: {
    type: 'object',
    properties: {
      origin: { type: 'string' },
      destination: { type: 'string' },
      date: { type: 'string' }
    },
    required: ['origin', 'destination', 'date']
  },
  execute: async (params) => {
    const results = await internalFlightSearch(params);
    return { flights: results };
  }
});

No backend required. The browser is the server.
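Since the API currently lives behind a flag in Chrome Canary, any real deployment would have to feature-detect before registering anything. A minimal guard, sketched under the assumption that the API surface matches the snippet above, might look like this:

```javascript
// Register a WebMCP tool only if the browser actually exposes
// navigator.modelContext. Returns true if registration happened,
// false otherwise (e.g. Firefox, Safari, or Chrome without the flag).
function registerToolIfSupported(tool) {
  const ctx = globalThis.navigator?.modelContext;
  if (typeof ctx?.registerTool !== 'function') {
    return false; // API not available in this environment
  }
  ctx.registerTool(tool);
  return true;
}
```

Wrapping registration this way keeps the site functional everywhere; browsers without the API simply never see the tool layer.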

The security problem nobody has solved

Here's where I start getting uncomfortable. Google's position on prompt injection, the technique where attackers embed hidden instructions in content to manipulate AI agents, is that defending against it is the responsibility of individual agents, not the API itself.

Good luck with that.

The spec itself acknowledges what it calls the "lethal trifecta": an agent with access to private data (say, your email), exposed to untrusted content (a phishing message), and equipped with external communication tools (the ability to forward data). Each step is individually legitimate. Together, they form an exfiltration chain. And prompt injection makes the whole thing worse.

OpenAI has acknowledged that prompt injection may never be fully solved. Anthropic's own research on their Chrome browser agent shows a 1% attack success rate after significant hardening, which they explicitly call "meaningful risk." Meanwhile, WebMCP is asking websites to expose callable tools that run with the user's authenticated session. A deceptive tool could describe itself as "add to cart" while actually completing a purchase.

The security documentation for the reference implementation lists mitigations: origin isolation, user consent flows for sensitive tools, hashing tool definitions. They reduce risk. They don't eliminate it. The spec recommends fewer than 50 tools per page, which is practical but also suggests this is designed for focused use cases, not exposing your entire application surface.
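None of those mitigations replace defensive coding on the site's side. An execute handler should treat agent-supplied params as untrusted input. The validator below is my own sketch, not part of the spec: a bare-minimum check of required fields and primitive types against the kind of inputSchema shown earlier.

```javascript
// Minimal validator for a JSON-Schema-style inputSchema as in the
// searchFlights example. Checks required fields and primitive types;
// anything richer (formats, enums, nesting) needs a real JSON Schema
// library. Returns an array of error strings; empty means valid.
function validateParams(schema, params) {
  const errors = [];
  for (const field of schema.required ?? []) {
    if (!(field in params)) errors.push(`missing required field: ${field}`);
  }
  for (const [key, value] of Object.entries(params)) {
    const expected = schema.properties?.[key]?.type;
    // typeof covers 'string', 'number', 'boolean'; objects/arrays
    // would need extra handling in a real implementation
    if (expected && typeof value !== expected) {
      errors.push(`${key}: expected ${expected}, got ${typeof value}`);
    }
  }
  return errors;
}
```

A handler would run this first and refuse to execute on any non-empty result, which at least stops malformed or type-confused calls before they touch application logic.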

Not MCP. Not NLWeb. Something else.

The naming is confusing, and probably deliberately so. WebMCP shares conceptual DNA with Anthropic's Model Context Protocol but doesn't follow MCP's JSON-RPC specification for client-server communication. Where MCP operates as a backend protocol connecting AI platforms to service providers through hosted servers, WebMCP runs entirely client-side in the browser.

The relationship is complementary, at least in theory. A travel company might maintain a backend MCP server for direct API integrations with ChatGPT or Claude, while also implementing WebMCP tools on its consumer-facing website for browser-based agents. But there's an unavoidable land-grab dynamic here. Google is positioning Chrome as the chokepoint between AI agents and the web.

Microsoft's NLWeb, announced at Build 2025, takes a different approach. It focuses on giving websites natural language query interfaces, with each instance acting as an MCP server. The fact that Microsoft co-authored WebMCP while also pushing NLWeb suggests the company is hedging, which is fair. Nobody knows which of these protocols will stick.

And neither Apple nor Mozilla has signaled anything about WebMCP support. Chrome and Edge are it, for now.

Who actually benefits?

The announced use cases (travel bookings, support tickets, e-commerce) make sense. But there's an uncomfortable subtext that The Decoder flagged: if AI agents handle product searches, price comparisons, and bookings autonomously, fewer people need to visit the actual website. Sites risk losing ad revenue, direct customer relationships, and the chance to win users over with their own experience. The web becomes background infrastructure.

Dan Petrovic, an SEO consultant, called WebMCP the biggest shift in technical SEO since structured data. That comparison is instructive, because structured data also turned out to mostly benefit Google.

Chrome 146 stable is expected around March 2026. The specification is transitioning from W3C community incubation to a formal draft. Google hasn't confirmed whether WebMCP will become a search ranking signal, but the pattern is familiar: mobile-friendly became a ranking factor, HTTPS became a ranking factor, Core Web Vitals became a ranking factor. The question is when, not if.

For now, WebMCP is available behind a flag in Chrome Canary. Developers can join the early preview program for documentation and demos. There's already a polyfill implementation with React hooks and vanilla JS support.

Whether this becomes the USB-C of agent-web interactions or another well-intentioned spec that never escapes developer preview depends entirely on adoption. Google and Microsoft shipping code together is a strong signal. But the web has a graveyard full of strong signals.

Tags: WebMCP, AI agents, Chrome, Google, browser API, W3C, Model Context Protocol, agentic web, web standards
Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.

