LLMs & Foundation Models

Pentagon Bets on Grok for Military AI, Months After the Chatbot Went Full Nazi

xAI's chatbot will join Google's Gemini on the Defense Department's new AI platform, giving 3 million personnel access to tools that praised Hitler in July.

Oliver Senti
Senior AI Editor
December 23, 2025 · 6 min read
[Illustration: an AI chatbot interface rising from the Pentagon building, casting a long shadow in military colors]

The Department of War announced Monday it will integrate xAI's Grok AI models into GenAI.mil, the military's recently launched internal AI platform. The deployment, targeted for early 2026, will give all 3 million military and civilian personnel access to xAI's capabilities at Impact Level 5, enabling the secure handling of Controlled Unclassified Information.

What the announcement doesn't mention: five months ago, Grok spent a day praising Adolf Hitler, pushing antisemitic conspiracy theories, and generating graphic violent content after xAI tweaked its system prompts.

The platform nobody asked questions about

GenAI.mil launched on December 9 with Google's Gemini for Government as its first offering. Defense Secretary Pete Hegseth introduced it with characteristic subtlety: "The future of American warfare is here, and it's spelled AI." The Pentagon's Chief Technology Officer Emil Michael, who also serves as undersecretary for research and engineering, called AI "America's next Manifest Destiny."

The xAI deal extends this vision. According to the official announcement, Grok's capabilities will be "embedded directly into GenAI.mil," and users will also gain access to "real‑time global insights from the X platform." That last part is worth pausing on: the Pentagon is explicitly touting access to X's data stream as a feature, not a bug.

The partnership could eventually extend to classified workloads, though the current deal covers only unclassified material. The War Department frames this as providing personnel with "a decisive information advantage."

What happened in July

In early July, Musk announced that xAI had "improved @Grok significantly." The improvement in question: xAI had added instructions for the bot to "not shy away from making claims which are politically incorrect, as long as they are well substantiated."

By Tuesday of that week, Grok was praising Hitler. When a user asked the chatbot about a woman in an image, Grok responded with antisemitic tropes, saying "'the type' in that meme often points to surnames like Goldstein, Rosenberg, Silverman, Cohen, or Shapiro—frequently popping up among vocal radicals." The bot described itself as "MechaHitler" in some responses and generated violent rape narratives when prompted by users.

Poland announced it would report xAI to the European Commission. Turkey blocked some access to Grok. The Anti-Defamation League called the responses "irresponsible, dangerous, and antisemitic."

xAI issued a lengthy apology that Saturday, blaming the incident on a code update that caused Grok to refer to "existing X user posts; including when such posts contained extremist views." The company said the problematic instructions were active for 16 hours.

This wasn't Grok's first incident. In May, the chatbot engaged in Holocaust denial and repeatedly brought up false claims of "white genocide" in South Africa. xAI blamed that one on "an unauthorized modification" by a "rogue employee."

The money trail

The Pentagon's embrace of xAI follows a broader push to bring frontier AI companies into defense work. In July, the Chief Digital & AI Office awarded contracts of up to $200 million each to four companies: xAI, OpenAI, Google, and Anthropic. That's $800 million in potential ceiling value spread across the leading commercial AI developers.

The xAI contract was announced the same week as the antisemitic incident, though the precise timing of the award decision remains unclear. The Pentagon did not publicly address whether Grok's behavior factored into its evaluation.

These awards aren't enormous by Pentagon standards, and they're tiny compared to what these companies raise privately. OpenAI alone reported $10 billion in annualized revenue last month and raised $40 billion from investors in March. But the contracts signal that the Defense Department sees commercial AI as central to its future, regardless of how stable these systems prove to be.

Musk's complicated federal position

The deal adds another layer to Elon Musk's tangled relationship with the federal government. Musk left his role leading the Department of Government Efficiency in May after legal setbacks and clashes with Trump's cabinet. DOGE has since been effectively dissolved, with OPM Director Scott Kupor telling Reuters in November that DOGE "doesn't exist" as a centralized entity anymore.

In a recent interview, Musk called his DOGE work "a little bit successful" and said he wouldn't do it again if given the chance. The effort claimed $214 billion in savings, though a Politico analysis found that DOGE used "faulty math" to inflate its numbers.

Despite that rocky departure, Musk's companies continue winning federal contracts. SpaceX remains a critical NASA and Defense Department contractor. And now xAI will power part of the military's AI infrastructure.

What the Pentagon is actually getting

The deployed version of Grok will use "sovereign AI instances that are entirely air-gapped from public data streams." In theory, this means the military's Grok won't be pulling from X's content firehose in the same way the consumer version does.

But the Pentagon announcement explicitly touts X integration as a benefit. The department said the partnership would give personnel "real-time global insights from the X platform," providing "faster situational awareness around the globe."

It's not clear how these two claims square. Either the military version is isolated from X's content, or it provides real-time X insights. Perhaps different deployment tiers offer different access levels. The Pentagon's announcement doesn't elaborate.

Grok joins Google's Gemini on the platform. GenAI.mil launched earlier this month with Gemini as its first capability, giving personnel tools that, in the Pentagon's words, can conduct "deep research, format documents, and even analyze video or imagery." Whether adding a second model architecture improves outcomes or just adds complexity remains to be seen.

The safety question nobody's answering

The core issue isn't whether Grok can be useful for military applications. It clearly can. Large language models excel at document processing, translation, summarization, and analysis tasks that consume enormous amounts of staff time.

The issue is what happens when a system with minimal guardrails starts processing sensitive information at scale. "These systems are trained on the grossest parts of the internet," Maarten Sap, an assistant professor at Carnegie Mellon and head of AI Safety at the Allen Institute for AI, told CNN after the July incident.

CNN's own testing of Grok 4 found that asking the bot to adopt a "White nationalist tone" produced antisemitic responses, including "The Jews ain't your friends – they're the architects of your downfall." The same prompts on Google's Gemini generated refusals.

The Pentagon's announcement doesn't address how xAI's government-specific version will handle adversarial prompting, what safety testing it underwent, or who evaluated its performance. The release simply states that the initiative "marks another milestone in America's AI revolution."

The FTC has so far shown little interest in AI safety questions. The Pentagon, apparently, hasn't either.

Tags: artificial intelligence, Pentagon, xAI, Elon Musk, Grok
Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.


