Salesforce Executives Say Trust in Large Language Models Has Declined

Salesforce's top product marketing executive admitted this month that confidence in large language models has eroded significantly over the past year. "We all had more confidence in LLMs a year ago," said Sanjna Parulekar, SVP of product marketing, during a recent roundtable discussion on the company's AI strategy.

The timing is awkward. This candid assessment comes months after CEO Marc Benioff boasted on The Logan Bartlett Show podcast that Salesforce had cut its customer support staff from 9,000 to 5,000 employees, largely through AI deployment. "I need less heads," Benioff said in that August interview.

When the model forgets to send the survey

The problems are technical, not philosophical. Salesforce's CTO Muralidhar Krishnaprasad has noted that LLMs start ignoring instructions once they are given more than eight directives. That is, to put it mildly, not great for enterprise workflows that routinely require dozens of sequential steps.

Then there's what Phil Mui, SVP of Agentforce Software Engineering, calls "drift." In an October blog post, Mui described how AI agents lose focus on their primary objectives when users ask tangential questions. A chatbot designed to help customers fill out a form gets distracted when someone asks about shipping times. The form never gets completed.

Home security company Vivint, which uses Agentforce to manage customer service for 2.5 million customers, ran into these reliability issues firsthand. According to reporting from The Information, the AI would sometimes skip sending customer satisfaction surveys after interactions, despite explicit instructions. Vivint worked with Salesforce to implement "deterministic triggers" to guarantee the surveys actually went out.
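To make the idea concrete, here is a minimal Python sketch of what a "deterministic trigger" could look like. It is not Vivint's or Salesforce's implementation; `Interaction`, `send_survey`, `handle_interaction`, and `forgetful_agent` are hypothetical names invented for illustration. The point is only that the survey step lives in ordinary code that always runs, instead of in an instruction the model may or may not follow.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Interaction:
    customer_id: str
    transcript: List[str] = field(default_factory=list)
    survey_sent: bool = False


def send_survey(interaction: Interaction, outbox: List[str]) -> None:
    """Hypothetical survey sender; a real system would call a messaging API."""
    if not interaction.survey_sent:  # idempotent: never send the survey twice
        outbox.append(f"survey:{interaction.customer_id}")
        interaction.survey_sent = True


def handle_interaction(
    interaction: Interaction,
    agent: Callable[[Interaction, List[str]], None],
    outbox: List[str],
) -> None:
    """Run the agent, then fire the survey trigger no matter what it did."""
    try:
        agent(interaction, outbox)
    finally:
        # Deterministic trigger: plain code, not a prompt instruction.
        send_survey(interaction, outbox)


def forgetful_agent(interaction: Interaction, outbox: List[str]) -> None:
    """Stand-in for an LLM-driven agent that skips the survey step."""
    interaction.transcript.append("Issue resolved.")


outbox: List[str] = []
handle_interaction(Interaction("c-001"), forgetful_agent, outbox)
print(outbox)  # ['survey:c-001']
```

The `survey_sent` check means an agent that does remember to send the survey will not cause a duplicate when the trigger fires afterward.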

What "deterministic" actually means

Salesforce's fix is essentially a retreat. The company is now emphasizing "hybrid reasoning," which combines LLM flexibility for conversational tasks with old-fashioned rule-based logic (what Salesforce calls Flows, Apex, and APIs) for anything business-critical.

Their new Agent Graph and Agent Script tools let developers specify exactly which parts of a workflow should be LLM-driven versus hard-coded. It's sophisticated product thinking, but it's also an admission that the original vision of autonomous AI agents handling complex workflows end-to-end was oversold.
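The split between LLM-driven and hard-coded steps can be sketched in a few lines of Python. This is not Agent Script syntax or any Salesforce API; `Mode`, `Step`, `run_workflow`, and the three step functions are hypothetical, and `fake_llm_reply` merely stands in for a model call. The sketch assumes one plausible policy: conversational steps may fail softly, while business-critical steps must either succeed or raise.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Mode(Enum):
    LLM = "llm"                      # flexible, conversational
    DETERMINISTIC = "deterministic"  # rule-based, business-critical


@dataclass
class Step:
    name: str
    mode: Mode
    run: Callable[[Dict[str, str]], None]


def fake_llm_reply(state: Dict[str, str]) -> None:
    """Stand-in for a model call that drafts a conversational reply."""
    state["reply"] = f"Hi {state['customer']}, happy to help with your form."


def validate_form(state: Dict[str, str]) -> None:
    """Hard-coded rule: the form cannot proceed without an email."""
    if not state.get("email"):
        raise ValueError("email is required")
    state["validated"] = "yes"


def submit_form(state: Dict[str, str]) -> None:
    state["submitted"] = "yes"


WORKFLOW: List[Step] = [
    Step("greet", Mode.LLM, fake_llm_reply),
    Step("validate", Mode.DETERMINISTIC, validate_form),
    Step("submit", Mode.DETERMINISTIC, submit_form),
]


def run_workflow(steps: List[Step], state: Dict[str, str]) -> List[str]:
    """Run every step in order and return a trace of what executed."""
    executed: List[str] = []
    for step in steps:
        try:
            step.run(state)
        except Exception:
            if step.mode is Mode.DETERMINISTIC:
                raise  # a business-critical failure must surface
            # An LLM step may fail softly; fall back to a canned reply.
            state.setdefault("reply", "Let me get that form finished for you.")
        executed.append(f"{step.name}:{step.mode.value}")
    return executed


trace = run_workflow(WORKFLOW, {"customer": "Ana", "email": "ana@example.com"})
print(trace)  # ['greet:llm', 'validate:deterministic', 'submit:deterministic']
```

Because the step order is fixed in code, a tangential question can change what the LLM step says but cannot stop the form from being validated and submitted.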

A Salesforce spokesperson denied the company is backtracking, stating they're "simply being more intentional about their use" of LLMs. The Agentforce platform is reportedly on track for over $500 million in annual recurring revenue, up 330% year-over-year, with over 9,500 paid deals closed. Those numbers are real, though they don't tell you how much of that value comes from the AI parts versus the traditional automation Salesforce has been selling for years.

The stock tells a story

Salesforce shares hit an all-time high of $365 on December 4, 2024, riding the Agentforce hype. As of late December, the stock trades around $265, a decline of roughly 27% from that peak. Investors seem uncertain whether the agentic AI narrative can deliver.

The company still employs about 76,000 people worldwide and remains the dominant player in CRM. But the LLM reliability admission creates a credibility problem. Benioff spent much of 2024 insisting AI wouldn't cause mass layoffs. "I keep looking around, talking to CEOs, asking, what AI are they using for these big layoffs?" he told Fortune earlier in the year. Then Salesforce cut 4,000 support roles. Now executives admit the AI they deployed has reliability issues significant enough to warrant an architectural overhaul.

Enterprise buyers paying attention will ask the obvious question: did you cut people based on capabilities you're now admitting were oversold?

Salesforce's Agentforce 3.0 is expected to include new observability tools, including a Command Center for monitoring agent performance. General availability is planned for later in 2026.

Liza Chan
