Microsoft SkillOpt Trains Agent Skills, No Fine-Tuning

Abstract visualization of a text document being refined through iterative editing while a neural network stays static

Microsoft researchers have proposed a way to improve AI agents that skips fine-tuning entirely. The method, called SkillOpt, treats the skill file an agent reads as the thing you train, leaving the underlying model frozen. The technical paper went up on arXiv on May 22.

The skill is the parameter

Right now most agent skills get written by hand, generated once by an LLM, or patched after a few failed runs. That last habit is the dangerous one. A revision can read beautifully and still make the agent worse, because nothing checks whether the prose actually helped.

SkillOpt borrows the discipline of weight-space training and points it at text. A separate optimizer model reads scored rollouts, then proposes bounded add, delete, or replace edits to one skill document. The catch: an edit only sticks if it strictly improves a held-out validation score. Everything else gets rejected.

There is a textual learning-rate budget, a buffer for rejected edits, and a slow epoch-wise update meant to keep the whole thing from thrashing. The authors describe SkillOpt as the first systematic controllable text-space optimizer for agent skills, a careful enough hedge that it does not quite count as a first-ever boast.

So what do the numbers say?

They tested across six benchmarks, seven target models, and three harnesses: direct chat, Codex, and Claude Code. The headline claim is that SkillOpt was best or tied on every one of the 52 evaluated cells, beating human-written skills along with one-shot LLM, Trace2Skill, TextGrad, GEPA, and EvoSkill.

On GPT-5.5 the gains over a no-skill baseline land at 23.5 points in direct chat, 24.8 inside Codex, and 19.1 inside Claude Code. Worth a pause here. "Best or tied on all 52" is the kind of clean sweep that usually means the competitors were run under the authors' own conditions, which is standard practice but also exactly why a clean sweep should be read with one eyebrow raised. The paper is a v1 preprint with no peer review yet.

Why it might actually travel

The part that interests me less for the benchmark bragging and more for the practical payoff: the output is a readable file, not a checkpoint. No optimizer calls at deployment, which means zero added inference cost once training stops.

The team also ran transfer experiments. They claim an optimized skill keeps its value when moved across model scales, swapped between Codex and Claude Code, and even dropped onto a nearby math benchmark with no further tuning. If that holds up outside the lab, it is the difference between a one-off prompt hack and something you can hand to a different agent loop and trust.

The paper runs 27 pages with four figures and six tables. No public code repo is listed on the arXiv page yet, so reproduction waits on whatever the authors release next.

Tags:SkillOptMicrosoft ResearchAI agentsagent skillsLLM optimizationClaude CodeCodexarXivprompt optimization

Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.

Microsoft SkillOpt Trains Agent Skills Without Touching the Model

The skill is the parameter

So what do the numbers say?

Why it might actually travel

Oliver Senti

Related Articles

Anthropic and OpenAI Shift Enterprise Coding Tools to API Pricing

OpenAI and Thrive Build Self-Improving Tax AI for Accountants

Google Rebuilt Colab Around an AI Agent

Stay Ahead of the AI Curve