A developer posting under the handle hesamation has released on Hugging Face a fine-tune of Alibaba's Qwen3.6-35B-A3B that distills reasoning traces from Claude Opus 4.6. The training is LoRA-based supervised fine-tuning on roughly 14,000 chain-of-thought conversations, most of them pulled from Claude Opus outputs.
The author reports a jump from 42.86% to 75.71% on MMLU-Pro, a gain of nearly 33 points. The catch: that eval ran on just 70 questions, five per subject across 14 subjects. The model card flags it as "a smoke/comparative check, not a release-quality full benchmark." No third-party evaluations are posted yet.
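The 70-question sample size is the key caveat. A quick back-of-envelope calculation (a sketch using the normal approximation to a binomial proportion, not a figure from the model card) shows how wide the uncertainty on that 75.71% really is:

```python
import math

def moe_95(p: float, n: int) -> float:
    """95% margin of error (normal approximation) for a proportion p measured over n questions."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

# Reported fine-tune score: 75.71% correct on 70 questions
margin = moe_95(0.7571, 70)
print(f"±{margin * 100:.1f} points")  # roughly ±10 points at 95% confidence
```

Even with a ±10-point band, the gap to the 42.86% baseline does not close, but the exact headline number should be treated as noisy until a full benchmark run exists.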
The recipe borrows from Jackrong's earlier Qwen3.5 distill. The bulk of the data comes from a public dataset of Claude Opus reasoning samples, plus smaller sets from two other community sources.
Qwen3.6-35B-A3B is a mixture-of-experts model with 3B active parameters, small enough to run locally on a single high-end GPU with quantization. The fine-tune is text-only; the base model's vision encoder is untouched. The author is asking for community benchmarks.
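The local-deployment claim checks out on a back-of-envelope basis. Note that a MoE model must hold all expert weights in memory even though only 3B parameters are active per token, so the 35B total count is what drives VRAM. A rough sketch, assuming 4-bit weights and ignoring KV cache and runtime overhead:

```python
TOTAL_PARAMS = 35e9          # 35B total parameters; MoE activates only ~3B per token
BYTES_PER_PARAM_4BIT = 0.5   # 4-bit quantization = half a byte per weight

weight_gb = TOTAL_PARAMS * BYTES_PER_PARAM_4BIT / 1e9
print(f"{weight_gb:.1f} GB for weights alone")  # 17.5 GB, within a 24 GB consumer GPU
```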
Bottom Line
The fine-tune reports a 33-point MMLU-Pro gain on a 70-question test, not a full benchmark run.
Quick Facts
- Base model: Qwen3.6-35B-A3B (35B total, 3B active, MoE)
- Training: LoRA SFT on attention modules, 2 epochs, 762 steps
- Training data: ~14,233 chain-of-thought samples from three community datasets
- MMLU-Pro: 42.86% base vs 75.71% fine-tune on 70 questions (self-reported)
- License: Apache 2.0
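For readers unfamiliar with the recipe, a LoRA SFT setup along these lines is typically expressed with Hugging Face's `peft` library. The sketch below is a hypothetical illustration matching the reported details (attention-module targets, 2 epochs); the rank, alpha, and dropout values are assumptions, not taken from the model card:

```python
from peft import LoraConfig

# Hypothetical config; r, lora_alpha, and lora_dropout are illustrative guesses.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention modules, per the model card
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

Targeting only the attention projections keeps the trainable parameter count small, which is consistent with the short run reported (762 steps over ~14k samples).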