A developer posting under the handle hesamation has released on Hugging Face a fine-tune of Alibaba's Qwen3.6-35B-A3B that distills reasoning traces from Claude Opus 4.6. The training is LoRA-based supervised fine-tuning on roughly 14,000 chain-of-thought conversations, most of them pulled from Claude Opus outputs.
The author reports a jump from 42.86% to 75.71% on MMLU-Pro, a gain of nearly 33 points. The catch: that eval ran on just 70 questions, five per subject across 14 subjects. The model card flags it as "a smoke/comparative check, not a release-quality full benchmark." No third-party evaluations are posted yet.
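The 70-question sample size is the key caveat. A quick back-of-envelope calculation (a sketch using the normal approximation to a binomial proportion, not a figure from the model card) shows how wide the uncertainty on that 75.71% really is:

```python
import math

def moe_95(p: float, n: int) -> float:
    """95% margin of error (normal approximation) for a proportion p measured over n questions."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

# Reported fine-tune score: 75.71% correct on 70 questions
margin = moe_95(0.7571, 70)
print(f"±{margin * 100:.1f} points")  # roughly ±10 points at 95% confidence
```

Even with a ±10-point band, the gap to the 42.86% baseline does not close, but the exact headline number should be treated as noisy until a full benchmark run exists.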
The recipe borrows from Jackrong's earlier Qwen3.5 distill. The bulk of the data comes from a public dataset of Claude Opus reasoning samples, plus smaller sets from two other community sources.
Qwen3.6-35B-A3B is a mixture-of-experts model with 3B active parameters, small enough to run locally on a single high-end GPU with quantization. The fine-tune is text-only; the base model's vision encoder is untouched. The author is asking for community benchmarks.
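The local-deployment claim checks out on a back-of-envelope basis. Note that a MoE model must hold all expert weights in memory even though only 3B parameters are active per token, so the 35B total count is what drives VRAM. A rough sketch, assuming 4-bit weights and ignoring KV cache and runtime overhead:

```python
TOTAL_PARAMS = 35e9          # 35B total parameters; MoE activates only ~3B per token
BYTES_PER_PARAM_4BIT = 0.5   # 4-bit quantization = half a byte per weight

weight_gb = TOTAL_PARAMS * BYTES_PER_PARAM_4BIT / 1e9
print(f"{weight_gb:.1f} GB for weights alone")  # 17.5 GB, within a 24 GB consumer GPU
```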
Bottom Line
The fine-tune reports a 33-point MMLU-Pro gain on a 70-question test, not a full benchmark run.
Quick Facts
- Base model: Qwen3.6-35B-A3B (35B total, 3B active, MoE)
- Training: LoRA SFT on attention modules, 2 epochs, 762 steps
- Training data: ~14,233 chain-of-thought samples from three community datasets
- MMLU-Pro: 42.86% base vs 75.71% fine-tune on 70 questions (self-reported)
- License: Apache 2.0
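For readers unfamiliar with the recipe, a LoRA SFT setup along these lines is typically expressed with Hugging Face's `peft` library. The sketch below is a hypothetical illustration matching the reported details (attention-module targets, 2 epochs); the rank, alpha, and dropout values are assumptions, not taken from the model card:

```python
from peft import LoraConfig

# Hypothetical config; r, lora_alpha, and lora_dropout are illustrative guesses.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention modules, per the model card
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

Targeting only the attention projections keeps the trainable parameter count small, which is consistent with the short run reported (762 steps over ~14k samples).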