Qwen3.5 4B — Claude Opus Reasoning Distillation
A careful approach to distillation: Premium reasoning capabilities transferred in a single epoch with minimal capability loss.
Before you dismiss this as yet another community distillation with the usual quality tradeoffs — stop and read this.
This model takes a more careful approach to distillation. We've transferred Claude Opus 4.6's reasoning patterns and conversational style into Qwen3.5-4B while avoiding the catastrophic forgetting that plagues many community distillation attempts. The result: net improvements across most benchmarks with only minor tradeoffs.
🎯 Why This Model is Different
The Distillation Problem Everyone Ignores
Most community distillations follow a predictable pattern:
- Collect synthetic data from a frontier model
- Train for multiple epochs until loss looks good
- Ship it and hope for the best
The result? Models that feel different but perform worse. They lose capabilities on benchmarks, develop repetition issues, forget how to follow instructions properly, perform noticeably worse on coding & math tasks, and exhibit the telltale signs of overfitting that make them unreliable for real-world use.
We took a completely different approach.
The Single-Epoch Revolution
Our methodology suggests that data quality matters far more than data volume in distillation:
| Aspect | Typical Community Distills | Our Approach |
|---|---|---|
| Epochs | 2-4 epochs | 1 epoch |
| Data Quality | Mass-generated synthetic | Hand-curated Opus reasoning traces |
| Capability Retention | Significant regressions | Mostly preserved with net gains |
| Overfitting | Common | None observed |
| Output Quality | Degraded task completion | Clean, purposeful generation |
By training for exactly one epoch on curated data, we achieve style transfer while minimizing damage to the model's foundational capabilities. Most of the base model's knowledge remains intact while gaining reasoning patterns from Claude Opus.
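As a back-of-the-envelope illustration of how small a single-epoch run really is (batch size and gradient accumulation below are assumptions for illustration, not the actual training config), one pass over a corpus this size is only a few hundred optimizer steps:

```python
def total_steps(num_examples, batch_size, grad_accum, epochs=1):
    """Optimizer steps for a run: ceil(examples / effective batch) * epochs."""
    steps_per_epoch = -(-num_examples // (batch_size * grad_accum))  # ceil division
    return steps_per_epoch * epochs

# ~4,000 curated examples, per-device batch 4, grad accumulation 8 (assumed):
print(total_steps(4000, 4, 8))  # -> 125 optimizer steps in one epoch
```

At this scale each example is seen exactly once, which is precisely what limits the model's opportunity to memorize the style data.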
🧠 What Makes the Training Data Special
Premium Reasoning from Claude Opus 4.6
This isn't data scraped from random API calls or generated with lazy prompting. Almost every training example comes from Claude Opus 4.6 — Anthropic's most capable reasoning model — executing complex, multi-step reasoning tasks. To strengthen the corpus, another ~800 examples come from Claude Sonnet 4.6.
The dataset includes:
- Deep analytical reasoning with explicit thinking traces
- Multi-turn conversations that maintain coherent context
- Complex problem decomposition showing how to break down difficult problems
- Self-correction patterns where the model catches and fixes its own mistakes
Mixed Tool + Non-Tool Corpus
Our training corpus intentionally includes:
- ~92% pure reasoning examples — analytical thinking, problem-solving, explanations
- ~8% tool-use examples — web search, data fetching, structured operations
This ratio mirrors realistic assistant usage patterns and ensures the model:
- Doesn't over-index on tool calling when it's unnecessary
- Knows when and how to invoke tools appropriately
- Maintains strong reasoning even when tools are available but not needed
- Keeps all code-related post-training intact
Tools included: web_search, web_fetch, grep
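A minimal sketch of how such a ratio might be enforced when assembling the corpus (function names and the exact sampling scheme are illustrative assumptions, not the actual pipeline):

```python
import random

def mix_corpus(reasoning, tool_use, tool_frac=0.08, seed=0):
    """Downsample tool-use examples so they form ~tool_frac of the final mix."""
    rng = random.Random(seed)
    # Solve n_tool / (len(reasoning) + n_tool) == tool_frac for n_tool.
    n_tool = round(len(reasoning) * tool_frac / (1 - tool_frac))
    mixed = reasoning + rng.sample(tool_use, min(n_tool, len(tool_use)))
    rng.shuffle(mixed)
    return mixed
```

Keeping the tool-use slice small but non-zero is what lets the model retain tool-calling competence without over-indexing on it.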
📊 Benchmark Results
Head-to-head against the base unsloth/Qwen3.5-4B:
| Benchmark | Base | Fine-tuned | Δ | Result |
|---|---|---|---|---|
| ifeval | 0.262 | 0.309 | +17.6% | ✅ Win |
| arc_challenge | 0.346 | 0.392 | +13.3% | ✅ Win |
| winogrande | 0.589 | 0.638 | +8.3% | ✅ Win |
| hellaswag | 0.496 | 0.500 | +0.9% | ✅ Win |
| gpqa_diamond | 0.283 | 0.283 | 0% | ➖ Tie |
| truthfulqa_mc2 | 0.545 | 0.530 | -2.7% | ❌ Loss |
| mmlu | 0.256 | 0.232 | -9.6% | ❌ Loss |
Summary: 4 wins, 2 losses, 1 tie.
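For reference, the Δ column is the relative change over the base score. Computed from the rounded scores shown in the table it reproduces most entries; the remaining small discrepancies presumably come from unrounded scores:

```python
def pct_delta(base, tuned):
    """Relative change of the fine-tuned score over the base score, in percent."""
    return round(100 * (tuned - base) / base, 1)

print(pct_delta(0.346, 0.392))  # arc_challenge -> 13.3
print(pct_delta(0.589, 0.638))  # winogrande   -> 8.3
```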
What This Means
- Reasoning & instruction following improved — IFEval (+17.6%), ARC (+13.3%), and Winogrande (+8.3%) gains show better logical reasoning and instruction adherence
- Knowledge tradeoff on MMLU — The -9.6% MMLU drop suggests some factual recall displacement (common in style transfers)
- TruthfulQA mostly preserved — Only -2.7% loss, indicating the model didn't pick up hallucination tendencies
Qualitative Improvements
- Reduced token generation — More concise outputs without verbose padding
- Fixed thinking loops — Base model's tendency to get stuck in reasoning cycles is reduced
- Deeper reasoning traces — `<think>` blocks show more structured analytical depth
- Better conversational flow — Responses feel more natural and contextually aware
🔬 Technical Details
Key Methodological Choices
- Response-only training — Loss computed only on assistant outputs, not user inputs
- Preserved reasoning traces — `<think>` blocks kept intact for reasoning-style transfer
- Strict data validation — Malformed traces, duplicates, and broken tool calls removed
- Consistent formatting — Unified chat template across all sources
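Response-only loss is typically implemented by setting every non-assistant label position to PyTorch's cross-entropy ignore index, so only assistant tokens contribute gradient. A minimal sketch (the per-token role layout here is an illustrative assumption; trainers such as TRL handle this internally):

```python
IGNORE_INDEX = -100  # the default ignore_index of PyTorch cross-entropy loss

def mask_labels(token_ids, roles):
    """Copy token ids into labels, masking everything except assistant tokens."""
    return [tok if role == "assistant" else IGNORE_INDEX
            for tok, role in zip(token_ids, roles)]
```

Usage: a turn tokenized as user tokens followed by assistant tokens yields labels where only the assistant span is trained on.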
📦 Dataset Composition
| Source | Examples | Type |
|---|---|---|
| TeichAI/Claude-Opus-4.6-Reasoning-887x | 887 | Mixed |
| TeichAI/Claude-Sonnet-4.6-Reasoning-799x | 799 | Pure reasoning |
| TeichAI/claude-4.5-opus-high-reasoning-250x | 250 | High complexity |
| Crownelius/Opus-4.6-Reasoning-2100x-formatted | 2100 | Pure reasoning |
| Total | ~4000 | Mixed tool/non-tool |
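Merging four sources inevitably produces duplicates and the occasional malformed trace. A minimal filter in the spirit of the validation step described above (field names and checks are assumptions, not the actual cleaning code) might look like:

```python
def validate(examples):
    """Drop exact duplicates and traces with unbalanced <think> tags."""
    seen, clean = set(), []
    for ex in examples:
        text = ex["text"]
        if text in seen:
            continue  # exact duplicate
        if text.count("<think>") != text.count("</think>"):
            continue  # malformed reasoning trace
        seen.add(text)
        clean.append(ex)
    return clean
```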
💡 Lessons Learned
What Worked
- Single epoch training — Avoided the overfitting that causes catastrophic forgetting in multi-epoch runs
- Quality over quantity — ~4000 curated examples outperformed what we'd expect from larger noisy datasets
- Mixed tool/non-tool data — Kept the model grounded in both reasoning and tool-use contexts
- Response-only loss — Training only on assistant outputs preserved instruction-following
Tradeoffs to Consider
- Small MMLU/TruthfulQA regressions suggest some factual knowledge displacement
- Style transfer always has costs — this approach minimizes but doesn't eliminate them
- Your mileage may vary depending on use case
🙏 Acknowledgments
This model was trained 2x faster with Unsloth and Hugging Face's TRL library.
📜 License
Apache 2.0 — Use freely, build boldly.
Model tree for wop/Opus4Qwen4 — base model: Qwen/Qwen3.5-4B-Base