Qwen3.5 4B — Claude Opus Reasoning Distillation

A careful approach to distillation: Premium reasoning capabilities transferred in a single epoch with minimal capability loss.

[Image: General Benchmark Comparison Chart]

Before you dismiss this as yet another community distillation with the usual quality tradeoffs — stop and read this.

This model takes a more careful approach to distillation. We've transferred Claude Opus 4.6's reasoning patterns and conversational style into Qwen3.5-4B while avoiding the catastrophic forgetting that plagues many community distillation attempts. The result: net improvements across most benchmarks with only minor tradeoffs.


🎯 Why This Model is Different

The Distillation Problem Everyone Ignores

Most community distillations follow a predictable pattern:

  1. Collect synthetic data from a frontier model
  2. Train for multiple epochs until loss looks good
  3. Ship it and hope for the best

The result? Models that feel different but perform worse. They lose capabilities on benchmarks, develop repetition issues, forget how to follow instructions properly, perform noticeably worse on coding & math tasks, and exhibit the telltale signs of overfitting that make them unreliable for real-world use.

We took a completely different approach.

The Single-Epoch Revolution

Our methodology proves that quality dramatically outweighs quantity in distillation:

| Aspect | Typical Community Distills | Our Approach |
|---|---|---|
| Epochs | 2-4 epochs | 1 epoch |
| Data Quality | Mass-generated synthetic | Hand-curated Opus reasoning traces |
| Capability Retention | Significant regressions | Mostly preserved with net gains |
| Overfitting | Common | None observed |
| Output Quality | Degraded task completion | Clean, purposeful generation |

By training for exactly one epoch on curated data, we achieve style transfer while minimizing damage to the model's foundational capabilities. Most of the base model's knowledge remains intact while gaining reasoning patterns from Claude Opus.
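To get a feel for how light a single-epoch run is, the arithmetic below estimates the optimizer-step budget for the ~4,000-example corpus. The batch size and gradient accumulation values are hypothetical placeholders, not the actual run's published hyperparameters:

```python
import math

# Corpus size from the dataset composition table (887 + 799 + 250 + 2100)
num_examples = 4036

# Hypothetical settings for illustration only -- the actual run's
# hyperparameters are not published in this card.
per_device_batch_size = 2
gradient_accumulation_steps = 8
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

# One epoch means each example is seen exactly once.
steps_per_epoch = math.ceil(num_examples / effective_batch_size)
print(steps_per_epoch)  # 253 optimizer steps for a single epoch
```

A few hundred steps on curated data is enough to transfer style while barely perturbing the pretrained weights; multi-epoch runs revisit the same examples and drift toward memorization.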


🧠 What Makes the Training Data Special

Premium Reasoning from Claude Opus 4.6

This isn't data scraped from random API calls or generated with lazy prompting. Almost every training example comes from Claude Opus 4.6 — Anthropic's most capable reasoning model — executing complex, multi-step reasoning tasks. To strengthen the corpus, roughly 800 additional examples came from Claude Sonnet 4.6.

The dataset includes:

  • Deep analytical reasoning with explicit thinking traces
  • Multi-turn conversations that maintain coherent context
  • Complex problem decomposition showing how to break down difficult problems
  • Self-correction patterns where the model catches and fixes its own mistakes
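A single training record combining these properties might look like the following. The field names and schema are illustrative assumptions, not the source datasets' actual format:

```python
import json

# Hypothetical record schema; the actual TeichAI/Crownelius datasets
# may use different field names.
record = json.loads("""
{
  "messages": [
    {"role": "user",
     "content": "Why does a single-epoch run overfit less than a multi-epoch one?"},
    {"role": "assistant",
     "content": "<think>The model sees each example once, so it cannot memorize specific token sequences. Let me double-check that reasoning before answering.</think>Each example contributes exactly one gradient pass, so the model absorbs the style of the data rather than its exact wording."}
  ]
}
""")

# The <think> trace stays inside the assistant turn, so the
# reasoning style itself becomes part of the training signal.
assert "<think>" in record["messages"][1]["content"]
```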

Mixed Tool + Non-Tool Corpus

Our training corpus intentionally includes:

  • ~92% pure reasoning examples — analytical thinking, problem-solving, explanations
  • ~8% tool-use examples — web search, data fetching, structured operations

This ratio mirrors realistic assistant usage patterns and ensures the model:

  1. Doesn't over-index on tool calling when it's unnecessary
  2. Knows when and how to invoke tools appropriately
  3. Maintains strong reasoning even when tools are available but not needed
  4. Keeps all code-related post-training intact

Tools included: web_search, web_fetch, grep
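For the ~8% of tool-use examples, each trace must pair a tool call with its result in a well-formed way. The turn structure below is a generic assumption about what such a trace looks like, not the exact wire format used in training:

```python
# Generic tool-use trace: the assistant requests a call, a tool message
# returns the result, and the assistant continues with reasoning.
# The exact schema used in the training data is an assumption here.
trace = [
    {"role": "user", "content": "What is the latest stable Python release?"},
    {"role": "assistant", "tool_calls": [
        {"name": "web_search", "arguments": {"query": "latest stable Python release"}}
    ]},
    {"role": "tool", "name": "web_search", "content": "(search results)"},
    {"role": "assistant", "content": "Based on the search results, ..."},
]

# The kind of check implied by "broken tool calls removed":
# every call must name a tool from the allowed set.
allowed_tools = {"web_search", "web_fetch", "grep"}
for turn in trace:
    for call in turn.get("tool_calls", []):
        assert call["name"] in allowed_tools
```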


📊 Benchmark Results

Head-to-head against the base unsloth/Qwen3.5-4B:

| Benchmark | Base | Fine-tuned | Δ | Result |
|---|---|---|---|---|
| ifeval | 0.262 | 0.309 | +17.6% | ✅ Win |
| arc_challenge | 0.346 | 0.392 | +13.3% | ✅ Win |
| winogrande | 0.589 | 0.638 | +8.3% | ✅ Win |
| hellaswag | 0.496 | 0.500 | +0.9% | ✅ Win |
| gpqa_diamond | 0.283 | 0.283 | 0% | ➖ Tie |
| truthfulqa_mc2 | 0.545 | 0.530 | -2.7% | ❌ Loss |
| mmlu | 0.256 | 0.232 | -9.6% | ❌ Loss |

Summary: 4 wins, 2 losses, 1 tie.
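The tally can be reproduced directly from the table's scores:

```python
# (benchmark, base, fine-tuned) triples from the table above.
results = [
    ("ifeval",         0.262, 0.309),
    ("arc_challenge",  0.346, 0.392),
    ("winogrande",     0.589, 0.638),
    ("hellaswag",      0.496, 0.500),
    ("gpqa_diamond",   0.283, 0.283),
    ("truthfulqa_mc2", 0.545, 0.530),
    ("mmlu",           0.256, 0.232),
]

wins = sum(ft > base for _, base, ft in results)
ties = sum(ft == base for _, base, ft in results)
losses = sum(ft < base for _, base, ft in results)
print(wins, losses, ties)  # 4 2 1
```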

[Image: MMLU Subject Breakdown]

What This Means

  • Reasoning & instruction following improved — IFEval (+17.6%), ARC (+13.3%), and Winogrande (+8.3%) gains show better logical reasoning and instruction adherence
  • Knowledge tradeoff on MMLU — The -9.6% MMLU drop suggests some factual recall displacement (common in style transfers)
  • TruthfulQA mostly preserved — Only -2.7% loss, indicating the model didn't pick up hallucination tendencies

Qualitative Improvements

  • Reduced token generation — More concise outputs without verbose padding
  • Fixed thinking loops — Base model's tendency to get stuck in reasoning cycles is reduced
  • Deeper reasoning traces — <think> blocks show more structured analytical depth
  • Better conversational flow — Responses feel more natural and contextually aware

🔬 Technical Details

Key Methodological Choices

  1. Response-only training — Loss computed only on assistant outputs, not user inputs
  2. Preserved reasoning traces — <think> blocks kept intact for reasoning-style transfer
  3. Strict data validation — Malformed traces, duplicates, and broken tool calls removed
  4. Consistent formatting — Unified chat template across all sources
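Response-only training (point 1) is usually implemented by masking non-assistant tokens out of the loss with the conventional ignore index -100. A minimal sketch, assuming the token span for each role is already known (in practice, libraries such as TRL provide collators that do this automatically):

```python
IGNORE_INDEX = -100  # conventional "skip this token in the loss" label


def mask_labels(token_ids, role_spans):
    """Copy token_ids into labels, replacing every token that is not
    inside an assistant span with IGNORE_INDEX, so the loss is
    computed only on assistant outputs."""
    labels = [IGNORE_INDEX] * len(token_ids)
    for role, start, end in role_spans:
        if role == "assistant":
            labels[start:end] = token_ids[start:end]
    return labels


# Toy sequence: 4 user tokens followed by 3 assistant tokens.
ids = [11, 12, 13, 14, 21, 22, 23]
spans = [("user", 0, 4), ("assistant", 4, 7)]
print(mask_labels(ids, spans))  # [-100, -100, -100, -100, 21, 22, 23]
```

Because the user tokens never contribute gradient, the model's ability to read instructions is left untouched; only its outputs are pulled toward the Opus style.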

📦 Dataset Composition

| Source | Examples | Type |
|---|---|---|
| TeichAI/Claude-Opus-4.6-Reasoning-887x | 887 | Mixed |
| TeichAI/Claude-Sonnet-4.6-Reasoning-799x | 799 | Pure reasoning |
| TeichAI/claude-4.5-opus-high-reasoning-250x | 250 | High complexity |
| Crownelius/Opus-4.6-Reasoning-2100x-formatted | 2100 | Pure reasoning |
| **Total** | ~4000 | Mixed tool/non-tool |
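A sketch of how the four sources could be merged and cleaned before training. The filtering rules are illustrative assumptions; the card only states that malformed traces, duplicates, and broken tool calls were removed:

```python
import hashlib
import json


def curate(sources):
    """Concatenate example lists, drop malformed records, and
    deduplicate on a content hash. Illustrative only."""
    seen, merged = set(), []
    for examples in sources:
        for ex in examples:
            # Malformed check: require a non-empty assistant turn.
            turns = ex.get("messages", [])
            if not any(t.get("role") == "assistant" and t.get("content")
                       for t in turns):
                continue
            # Exact-duplicate check via a canonical content hash.
            key = hashlib.sha256(
                json.dumps(ex, sort_keys=True).encode()).hexdigest()
            if key in seen:
                continue
            seen.add(key)
            merged.append(ex)
    return merged


a = [{"messages": [{"role": "user", "content": "q"},
                   {"role": "assistant", "content": "a"}]}]
b = a + [{"messages": [{"role": "user", "content": "q2"}]}]  # no assistant turn
print(len(curate([a, b])))  # 1 unique valid example survives
```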

💡 Lessons Learned

What Worked

  1. Single epoch training — Avoided the overfitting that causes catastrophic forgetting in multi-epoch runs
  2. Quality over quantity — ~4000 curated examples outperformed what we'd expect from larger noisy datasets
  3. Mixed tool/non-tool data — Kept the model grounded in both reasoning and tool-use contexts
  4. Response-only loss — Training only on assistant outputs preserved instruction-following

Tradeoffs to Consider

  • The MMLU (-9.6%) and TruthfulQA (-2.7%) regressions suggest some factual knowledge displacement
  • Style transfer always has costs — this approach minimizes but doesn't eliminate them
  • Your mileage may vary depending on use case

🙏 Acknowledgments

This model was trained 2x faster with Unsloth and Hugging Face's TRL library.


📜 License

Apache 2.0 — Use freely, build boldly.
