# Aqui-open0-2: SOTA 21B Open Weights Reasoning Model
Aqui-open0-2 is a state-of-the-art 21-billion-parameter open-weights reasoning model from Aqui Solutions, the creators of AquiGPT. Built on a Qwen3 14B base and extended with additional transformer layers, it delivers strong coding and reasoning performance that rivals much larger models while remaining accessible to the open-source community.
## Key Features

- **Extended Architecture:** 21B parameters, with additional layers on a Qwen3 14B base
- **SOTA Performance:** Competitive with larger proprietary and open models
- **8-bit Precision:** Optimized for efficiency without sacrificing quality
- **40K Context Window:** Expandable to 128K using YARN scaling
- **Strong Reasoning:** Approaches the performance of the closed Aqui-v2-0 models
- **Open Weights:** Fully open under the Apache 2.0 license
## Performance Benchmarks

Aqui-open0-2 performs competitively across multiple challenging benchmarks:
| Benchmark | Aqui-open0-2 (21B) | gpt-oss (21.5B) | Qwen3 (30.5B) | Solar Pro 2 (30.9B) | EXAONE 4.0 (32B) | GLM-4.5 Air (110B) | Aqui-v2-0 tiny |
|---|---|---|---|---|---|---|---|
| MMLU-Pro | 79.8% | 73.6% | 77.7% | 80.5% | **81.8%** | *81.5%* | 75.4% |
| GPQA Diamond | 66.1% | 61.7% | 61.6% | 68.7% | **73.9%** | *73.3%* | 64.3% |
| Humanity's Last Exam | **10.6%** | 8.5% | 9.8% | 7.0% | *10.5%* | 6.8% | 5.6% |
| LiveCodeBench | 69.1% | *72.1%* | 66.0% | 61.6% | **74.7%** | 68.4% | 51.9% |
| AIME 2025 | 71.9% | 61.7% | 72.3% | 61.3% | **80.0%** | 63.0% | *75.0%* |
| IFBench | *50.4%* | **60.5%** | 41.5% | 37.1% | 36.3% | 44.0% | 39.2% |
| AA-Index | **51.8%** | 49.0% | 42.3% | 43.3% | *50.7%* | 49.5% | 46.8% |

**Bold**: best result. *Italics*: second best.
## Model Specifications

- **Parameters:** 21 billion
- **Base Model:** Qwen3 14B with extended layers
- **Context Window:** 40,000 tokens (expandable to 128K with YARN)
- **Precision:** 8-bit optimized
- **Architecture:** Extended Qwen transformer
- **Languages:** 23+ languages with strong multilingual support
- **Knowledge Cutoff:** October 2024
## Hardware Requirements

### Minimum Requirements

- **GPU:** RTX 3090 (24GB VRAM) or RTX 4090; 24GB cards need INT8/INT4 quantization (see Quantization Options and the memory sketch below)
- **Mac:** 32GB unified memory (Apple Silicon)
- **RAM:** 32GB system memory
- **Storage:** 25GB available space

### Recommended Setup

- **GPU:** RTX 4090 or A100 (40GB)
- **CPU:** Modern multi-core processor
- **RAM:** 64GB+ for optimal performance
- **Storage:** NVMe SSD for faster loading
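Why these tiers: weight memory scales roughly linearly with parameter count and bytes per parameter. A back-of-the-envelope sketch (ignoring KV cache and activation overhead, which add several GB at long contexts):

```python
# Rule-of-thumb weight memory: parameter count x bytes per parameter.
# KV cache and activations are extra, and grow with context length.
params = 21e9  # 21B parameters

for name, bytes_per_param in [("FP16/BF16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 1e9:.1f} GB")

# FP16/BF16: ~42.0 GB, INT8: ~21.0 GB, INT4: ~10.5 GB --
# matching the footprints listed under Quantization Options.
```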
## Installation & Usage

### Quick Start with Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "aquigpt/open0-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Generate a response
prompt = "Write a Python function to implement binary search with detailed comments."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,  # cap generated tokens; max_length would count the prompt too
    temperature=0.7,
    do_sample=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
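For instruction-style or multi-turn use, the tokenizer's chat template is usually preferable to a raw prompt string. A minimal sketch, assuming the release ships a chat template (typical for instruction-tuned checkpoints, though not confirmed here), reusing `model` and `tokenizer` from above:

```python
# Assumes the tokenizer defines a chat template; verify before relying on it.
messages = [
    {"role": "user", "content": "Explain the time complexity of binary search."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```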
### Using with vLLM
```python
from vllm import LLM, SamplingParams

# Initialize the model
llm = LLM(
    model="aquigpt/open0-2",
    tensor_parallel_size=1,
    trust_remote_code=True,
)

# Set sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512,
)

# Generate
prompts = ["Explain quantum computing in simple terms."]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```
### Context Extension with YARN

```python
# Enable YARN scaling for longer contexts
model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/open0-2",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    rope_scaling={
        "type": "yarn",
        "factor": 3.2,  # 3.2 x 40K extends the window to ~128K tokens
    },
)
```
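Keep in mind that KV-cache memory grows linearly with context length, so running near the full 128K window needs substantially more VRAM than the figures listed under Hardware Requirements.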
## Use Cases

### Advanced Reasoning & Mathematics

- Complex mathematical problem solving (AIME 2025: 71.9%)
- Scientific reasoning and analysis
- Multi-step logical reasoning
- Academic research assistance

### Code Generation & Programming

- Algorithm implementation and optimization
- Code review and debugging
- Technical documentation
- Live coding challenges (LiveCodeBench: 69.1%)

### Professional Applications

- Research and analysis
- Technical writing
- Multilingual communication
- Educational tutoring with detailed explanations
## Quantization Options

Available quantization formats for different hardware setups (a loading sketch follows the list):

- **BF16:** ~42GB VRAM (unquantized)
- **FP16:** ~42GB VRAM (recommended)
- **INT8:** ~21GB VRAM (efficient)
- **INT4:** ~11GB VRAM (consumer hardware)
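One way to reach the INT4 footprint on consumer GPUs is on-the-fly quantization with bitsandbytes through transformers. This is a generic loading sketch, not an official Aqui recipe; prequantized releases, if published, may be preferable:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization at load time (roughly the ~11GB INT4 tier above;
# real usage also depends on context length and batch size).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/open0-2",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```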
## Fine-tuning Support

Aqui-open0-2 supports various fine-tuning approaches (a minimal LoRA sketch follows the list):

- **LoRA/QLoRA:** Parameter-efficient fine-tuning
- **Full Fine-tuning:** Complete model adaptation
- **Custom Tokenizer:** Domain-specific vocabulary
- **Multi-task Learning:** Specialized task combinations
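A minimal LoRA sketch using the PEFT library; the target module names below assume Qwen-style attention projections (plausible given the Qwen3 base, but verify against the checkpoint before training):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/open0-2",
    device_map="auto",
    trust_remote_code=True,
)

# Assumed Qwen-style projection names -- confirm via model.named_modules().
lora_config = LoraConfig(
    r=16,                 # adapter rank
    lora_alpha=32,        # LoRA scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only adapter weights are trainable
```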
## Comparison with Closed Models

Aqui-open0-2 approaches the performance of our proprietary models:

- **Aqui-v2-0 tiny:** Matched or exceeded on most benchmarks
- **Aqui-v2-0:** Competitive performance at a fraction of the size
- **Cost Efficiency:** Open weights eliminate per-token API costs
- **Customization:** Full model access for specialized needs
## Limitations

- Knowledge cutoff of October 2024
- May occasionally produce hallucinations
- Requires significant computational resources for optimal performance
- 8-bit precision may affect some edge cases
- Extending the context with YARN increases memory use and can reduce inference throughput
## License

This model is released under the Apache 2.0 License, permitting both research and commercial use subject to the license's terms.
## Ethical Considerations
Aqui-open0-2 is designed for beneficial applications. Users should:
- Implement appropriate safety measures for production use
- Consider bias mitigation in sensitive applications
- Follow responsible AI practices
- Respect applicable laws and regulations
## Support & Community

- **Repository:** Hugging Face Model Page
- **Discussions:** Join the community discussions on Hugging Face
## Acknowledgments

- **Qwen Team:** for Qwen3 14B, the base model
- **DeepSeek Team:** the synthetic training dataset was generated with DeepSeek-R1
- **Hugging Face:** for hosting the model weights
Copyright 2025 Aqui Solutions. All rights reserved.