# Aqui-open0-2: SOTA 21B Open Weights Reasoning Model
Aqui-open0-2 is a state-of-the-art 21-billion-parameter open-weights reasoning model from Aqui Solutions, the creators of AquiGPT. Built on a Qwen3 14B base and extended with additional transformer layers, it delivers strong coding and reasoning performance that rivals much larger models while remaining accessible to the open-source community.
## Key Features

- **Extended Architecture:** 21B parameters, with additional layers on a Qwen3 14B base
- **SOTA Performance:** Competitive with larger proprietary and open models
- **8-bit Precision:** Optimized for efficiency without sacrificing quality
- **40K Context Window:** Expandable to 128K using YARN scaling
- **Strong Reasoning:** Approaches the performance of the closed Aqui-v2-0 models
- **Open Weights:** Fully open under the Apache 2.0 license
## Performance Benchmarks

Aqui-open0-2 performs competitively across multiple challenging benchmarks:
| Benchmark | Aqui-open0-2 (21B) | gpt-oss (21.5B) | Qwen3 (30.5B) | Solar Pro 2 (30.9B) | EXAONE 4.0 (32B) | GLM-4.5 Air (110B) | Aqui-v2-0 tiny |
|---|---|---|---|---|---|---|---|
| MMLU-Pro | 79.8% | 73.6% | 77.7% | 80.5% | **81.8%** | *81.5%* | 75.4% |
| GPQA Diamond | 66.1% | 61.7% | 61.6% | 68.7% | **73.9%** | *73.3%* | 64.3% |
| Humanity's Last Exam | **10.6%** | 8.5% | 9.8% | 7.0% | *10.5%* | 6.8% | 5.6% |
| LiveCodeBench | 69.1% | *72.1%* | 66.0% | 61.6% | **74.7%** | 68.4% | 51.9% |
| AIME 2025 | 71.9% | 61.7% | 72.3% | 61.3% | **80.0%** | 63.0% | *75.0%* |
| IFBench | *50.4%* | **60.5%** | 41.5% | 37.1% | 36.3% | 44.0% | 39.2% |
| AA-Index | **51.8%** | 49.0% | 42.3% | 43.3% | *50.7%* | 49.5% | 46.8% |

**Bold**: best result. *Italics*: second best.
## Model Specifications

- **Parameters:** 21 billion
- **Base Model:** Qwen3 14B with extended layers
- **Context Window:** 40,000 tokens (expandable to 128K with YARN)
- **Precision:** 8-bit optimized
- **Architecture:** Extended Qwen transformer
- **Languages:** 23+ languages with strong multilingual support
- **Knowledge Cutoff:** October 2024
## Hardware Requirements

### Minimum Requirements

- **GPU:** RTX 3090 (24GB VRAM) or RTX 4090; 24GB cards need INT8/INT4 quantization (see Quantization Options and the memory sketch below)
- **Mac:** 32GB unified memory (Apple Silicon)
- **RAM:** 32GB system memory
- **Storage:** 25GB available space

### Recommended Setup

- **GPU:** RTX 4090 or A100 (40GB)
- **CPU:** Modern multi-core processor
- **RAM:** 64GB+ for optimal performance
- **Storage:** NVMe SSD for faster loading
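Why these tiers: weight memory scales roughly linearly with parameter count and bytes per parameter. A back-of-the-envelope sketch (ignoring KV cache and activation overhead, which add several GB at long contexts):

```python
# Rule-of-thumb weight memory: parameter count x bytes per parameter.
# KV cache and activations are extra, and grow with context length.
params = 21e9  # 21B parameters

for name, bytes_per_param in [("FP16/BF16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 1e9:.1f} GB")

# FP16/BF16: ~42.0 GB, INT8: ~21.0 GB, INT4: ~10.5 GB --
# matching the footprints listed under Quantization Options.
```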
## Installation & Usage

### Quick Start with Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "aquigpt/open0-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Generate a response
prompt = "Write a Python function to implement binary search with detailed comments."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,  # cap generated tokens; max_length would count the prompt too
    temperature=0.7,
    do_sample=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
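For instruction-style or multi-turn use, the tokenizer's chat template is usually preferable to a raw prompt string. A minimal sketch, assuming the release ships a chat template (typical for instruction-tuned checkpoints, though not confirmed here), reusing `model` and `tokenizer` from above:

```python
# Assumes the tokenizer defines a chat template; verify before relying on it.
messages = [
    {"role": "user", "content": "Explain the time complexity of binary search."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```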
### Using with vLLM
```python
from vllm import LLM, SamplingParams

# Initialize the model
llm = LLM(
    model="aquigpt/open0-2",
    tensor_parallel_size=1,
    trust_remote_code=True,
)

# Set sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512,
)

# Generate
prompts = ["Explain quantum computing in simple terms."]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```
### Context Extension with YARN

```python
# Enable YARN scaling for longer contexts
model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/open0-2",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    rope_scaling={
        "type": "yarn",
        "factor": 3.2,  # 3.2 x 40K extends the window to ~128K tokens
    },
)
```
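Keep in mind that KV-cache memory grows linearly with context length, so running near the full 128K window needs substantially more VRAM than the figures listed under Hardware Requirements.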
## Use Cases

### Advanced Reasoning & Mathematics

- Complex mathematical problem solving (AIME 2025: 71.9%)
- Scientific reasoning and analysis
- Multi-step logical reasoning
- Academic research assistance

### Code Generation & Programming

- Algorithm implementation and optimization
- Code review and debugging
- Technical documentation
- Live coding challenges (LiveCodeBench: 69.1%)

### Professional Applications

- Research and analysis
- Technical writing
- Multilingual communication
- Educational tutoring with detailed explanations
## Quantization Options

Available quantization formats for different hardware setups (a loading sketch follows the list):

- **BF16:** ~42GB VRAM (unquantized)
- **FP16:** ~42GB VRAM (recommended)
- **INT8:** ~21GB VRAM (efficient)
- **INT4:** ~11GB VRAM (consumer hardware)
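One way to reach the INT4 footprint on consumer GPUs is on-the-fly quantization with bitsandbytes through transformers. This is a generic loading sketch, not an official Aqui recipe; prequantized releases, if published, may be preferable:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization at load time (roughly the ~11GB INT4 tier above;
# real usage also depends on context length and batch size).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/open0-2",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```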
## Fine-tuning Support

Aqui-open0-2 supports various fine-tuning approaches (a minimal LoRA sketch follows the list):

- **LoRA/QLoRA:** Parameter-efficient fine-tuning
- **Full Fine-tuning:** Complete model adaptation
- **Custom Tokenizer:** Domain-specific vocabulary
- **Multi-task Learning:** Specialized task combinations
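A minimal LoRA sketch using the PEFT library; the target module names below assume Qwen-style attention projections (plausible given the Qwen3 base, but verify against the checkpoint before training):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/open0-2",
    device_map="auto",
    trust_remote_code=True,
)

# Assumed Qwen-style projection names -- confirm via model.named_modules().
lora_config = LoraConfig(
    r=16,                 # adapter rank
    lora_alpha=32,        # LoRA scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only adapter weights are trainable
```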
## Comparison with Closed Models

Aqui-open0-2 approaches the performance of our proprietary models:

- **Aqui-v2-0 tiny:** Matched or exceeded on most benchmarks
- **Aqui-v2-0:** Competitive performance at a fraction of the size
- **Cost Efficiency:** Open weights eliminate per-token API costs
- **Customization:** Full model access for specialized needs
## Limitations

- Knowledge cutoff of October 2024
- May occasionally produce hallucinations
- Requires significant computational resources for optimal performance
- 8-bit precision may affect some edge cases
- Extending the context with YARN increases memory use and can reduce inference throughput
## License

This model is released under the Apache 2.0 License, permitting both research and commercial use subject to the license's terms.
## Ethical Considerations
Aqui-open0-2 is designed for beneficial applications. Users should:
- Implement appropriate safety measures for production use
- Consider bias mitigation in sensitive applications
- Follow responsible AI practices
- Respect applicable laws and regulations
## Support & Community

- **Repository:** Hugging Face Model Page
- **Discussions:** Join the community discussions on Hugging Face
## Acknowledgments

- **Qwen Team:** for Qwen3 14B, the base model
- **DeepSeek Team:** the synthetic training dataset was generated with DeepSeek-R1
- **Hugging Face:** for hosting the model weights
Copyright 2025 Aqui Solutions. All rights reserved.