Aqui-open0-2: SOTA 21B Open Weights Reasoning Model


Aqui-open0-2 is a state-of-the-art 21-billion-parameter open-weights reasoning model from Aqui Solutions, the creators of AquiGPT. Built on Qwen3 14B and extended with additional transformer layers, it delivers coding and reasoning performance that rivals much larger models while remaining fully accessible to the open-source community.

Key Features

  • Extended Architecture: 21B parameters with layers added to Qwen3 14B base
  • SOTA Performance: Competitive with larger proprietary and open models
  • 8-bit Precision: Optimized for efficiency without sacrificing quality
  • 40K Context Window: Expandable to 128K using YARN scaling
  • Strong Reasoning: Approaches performance of closed Aqui-v2-0 models
  • Open Weights: Fully open under Apache 2.0 license

Performance Benchmarks

Aqui-open0-2 demonstrates exceptional performance across multiple challenging benchmarks:

| Benchmark | Aqui-open0-2 (21B) | gpt-oss (21.5B) | Qwen3 (30.5B) | Solar Pro 2 (30.9B) | EXAONE 4.0 (32B) | GLM-4.5 Air (110B) | Aqui-v2-0 tiny |
|---|---|---|---|---|---|---|---|
| MMLU-Pro | 79.8% | 73.6% | 77.7% | 80.5% | **81.8%** | *81.5%* | 75.4% |
| GPQA Diamond | 66.1% | 61.7% | 61.6% | 68.7% | **73.9%** | *73.3%* | 64.3% |
| Humanity's Last Exam | **10.6%** | 8.5% | 9.8% | 7.0% | *10.5%* | 6.8% | 5.6% |
| LiveCodeBench | 69.1% | *72.1%* | 66.0% | 61.6% | **74.7%** | 68.4% | 51.9% |
| AIME 2025 | 71.9% | 61.7% | 72.3% | 61.3% | **80.0%** | 63.0% | *75.0%* |
| IFBench | *50.4%* | **60.5%** | 41.5% | 37.1% | 36.3% | 44.0% | 39.2% |
| AA-Index | **51.8%** | 49.0% | 42.3% | 43.3% | *50.7%* | 49.5% | 46.8% |

**Bold**: best performance; *italics*: second best.

Model Specifications

  • Parameters: 21 billion
  • Base Model: Qwen3 14B with extended layers
  • Context Window: 40,000 tokens (expandable to 128K with YARN)
  • Precision: 8-bit optimized
  • Architecture: Extended Qwen transformer
  • Languages: 23+ languages with strong multilingual support
  • Knowledge Cutoff: October 2024

Hardware Requirements

Minimum Requirements

  • GPU: RTX 3090 (24GB VRAM) or RTX 4090
  • Mac: 32GB unified memory (Apple Silicon)
  • RAM: 32GB system memory
  • Storage: 25GB available space

Recommended Setup

  • GPU: RTX 4090 or A100 (40GB)
  • CPU: Modern multi-core processor
  • RAM: 64GB+ for optimal performance
  • Storage: NVMe SSD for faster loading

Installation & Usage

Quick Start with Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "aquigpt/open0-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Generate response
prompt = "Write a Python function to implement binary search with detailed comments."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
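
For instruction-style use, the tokenizer's chat template (the standard Transformers API, assuming this checkpoint ships a template, as Qwen3-derived models typically do) is a safer entry point than raw prompts. A minimal sketch:

messages = [
    {"role": "user", "content": "Prove that the square root of 2 is irrational."}
]
# Apply the chat template bundled with the tokenizer (assumed present)
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=1024, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))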

Using with vLLM

from vllm import LLM, SamplingParams

# Initialize model
llm = LLM(
    model="aquigpt/open0-2",
    tensor_parallel_size=1,
    trust_remote_code=True
)

# Set sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512
)

# Generate
prompts = ["Explain quantum computing in simple terms."]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
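
vLLM can also serve the model behind an OpenAI-compatible endpoint (e.g. `vllm serve aquigpt/open0-2`). A sketch of querying such a server with the official openai client; the localhost URL and placeholder API key assume vLLM's defaults:

from openai import OpenAI

# Client pointed at a local vLLM server (default endpoint assumed)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="aquigpt/open0-2",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms."}],
    temperature=0.7,
    max_tokens=512,
)
print(completion.choices[0].message.content)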

Context Extension with YARN

# Enable YARN scaling for longer contexts
model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/open0-2",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    rope_scaling={
        "type": "yarn",
        "factor": 3.2,  # 40,000 x 3.2 = ~128K tokens
        "original_max_position_embeddings": 40000,  # pre-scaling context length
    }
)
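
With scaling enabled, long inputs flow through the same generate call as before. A quick sketch reusing the tokenizer from the quick-start example (the file path is a hypothetical placeholder):

# Read a long document (placeholder path) and generate against it
with open("long_report.txt") as f:
    long_doc = f.read()
inputs = tokenizer(long_doc + "\n\nSummarize the key findings.", return_tensors="pt").to(model.device)
print(f"Prompt length: {inputs.input_ids.shape[-1]} tokens")  # may now exceed the 40K base window
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))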

Use Cases

Advanced Reasoning & Mathematics

  • Complex mathematical problem solving (AIME 2025: 71.9%)
  • Scientific reasoning and analysis
  • Multi-step logical reasoning
  • Academic research assistance

Code Generation & Programming

  • Algorithm implementation and optimization
  • Code review and debugging
  • Technical documentation
  • Live coding challenges (LiveCodeBench: 69.1%)

Professional Applications

  • Research and analysis
  • Technical writing
  • Multilingual communication
  • Educational tutoring with detailed explanations

Quantization Options

Available quantization formats for different hardware setups (a loading sketch follows the list):

  • BF16: ~42GB VRAM (full precision)
  • FP16: ~42GB VRAM (recommended)
  • INT8: ~21GB VRAM (efficient)
  • INT4: ~11GB VRAM (consumer hardware)
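
For the INT8/INT4 footprints above, one route is on-the-fly quantization with bitsandbytes via the standard Transformers BitsAndBytesConfig; the NF4 settings below are common defaults, not values published for this model:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization (roughly the ~11GB footprint listed above)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/open0-2",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)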

Fine-tuning Support

Aqui-open0-2 supports various fine-tuning approaches (see the LoRA sketch after this list):

  • LoRA/QLoRA: Parameter-efficient fine-tuning
  • Full Fine-tuning: Complete model adaptation
  • Custom Tokenizer: Domain-specific vocabulary
  • Multi-task Learning: Specialized task combinations
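
As a starting point for the LoRA/QLoRA route, here is a minimal sketch with the peft library; the rank, alpha, and target modules are illustrative defaults rather than an official recipe:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
import torch

model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/open0-2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Attach low-rank adapters to the attention projections (illustrative settings)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # adapters are a small fraction of the 21B weights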

Comparison with Closed Models

Aqui-open0-2 approaches the performance of our proprietary models:

  • Aqui-v2-0 tiny: Matches or exceeds on most benchmarks
  • Aqui-v2-0: Competitive performance at a fraction of the size
  • Cost Efficiency: Open weights eliminate API costs
  • Customization: Full model access for specialized needs

Limitations

  • Knowledge cutoff at October 2024
  • May occasionally produce hallucinations
  • Requires significant computational resources for optimal performance
  • 8-bit precision may slightly affect output quality in rare edge cases
  • Context extension via YARN adds memory and compute overhead

License

This model is released under the Apache 2.0 License, permitting both research and commercial use.

Ethical Considerations

Aqui-open0-2 is designed for beneficial applications. Users should:

  • Implement appropriate safety measures for production use
  • Consider bias mitigation in sensitive applications
  • Follow responsible AI practices
  • Respect applicable laws and regulations


Acknowledgments

  • Qwen Team: for Qwen3 14B, the base model
  • DeepSeek Team: for R1, which was used to generate the synthetic training dataset
  • HuggingFace: for hosting the model weights

Copyright 2025 Aqui Solutions. All rights reserved.
