# Aqui-open0-2 Lite: Efficient 1.72B Open Weights Reasoning Model
Aqui-open0-2 Lite is a compact 1.72-billion-parameter open-weights reasoning model from Aqui Solutions, the creators of AquiGPT. Fine-tuned from Qwen3 1.7B, it delivers performance competitive with much larger models while remaining accessible for consumer hardware and edge deployment.
## Key Features
- Compact Architecture: 1.72B parameters, fine-tuned from the Qwen3 1.7B base
- Outstanding Performance: Competitive with larger models on key benchmarks
- 8-bit Precision: Optimized for efficiency without sacrificing quality
- 40K Context Window: Expandable to 128K using YaRN scaling
- Strong Reasoning: Excels at instruction following and multilingual tasks
- Open Weights: Fully open under the Apache 2.0 license
- Consumer-Friendly: Runs on modest hardware setups
## Performance Benchmarks
Aqui-open0-2 Lite performs strongly across multiple challenging benchmarks, leading its size class on most of them:
| Benchmark | Aqui-open0-2 Lite (1.72B) | Gemma 3 (1B) | Qwen3 (2.03B) | Llama 3.2 (1.24B) | LFM2 (1.17B) |
|---|---|---|---|---|---|
| MMLU (General Knowledge) | **67.5%** | 40.1% | *59.1%* | 46.6% | 55.2% |
| GPQA (Science) | **31.8%** | 19.2% | 27.7% | 19.6% | *31.5%* |
| IFEval (Instruction Following) | *73.4%* | 62.9% | 68.4% | 52.4% | **74.5%** |
| GSM8K (Grade School Math) | **63.2%** | *59.6%* | 51.4% | 35.7% | 58.3% |
| MGSM (Multilingual) | **70.2%** | 43.6% | *66.6%* | 29.1% | 55.0% |
| Average Performance | **61.2%** | 45.1% | 54.6% | 36.7% | *54.9%* |

**Bold**: best result; *italics*: second best.
## Model Specifications
- Parameters: 1.72 billion
- Base Model: Qwen3 1.7B
- Context Window: 40,000 tokens (expandable to 128K with YaRN; see the configuration sketch after this list)
- Precision: 8-bit optimized
- Architecture: Qwen transformer
- Languages: 23+ languages with strong multilingual support
- Knowledge Cutoff: October 2024
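As a rough sketch of the YaRN extension, the snippet below overrides the RoPE scaling configuration before loading with transformers. The exact `rope_scaling` key names vary across transformers versions, and the factor of 3.2 (128K / 40K) is inferred from the stated window sizes rather than taken from an official config:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "aquigpt/open0-2-lite"

# Extend the ~40K native window to 128K with YaRN RoPE scaling.
# Key names ("rope_type" vs. "type") depend on the transformers version,
# and the 3.2 factor (128K / 40K) is an assumption; check the model's
# config.json for officially supported values.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 3.2,
    "original_max_position_embeddings": 40960,
}
config.max_position_embeddings = 131072

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    device_map="auto",
    trust_remote_code=True,
)
```

Note that static YaRN scaling of this kind applies at all sequence lengths, so it is best enabled only when long inputs are actually expected.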
## Hardware Requirements
### Minimum Requirements
- GPU: GTX 1660 (6GB VRAM) or RTX 3060
- Mac: 8GB unified memory (Apple Silicon)
- RAM: 8GB system memory
- Storage: 4GB available space
### Recommended Setup
- GPU: RTX 3070 or RTX 4060 (8GB+)
- CPU: Modern quad-core processor
- RAM: 16GB+ for optimal performance
- Storage: NVMe SSD for faster loading
### Edge Deployment
- Mobile: Capable of running on high-end mobile devices
- Raspberry Pi: Compatible with Pi 5 with sufficient RAM
- Embedded: Suitable for edge AI applications
## Installation & Usage
### Quick Start with Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "aquigpt/open0-2-lite"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Generate response
prompt = "Write a Python function to implement binary search with detailed comments."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,  # cap on generated tokens, excluding the prompt
    temperature=0.7,
    do_sample=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
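For chat-style prompting, the Qwen3 base tokenizer ships a chat template; assuming Aqui-open0-2 Lite inherits it, the same model and tokenizer can be driven through `apply_chat_template` (a hypothetical continuation of the snippet above):

```python
# Chat-style generation, reusing the tokenizer and model loaded above.
# Assumes the tokenizer inherits Qwen3's chat template.
messages = [
    {"role": "user", "content": "Explain why binary search runs in O(log n)."}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```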
### Using with vLLM
```python
from vllm import LLM, SamplingParams

# Initialize model
llm = LLM(
    model="aquigpt/open0-2-lite",
    tensor_parallel_size=1,
    trust_remote_code=True,
)

# Set sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512,
)

# Generate
prompts = ["Explain quantum computing in simple terms."]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```
## Use Cases
### Educational & Learning
- Grade school mathematics assistance (GSM8K: 63.2%)
- General knowledge queries (MMLU: 67.5%)
- Multilingual learning support (MGSM: 70.2%)
- Instruction following for educational tasks
### Lightweight Development
- Code generation for simple to moderate tasks
- Algorithm implementation
- Code review and debugging
- Technical documentation
### Edge AI Applications
- On-device assistance
- Offline reasoning tasks
- Mobile app integration
- IoT and embedded systems
### Multilingual Support
- Cross-language communication
- Translation assistance
- Multilingual content creation
- Cultural context understanding
## Quantization Options
Available quantization formats for different hardware setups (a loading sketch follows the list):
- BF16: ~3.4GB VRAM (full precision)
- FP16: ~3.4GB VRAM (recommended)
- INT8: ~1.7GB VRAM (efficient)
- INT4: ~0.9GB VRAM (ultra-efficient for edge)
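As a minimal loading sketch for the INT4 option, the snippet below uses the Hugging Face bitsandbytes integration; the NF4 settings shown are common defaults, not values published for this model:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit (NF4) loading via bitsandbytes; these quantization settings are
# common defaults rather than values published for this model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/open0-2-lite",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```

The INT8 variant works the same way with `BitsAndBytesConfig(load_in_8bit=True)`.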
## Fine-tuning Support
Aqui-open0-2 Lite supports various fine-tuning approaches (a LoRA sketch follows the list):
- LoRA/QLoRA: Parameter-efficient fine-tuning
- Full Fine-tuning: Complete model adaptation
- Custom Tokenizer: Domain-specific vocabulary
- Multi-task Learning: Specialized task combinations
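For the LoRA route, here is a minimal sketch using the PEFT library; the target module names assume Qwen-style attention projections, so confirm them against the loaded model before training:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("aquigpt/open0-2-lite")

# Target modules assume Qwen-style attention projection names;
# inspect model.named_modules() to confirm before training.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only adapter weights are trainable
```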
## Comparison with Other Small Models
Aqui-open0-2 Lite outperforms other models in its size class on most benchmarks:
- 67.5% MMLU: vs 59.1% (Qwen3 2.03B) and 55.2% (LFM2)
- 73.4% IFEval: Strong instruction following, second only to LFM2 (74.5%)
- 70.2% MGSM: Superior multilingual capabilities
- Efficiency: Highest average score (61.2%) among the models compared
## Limitations
- Knowledge cutoff at October 2024
- May occasionally produce hallucinations
- Limited compared to larger models for highly complex reasoning
- 8-bit precision may impact some edge cases
- Context extension via YaRN adds memory and compute overhead and may reduce quality on shorter inputs
## License
This model is released under the Apache 2.0 License, permitting both research and commercial use subject to the license's terms.
## Ethical Considerations
Aqui-open0-2 Lite is designed for beneficial applications. Users should:
- Implement appropriate safety measures for production use
- Consider bias mitigation in sensitive applications
- Follow responsible AI practices
- Respect applicable laws and regulations
## Support & Community
- Repository: Hugging Face Model Page
- Discussions: Join community discussions on Hugging Face
## Acknowledgments
- Qwen Team: for Qwen3 1.7B, the base model
- Hugging Face: for hosting the model weights
Copyright 2025 Aqui Solutions. All rights reserved.