# Aqui-open0-2 Lite: Efficient 1.72B Open Weights Reasoning Model
Aqui-open0-2 Lite is a compact 1.72-billion-parameter open-weights reasoning model from Aqui Solutions, the creators of AquiGPT. Fine-tuned from Qwen3 1.7B, it delivers performance competitive with much larger models while remaining accessible for consumer hardware and edge deployment.
## Key Features
- Compact Architecture: 1.72B parameters, fine-tuned from the Qwen3 1.7B base
- Outstanding Performance: Competitive with larger models on key benchmarks
- 8-bit Precision: Optimized for efficiency without sacrificing quality
- 40K Context Window: Expandable to 128K using YaRN scaling
- Strong Reasoning: Excels at instruction following and multilingual tasks
- Open Weights: Fully open under the Apache 2.0 license
- Consumer-Friendly: Runs on modest hardware setups
## Performance Benchmarks
Aqui-open0-2 Lite performs strongly across multiple challenging benchmarks, leading its size class on most of them:
| Benchmark | Aqui-open0-2 Lite (1.72B) | Gemma 3 (1B) | Qwen3 (2.03B) | Llama 3.2 (1.24B) | LFM2 (1.17B) |
|---|---|---|---|---|---|
| MMLU (General Knowledge) | **67.5%** | 40.1% | *59.1%* | 46.6% | 55.2% |
| GPQA (Science) | **31.8%** | 19.2% | 27.7% | 19.6% | *31.5%* |
| IFEval (Instruction Following) | *73.4%* | 62.9% | 68.4% | 52.4% | **74.5%** |
| GSM8K (Grade School Math) | **63.2%** | *59.6%* | 51.4% | 35.7% | 58.3% |
| MGSM (Multilingual) | **70.2%** | 43.6% | *66.6%* | 29.1% | 55.0% |
| Average Performance | **61.2%** | 45.1% | 54.6% | 36.7% | *54.9%* |

**Bold**: best result; *italics*: second best.
## Model Specifications
- Parameters: 1.72 billion
- Base Model: Qwen3 1.7B
- Context Window: 40,000 tokens (expandable to 128K with YaRN; see the configuration sketch after this list)
- Precision: 8-bit optimized
- Architecture: Qwen transformer
- Languages: 23+ languages with strong multilingual support
- Knowledge Cutoff: October 2024
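As a rough sketch of the YaRN extension, the snippet below overrides the RoPE scaling configuration before loading with transformers. The exact `rope_scaling` key names vary across transformers versions, and the factor of 3.2 (128K / 40K) is inferred from the stated window sizes rather than taken from an official config:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "aquigpt/open0-2-lite"

# Extend the ~40K native window to 128K with YaRN RoPE scaling.
# Key names ("rope_type" vs. "type") depend on the transformers version,
# and the 3.2 factor (128K / 40K) is an assumption; check the model's
# config.json for officially supported values.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 3.2,
    "original_max_position_embeddings": 40960,
}
config.max_position_embeddings = 131072

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    device_map="auto",
    trust_remote_code=True,
)
```

Note that static YaRN scaling of this kind applies at all sequence lengths, so it is best enabled only when long inputs are actually expected.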
## Hardware Requirements
### Minimum Requirements
- GPU: GTX 1660 (6GB VRAM) or RTX 3060
- Mac: 8GB unified memory (Apple Silicon)
- RAM: 8GB system memory
- Storage: 4GB available space
### Recommended Setup
- GPU: RTX 3070 or RTX 4060 (8GB+)
- CPU: Modern quad-core processor
- RAM: 16GB+ for optimal performance
- Storage: NVMe SSD for faster loading
### Edge Deployment
- Mobile: Capable of running on high-end mobile devices
- Raspberry Pi: Compatible with Pi 5 with sufficient RAM
- Embedded: Suitable for edge AI applications
## Installation & Usage
### Quick Start with Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "aquigpt/open0-2-lite"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Generate response
prompt = "Write a Python function to implement binary search with detailed comments."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,  # cap on generated tokens, excluding the prompt
    temperature=0.7,
    do_sample=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
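For chat-style prompting, the Qwen3 base tokenizer ships a chat template; assuming Aqui-open0-2 Lite inherits it, the same model and tokenizer can be driven through `apply_chat_template` (a hypothetical continuation of the snippet above):

```python
# Chat-style generation, reusing the tokenizer and model loaded above.
# Assumes the tokenizer inherits Qwen3's chat template.
messages = [
    {"role": "user", "content": "Explain why binary search runs in O(log n)."}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```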
### Using with vLLM
```python
from vllm import LLM, SamplingParams

# Initialize model
llm = LLM(
    model="aquigpt/open0-2-lite",
    tensor_parallel_size=1,
    trust_remote_code=True,
)

# Set sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512,
)

# Generate
prompts = ["Explain quantum computing in simple terms."]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```
## Use Cases
### Educational & Learning
- Grade school mathematics assistance (GSM8K: 63.2%)
- General knowledge queries (MMLU: 67.5%)
- Multilingual learning support (MGSM: 70.2%)
- Instruction following for educational tasks
### Lightweight Development
- Code generation for simple to moderate tasks
- Algorithm implementation
- Code review and debugging
- Technical documentation
### Edge AI Applications
- On-device assistance
- Offline reasoning tasks
- Mobile app integration
- IoT and embedded systems
### Multilingual Support
- Cross-language communication
- Translation assistance
- Multilingual content creation
- Cultural context understanding
## Quantization Options
Available quantization formats for different hardware setups (a loading sketch follows the list):
- BF16: ~3.4GB VRAM (full precision)
- FP16: ~3.4GB VRAM (recommended)
- INT8: ~1.7GB VRAM (efficient)
- INT4: ~0.9GB VRAM (ultra-efficient for edge)
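As a minimal loading sketch for the INT4 option, the snippet below uses the Hugging Face bitsandbytes integration; the NF4 settings shown are common defaults, not values published for this model:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit (NF4) loading via bitsandbytes; these quantization settings are
# common defaults rather than values published for this model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/open0-2-lite",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```

The INT8 variant works the same way with `BitsAndBytesConfig(load_in_8bit=True)`.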
## Fine-tuning Support
Aqui-open0-2 Lite supports various fine-tuning approaches (a LoRA sketch follows the list):
- LoRA/QLoRA: Parameter-efficient fine-tuning
- Full Fine-tuning: Complete model adaptation
- Custom Tokenizer: Domain-specific vocabulary
- Multi-task Learning: Specialized task combinations
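For the LoRA route, here is a minimal sketch using the PEFT library; the target module names assume Qwen-style attention projections, so confirm them against the loaded model before training:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("aquigpt/open0-2-lite")

# Target modules assume Qwen-style attention projection names;
# inspect model.named_modules() to confirm before training.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only adapter weights are trainable
```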
## Comparison with Other Small Models
Aqui-open0-2 Lite outperforms other models in its size class on most benchmarks:
- 67.5% MMLU: vs 59.1% (Qwen3 2.03B) and 55.2% (LFM2)
- 73.4% IFEval: Strong instruction following, second only to LFM2 (74.5%)
- 70.2% MGSM: Superior multilingual capabilities
- Efficiency: Highest average score (61.2%) among the models compared
## Limitations
- Knowledge cutoff at October 2024
- May occasionally produce hallucinations
- Limited compared to larger models for highly complex reasoning
- 8-bit precision may impact some edge cases
- Context extension via YaRN adds memory and compute overhead and may reduce quality on shorter inputs
## License
This model is released under the Apache 2.0 License, permitting both research and commercial use subject to the license's terms.
## Ethical Considerations
Aqui-open0-2 Lite is designed for beneficial applications. Users should:
- Implement appropriate safety measures for production use
- Consider bias mitigation in sensitive applications
- Follow responsible AI practices
- Respect applicable laws and regulations
## Support & Community
- Repository: Hugging Face Model Page
- Discussions: Join community discussions on Hugging Face
## Acknowledgments
- Qwen Team: for Qwen3 1.7B, the base model
- Hugging Face: for hosting the model weights
Copyright 2025 Aqui Solutions. All rights reserved.