---
datasets:
- HuggingFaceFW/fineweb
---

# 🧠 Quasar-V4-Tiny (Base)

**Model ID:** `silx-ai/Quasar-V4-Tiny`

**Architecture:** Linear Attention with Kernel Feature Maps

**Developed by:** SILX AI

**Powered by:** [gputrader.io](https://gputrader.io)

---

## 📝 Description

`Quasar-V4-Tiny` is a minimal, experimental language model built to test a new **Linear Attention mechanism** based on **Kernel Feature Maps**. It replaces traditional softmax-based self-attention with a more efficient alternative whose cost scales linearly, rather than quadratically, with sequence length.
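
For reference, the standard kernelized formulation of linear attention (Katharopoulos et al., 2020) is summarized below. This card does not specify Quasar's exact feature map $\phi$, so treat this as the generic form rather than a description of Quasar's implementation. Softmax attention for query $i$ over $N$ positions is

$$
\operatorname{Attn}(Q, K, V)_i = \frac{\sum_{j=1}^{N} \exp\!\left(q_i^\top k_j / \sqrt{d}\right) v_j^\top}{\sum_{j=1}^{N} \exp\!\left(q_i^\top k_j / \sqrt{d}\right)}
$$

Replacing the exponential similarity with a kernel $\phi(q_i)^\top \phi(k_j)$ lets the sums over $j$ be factored out and computed once for all queries:

$$
\operatorname{LinAttn}(Q, K, V)_i = \frac{\phi(q_i)^\top \sum_{j=1}^{N} \phi(k_j)\, v_j^\top}{\phi(q_i)^\top \sum_{j=1}^{N} \phi(k_j)}
$$

This removes the $N \times N$ attention matrix, which is what makes the cost linear in $N$.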

It is the **first fully working prototype** of the Quasar architecture and was trained on a small-scale dataset to validate basic functionality and tokenization.
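
A minimal NumPy sketch of non-causal linear attention, assuming the `elu(x) + 1` feature map from Katharopoulos et al. (2020). The feature map actually used in Quasar is not documented in this card, and all function names here are hypothetical. The sketch builds the key/value summary once and checks the result against an explicit $N \times N$ kernel-matrix computation:

```python
import numpy as np


def elu_feature_map(x: np.ndarray) -> np.ndarray:
    """phi(x) = elu(x) + 1, a strictly positive feature map (assumed, not Quasar's documented choice)."""
    # np.minimum avoids overflow in exp for large positive x (those entries are discarded anyway).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Non-causal linear attention. q, k: (n, d_k); v: (n, d_v). Returns (n, d_v)."""
    phi_q = elu_feature_map(q)   # (n, d_k)
    phi_k = elu_feature_map(k)   # (n, d_k)
    kv = phi_k.T @ v             # (d_k, d_v): sum_j phi(k_j) v_j^T, computed once for all queries
    z = phi_k.sum(axis=0)        # (d_k,): sum_j phi(k_j), the normalizer
    num = phi_q @ kv             # (n, d_v)
    den = phi_q @ z              # (n,)
    return num / (den[:, None] + eps)


def quadratic_reference(q: np.ndarray, k: np.ndarray, v: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Same quantity computed the O(n^2) way, via the explicit (n, n) kernel matrix."""
    sim = elu_feature_map(q) @ elu_feature_map(k).T  # (n, n)
    return (sim @ v) / (sim.sum(axis=1, keepdims=True) + eps)


rng = np.random.default_rng(0)
n, d_k, d_v = 8, 4, 3
q, k, v = (rng.standard_normal((n, d)) for d in (d_k, d_k, d_v))
out = linear_attention(q, k, v)
print(out.shape)                                      # (8, 3)
print(np.allclose(out, quadratic_reference(q, k, v)))  # True
```

Both functions give the same result, but `linear_attention` never materializes the $n \times n$ matrix: its intermediate `kv` has shape $d_k \times d_v$ regardless of sequence length.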

---

## 📊 Training Details

- **Training objective:** Causal language modeling (next-token prediction)
- **Training tokens:** ~1–2 billion
- **Architecture:** Linear Attention with Kernel Feature Maps
- **Batch size:** Small, due to limited compute
- **Training duration:** Short; intended to verify architecture behavior and convergence
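
Because the model is trained for next-token prediction, attention must be causal: position $i$ may only attend to positions $j \le i$. A common way to do this with kernel feature maps is to keep running sums of $\phi(k_j) v_j^\top$ and $\phi(k_j)$, which turns attention into a recurrence with a constant-size state. The sketch below again assumes the hypothetical `elu(x) + 1` feature map and illustrative function names; it is not Quasar's actual code:

```python
import numpy as np


def elu_feature_map(x: np.ndarray) -> np.ndarray:
    """phi(x) = elu(x) + 1 (assumed feature map)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def causal_linear_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Causal linear attention as a recurrence; the state size does not grow with n."""
    n, d_k = q.shape
    d_v = v.shape[1]
    s = np.zeros((d_k, d_v))  # running sum of phi(k_j) v_j^T for j <= i
    z = np.zeros(d_k)         # running sum of phi(k_j) for j <= i
    out = np.empty((n, d_v))
    phi_q, phi_k = elu_feature_map(q), elu_feature_map(k)
    for i in range(n):
        s += np.outer(phi_k[i], v[i])  # add the current position to the state
        z += phi_k[i]
        out[i] = (phi_q[i] @ s) / (phi_q[i] @ z + eps)
    return out


def causal_quadratic_reference(q: np.ndarray, k: np.ndarray, v: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """O(n^2) reference: explicit kernel matrix with future positions masked out."""
    sim = np.tril(elu_feature_map(q) @ elu_feature_map(k).T)  # keep j <= i only
    return (sim @ v) / (sim.sum(axis=1, keepdims=True) + eps)


rng = np.random.default_rng(1)
q, k = rng.standard_normal((6, 4)), rng.standard_normal((6, 4))
v = rng.standard_normal((6, 2))
out = causal_linear_attention(q, k, v)
print(np.allclose(out, causal_quadratic_reference(q, k, v)))  # True
```

During generation, only `s` and `z` need to be carried from one token to the next, instead of a key/value cache that grows with the sequence.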

---

## ⚠️ Limitations

- Not trained for quality or coherence; purely experimental
- Likely to hallucinate, generate irrelevant text, or produce inconsistent output
- **Do not use in production.** This is a base model intended for architecture-level debugging and early development.

---

## 🙏 Acknowledgements

This project was made possible by compute provided by **[gputrader.io](https://gputrader.io)**. Their support enabled fast iteration during early-stage experimentation.

---

## 🔬 Research Goals

This model is part of an ongoing effort to:

- Replace traditional transformer attention with linear, scalable attention
- Build more efficient foundation models under constrained compute
- Explore custom architectures that can be trained with minimal GPU resources

More versions (medium, scaled, and improved) are expected once the Quasar pipeline has been fully validated.
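
As a rough illustration of what "scalable" means here (standard asymptotic costs for kernelized linear attention, not measured numbers for Quasar): for sequence length $N$, head dimension $d$, and feature-map dimension $d_\phi$, the attention cost per layer is

$$
\underbrace{\mathcal{O}(N^2 d)}_{\text{softmax attention}} \quad \text{vs.} \quad \underbrace{\mathcal{O}(N\, d_\phi\, d)}_{\text{linear attention}}
$$

and during autoregressive generation linear attention carries a fixed-size state $S \in \mathbb{R}^{d_\phi \times d}$, $z \in \mathbb{R}^{d_\phi}$ between tokens, instead of a key/value cache that grows as $\mathcal{O}(N d)$.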

---

## 📎 License

This model is released for **research and testing purposes only**.