---
datasets:
- HuggingFaceFW/fineweb
---
# 🧠 Quasar-V4-Tiny (Base)
**Model ID:** `silx-ai/Quasar-V4-Tiny`
**Architecture:** Linear Attention with Kernel Feature Maps
**Developed by:** SILX AI
**Powered by:** [gputrader.io](https://gputrader.io)
---
## 📝 Description
`Quasar-V4-Tiny` is a minimal, experimental language model designed to test a new **Linear Attention mechanism** using **Kernel Feature Maps**.
Instead of traditional softmax self-attention, whose cost grows quadratically with sequence length, it applies a kernel feature map to queries and keys so that attention can be computed in time linear in sequence length (see the sketch below).
It represents the **first fully working prototype** of the Quasar architecture and is trained on a small-scale dataset for initial validation of functionality and tokenization.
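The exact feature map and head layout used by Quasar-V4 are not documented in this card; the PyTorch sketch below is an illustrative stand-in that uses the `elu(x) + 1` map from Katharopoulos et al. (2020). Names such as `causal_linear_attention` are hypothetical, not this repository's API.

```python
import torch
import torch.nn.functional as F

def feature_map(x):
    # Positive feature map phi(x) = elu(x) + 1 (an assumption; the map
    # actually used by Quasar-V4-Tiny may differ).
    return F.elu(x) + 1.0

def causal_linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, seq, heads, head_dim).
    # Softmax attention costs O(seq^2); here each position only extends
    # running prefix sums, so the total cost is O(seq).
    q, k = feature_map(q), feature_map(k)
    # Running sum of outer products phi(k_t) v_t^T (materialized here for
    # clarity; a fused kernel would avoid storing a (d x d) state per step).
    kv = torch.einsum("bshd,bshe->bshde", k, v).cumsum(dim=1)
    k_sum = k.cumsum(dim=1)  # running sum of phi(k_t) for the normalizer
    num = torch.einsum("bshd,bshde->bshe", q, kv)
    den = torch.einsum("bshd,bshd->bsh", q, k_sum).unsqueeze(-1)
    return num / (den + eps)

# Toy shapes: batch=2, seq=16, heads=4, head_dim=8
q = torch.randn(2, 16, 4, 8)
k = torch.randn(2, 16, 4, 8)
v = torch.randn(2, 16, 4, 8)
print(causal_linear_attention(q, k, v).shape)  # torch.Size([2, 16, 4, 8])
```

Because the causal form reduces to running sums, the same computation can also be carried out recurrently at inference time with constant memory per step.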
---
## 📊 Training Details
- **Training objective:** Causal Language Modeling (next-token prediction; see the loss sketch after this list)
- **Training tokens:** ~1–2 billion
- **Architecture:** Linear Attention with Kernel Feature Maps
- **Batch size:** Small, due to limited compute
- **Training duration:** Short, meant to verify architecture behavior and convergence
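The training objective is the standard causal LM loss: position *t* predicts token *t+1*. A minimal sketch (generic next-token prediction, not code from this repository):

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits, input_ids):
    # logits: (batch, seq, vocab); input_ids: (batch, seq).
    # Position t predicts token t+1, so drop the last logit and first label.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )

# Toy check with random data: batch=2, seq=8, vocab=100
logits = torch.randn(2, 8, 100)
input_ids = torch.randint(0, 100, (2, 8))
print(causal_lm_loss(logits, input_ids))
```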
---
## ⚠️ Limitations
- Not trained for quality or coherence — purely experimental
- Likely to hallucinate, generate irrelevant text, or be inconsistent
- **Do not use in production** — this is a base model meant for architecture-level debugging and early development (a minimal loading sketch follows this list)
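For architecture-level inspection, a minimal loading sketch, assuming the checkpoint is usable through the standard `transformers` API; since the attention modules are custom, `trust_remote_code=True` would likely be required:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("silx-ai/Quasar-V4-Tiny")
model = AutoModelForCausalLM.from_pretrained(
    "silx-ai/Quasar-V4-Tiny", trust_remote_code=True
)

inputs = tokenizer("The Quasar architecture", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```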
---
## 🙏 Acknowledgements
This project was made possible thanks to compute provided by **[gputrader.io](https://gputrader.io)**.
Their support enabled fast iteration during early-stage experimentation.
---
## 🔬 Research Goals
This model is part of an ongoing effort to:
- Replace traditional transformer attention with linear, scalable attention
- Build more efficient foundation models under constrained resources
- Explore custom architectures that can be trained with minimal GPU power
Further versions (medium-scale, larger, and improved) are expected once the Quasar pipeline is fully validated.
---
## 📎 License
This model is released for **research and testing purposes only**.