---
datasets:
- HuggingFaceFW/fineweb
---
# 🧠 Quasar-V4-Tiny (Base)
**Model ID:** `silx-ai/Quasar-V4-Tiny`
**Architecture:** Linear Attention with Kernel Feature Maps
**Developed by:** SILX AI
**Powered by:** [gputrader.io](https://gputrader.io)
---
## 📝 Description
`Quasar-V4-Tiny` is a minimal, experimental language model designed to test a new **Linear Attention mechanism** using **Kernel Feature Maps**.
This model discards traditional softmax-based self-attention in favor of a more efficient, scalable alternative.
It represents the **first fully working prototype** of the Quasar architecture and is trained on a small-scale dataset for initial validation of functionality and tokenization.
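The card does not include the implementation, so the following is only a minimal sketch of the general technique the description points to: softmax attention is replaced by a positive kernel feature map φ(·) applied to queries and keys, so the output can be computed as φ(Q)(φ(K)ᵀV) in linear time and memory with respect to sequence length. The choice of feature map (`elu + 1` here) and the non-causal form are illustrative assumptions, not the actual Quasar-V4-Tiny code.

```python
import torch
import torch.nn.functional as F


def elu_feature_map(x):
    # Positive kernel feature map; other maps (ReLU, learned maps, etc.) are possible.
    return F.elu(x) + 1


def linear_attention(q, k, v, eps=1e-6):
    """Bidirectional linear attention with kernel feature maps (illustrative only).

    q, k, v: tensors of shape (batch, seq_len, dim).
    """
    q = elu_feature_map(q)
    k = elu_feature_map(k)

    # Associativity lets us form (phi(K)^T V) once: O(n * d^2) instead of O(n^2 * d).
    kv = torch.einsum("bnd,bne->bde", k, v)                         # (batch, dim, dim)
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)   # per-position normalizer
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)                # (batch, seq_len, dim)
```

A causal (autoregressive) variant accumulates the key-value products as a running prefix sum, which is what makes this family of mechanisms attractive for efficient decoding.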
---
## 📊 Training Details
- **Training objective:** Causal Language Modeling (next-token prediction); see the sketch after this list
- **Training tokens:** ~1–2 billion
- **Architecture:** Linear Attention with Kernel Feature Maps
- **Batch size:** Small, due to limited compute
- **Training duration:** Short, meant to verify architecture behavior and convergence
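For reference, next-token prediction is the standard shifted cross-entropy objective. The snippet below is a generic sketch of that loss, not code taken from the Quasar training pipeline.

```python
import torch
import torch.nn.functional as F


def causal_lm_loss(logits, input_ids):
    """Next-token prediction: position i is trained to predict token i+1.

    logits: (batch, seq_len, vocab_size); input_ids: (batch, seq_len).
    """
    shift_logits = logits[:, :-1, :].contiguous()  # predictions for positions 0..n-2
    shift_labels = input_ids[:, 1:].contiguous()   # targets are tokens 1..n-1
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )
```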
---
## ⚠️ Limitations
- Not trained for quality or coherence — purely experimental
- Likely to hallucinate, generate irrelevant text, or be inconsistent
- **Do not use in production** — this is a base model meant for architecture-level debugging and early development
---
## 🙏 Acknowledgements
This project was made possible thanks to compute provided by **[gputrader.io](https://gputrader.io)**.
Their support enabled fast iteration during early-stage experimentation.
---
## 🔬 Research Goals
This model is part of an ongoing effort to:
- Replace traditional transformer attention with linear, scalable attention
- Build more efficient foundation models under constrained resources
- Explore custom architectures that can be trained with minimal GPU power
More versions (medium, scaled, improved) are expected after full validation of the Quasar pipeline.
---
## 📎 License
This model is released for **research and testing purposes only**.