🧠 Quasar-V4-Tiny (Base)

Model ID: silx-ai/Quasar-V4-Tiny
Architecture: Linear Attention with Kernel Feature Maps
Developed by: SILX AI
Powered by: gputrader.io


📝 Description

Quasar-V4-Tiny is a minimal, experimental language model designed to test a new Linear Attention mechanism using Kernel Feature Maps.
This model replaces traditional softmax-based self-attention with a kernelized alternative whose time and memory cost grow linearly, rather than quadratically, with sequence length.
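In broad strokes, linear attention rewrites softmax(QKᵀ)V as φ(Q)(φ(K)ᵀV) for some feature map φ, so the n × n attention matrix is never materialized. The sketch below illustrates the causal variant using the elu(x) + 1 feature map from Katharopoulos et al. (2020); the specific kernel and implementation used in Quasar-V4 are not documented in this card, and the function names here are illustrative.

```python
# Minimal sketch of causal linear attention with a kernel feature map.
# The elu(x) + 1 feature map is an illustrative choice, not necessarily
# the kernel used in Quasar-V4.
import torch
import torch.nn.functional as F

def feature_map(x: torch.Tensor) -> torch.Tensor:
    # Positive feature map so attention weights stay non-negative.
    return F.elu(x) + 1

def causal_linear_attention(q, k, v, eps: float = 1e-6):
    """q, k, v: (batch, seq_len, dim). Runs in O(seq_len) because
    softmax(QK^T) is replaced by phi(Q) phi(K)^T."""
    q, k = feature_map(q), feature_map(k)
    # Prefix sums over the sequence enforce causality without a mask.
    kv = torch.cumsum(torch.einsum("bsd,bse->bsde", k, v), dim=1)  # (b, s, d, e)
    z = torch.cumsum(k, dim=1)                                     # (b, s, d)
    num = torch.einsum("bsd,bsde->bse", q, kv)
    den = torch.einsum("bsd,bsd->bs", q, z).clamp_min(eps)
    return num / den.unsqueeze(-1)

# Toy usage: same interface as standard attention, no n x n matrix anywhere.
q, k, v = (torch.randn(2, 16, 32) for _ in range(3))
print(causal_linear_attention(q, k, v).shape)  # torch.Size([2, 16, 32])
```

Because only running sums over the sequence need to be kept, the per-token cost is independent of context length, which is the scalability argument behind the architecture.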

It represents the first fully working prototype of the Quasar architecture and is trained on a small-scale dataset for initial validation of functionality and tokenization.
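For that kind of validation, loading the checkpoint would presumably follow the usual transformers pattern. The snippet below is an untested sketch: it assumes the repository ships Hugging Face-compatible weights, and since the architecture is custom, `trust_remote_code=True` may be required to pull in the Quasar modeling code.

```python
# Untested loading sketch; assumes a standard Hugging Face checkpoint layout.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "silx-ai/Quasar-V4-Tiny"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The Quasar architecture", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Given the limitations listed below, expect incoherent generations; the point of such a snippet is only to check that the pipeline runs end to end.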


📊 Training Details

  • Training objective: Causal Language Modeling (next-token prediction; a generic loss sketch follows this list)
  • Training tokens: ~1–2 billion
  • Architecture: Linear Attention with Kernel Feature Maps
  • Batch size: Small, due to limited compute
  • Training duration: Short, meant to verify architecture behavior and convergence
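To make the objective concrete: causal language modeling trains the model to predict token t+1 from tokens 1…t, which in code amounts to shifting the inputs by one position and applying cross-entropy. The helper below is a generic sketch, not Quasar's actual training code.

```python
# Generic causal-LM loss: predict token t+1 from all tokens <= t.
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """logits: (batch, seq_len, vocab); input_ids: (batch, seq_len)."""
    shift_logits = logits[:, :-1, :]   # predictions for positions 0..T-2
    shift_labels = input_ids[:, 1:]    # targets are the next tokens
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )
```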

⚠️ Limitations

  • Not trained for quality or coherence — purely experimental
  • Likely to hallucinate, generate irrelevant text, or be inconsistent
  • Do not use in production — this is a base model meant for architecture-level debugging and early development

🙏 Acknowledgements

This project was made possible thanks to compute provided by gputrader.io.
Their support enabled fast iteration during early-stage experimentation.


🔬 Research Goals

This model is part of an ongoing effort to:

  • Replace traditional transformer attention with linear, scalable attention
  • Build more efficient foundation models under constrained resources
  • Explore custom architectures that can be trained with minimal GPU power

Larger and improved versions (medium-scale and beyond) are expected once the Quasar pipeline has been fully validated.


📎 License

This model is released for research and testing purposes only.
