🧠 Quasar-V4-Tiny (Base)

Model ID: silx-ai/Quasar-V4-Tiny
Architecture: Linear Attention with Kernel Feature Maps
Developed by: SILX AI
Powered by: gputrader.io


📝 Description

Quasar-V4-Tiny is a minimal, experimental language model designed to test a new Linear Attention mechanism using Kernel Feature Maps.
This model replaces traditional softmax-based self-attention with a kernelized alternative whose time and memory cost grow linearly, rather than quadratically, with sequence length.
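In broad strokes, linear attention rewrites softmax(QKᵀ)V as φ(Q)(φ(K)ᵀV) for some feature map φ, so the n × n attention matrix is never materialized. The sketch below illustrates the causal variant using the elu(x) + 1 feature map from Katharopoulos et al. (2020); the specific kernel and implementation used in Quasar-V4 are not documented in this card, and the function names here are illustrative.

```python
# Minimal sketch of causal linear attention with a kernel feature map.
# The elu(x) + 1 feature map is an illustrative choice, not necessarily
# the kernel used in Quasar-V4.
import torch
import torch.nn.functional as F

def feature_map(x: torch.Tensor) -> torch.Tensor:
    # Positive feature map so attention weights stay non-negative.
    return F.elu(x) + 1

def causal_linear_attention(q, k, v, eps: float = 1e-6):
    """q, k, v: (batch, seq_len, dim). Runs in O(seq_len) because
    softmax(QK^T) is replaced by phi(Q) phi(K)^T."""
    q, k = feature_map(q), feature_map(k)
    # Prefix sums over the sequence enforce causality without a mask.
    kv = torch.cumsum(torch.einsum("bsd,bse->bsde", k, v), dim=1)  # (b, s, d, e)
    z = torch.cumsum(k, dim=1)                                     # (b, s, d)
    num = torch.einsum("bsd,bsde->bse", q, kv)
    den = torch.einsum("bsd,bsd->bs", q, z).clamp_min(eps)
    return num / den.unsqueeze(-1)

# Toy usage: same interface as standard attention, no n x n matrix anywhere.
q, k, v = (torch.randn(2, 16, 32) for _ in range(3))
print(causal_linear_attention(q, k, v).shape)  # torch.Size([2, 16, 32])
```

Because only running sums over the sequence need to be kept, the per-token cost is independent of context length, which is the scalability argument behind the architecture.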

It represents the first fully working prototype of the Quasar architecture and is trained on a small-scale dataset for initial validation of functionality and tokenization.
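For that kind of validation, loading the checkpoint would presumably follow the usual transformers pattern. The snippet below is an untested sketch: it assumes the repository ships Hugging Face-compatible weights, and since the architecture is custom, `trust_remote_code=True` may be required to pull in the Quasar modeling code.

```python
# Untested loading sketch; assumes a standard Hugging Face checkpoint layout.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "silx-ai/Quasar-V4-Tiny"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The Quasar architecture", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Given the limitations listed below, expect incoherent generations; the point of such a snippet is only to check that the pipeline runs end to end.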


📊 Training Details

  • Training objective: Causal Language Modeling (next-token prediction; a generic loss sketch follows this list)
  • Training tokens: ~1–2 billion
  • Architecture: Linear Attention with Kernel Feature Maps
  • Batch size: Small, due to limited compute
  • Training duration: Short, meant to verify architecture behavior and convergence
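To make the objective concrete: causal language modeling trains the model to predict token t+1 from tokens 1…t, which in code amounts to shifting the inputs by one position and applying cross-entropy. The helper below is a generic sketch, not Quasar's actual training code.

```python
# Generic causal-LM loss: predict token t+1 from all tokens <= t.
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """logits: (batch, seq_len, vocab); input_ids: (batch, seq_len)."""
    shift_logits = logits[:, :-1, :]   # predictions for positions 0..T-2
    shift_labels = input_ids[:, 1:]    # targets are the next tokens
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )
```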

⚠️ Limitations

  • Not trained for quality or coherence — purely experimental
  • Likely to hallucinate, generate irrelevant text, or be inconsistent
  • Do not use in production — this is a base model meant for architecture-level debugging and early development

🙏 Acknowledgements

This project was made possible thanks to compute provided by gputrader.io.
Their support enabled fast iteration during early-stage experimentation.


🔬 Research Goals

This model is part of an ongoing effort to:

  • Replace traditional transformer attention with linear, scalable attention
  • Build more efficient foundation models under constrained resources
  • Explore custom architectures that can be trained with minimal GPU power

Larger and improved versions (medium-scale and beyond) are expected once the Quasar pipeline has been fully validated.


📎 License

This model is released for research and testing purposes only.
