	---
	datasets:
	- HuggingFaceFW/fineweb
	---
	# 🧠 Quasar-V4-Tiny (Base)

	- Model ID: `silx-ai/QuasarV4-Tiny`
	- Architecture: Linear Attention with Kernel Feature Maps
	- Developed by: SILX AI
	- Powered by: [gputrader.io](https://gputrader.io)

	---

	## 📝 Description

	`Quasar-V4-Tiny` is a minimal, experimental language model built to test a new Linear Attention mechanism based on Kernel Feature Maps.
	It replaces traditional softmax-based self-attention with a kernelized form whose cost grows linearly, rather than quadratically, with sequence length.

	It represents the first fully working prototype of the Quasar architecture and is trained on a small-scale dataset for initial validation of functionality and tokenization.
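
As a rough illustration of the general technique (not the Quasar implementation, which this card does not describe in detail), causal linear attention with a kernel feature map can be sketched in NumPy. The `elu(x) + 1` feature map, the function names, and the single-head layout are all assumptions made for this example.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a common positive feature map.
    # Assumption: the model card does not say which feature map Quasar uses.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def causal_linear_attention(q, k, v, eps=1e-6):
    """Recurrent form. q, k: (T, d); v: (T, d_v). Returns (T, d_v)."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    T, d = phi_q.shape
    S = np.zeros((d, v.shape[1]))  # running sum of outer(phi(k_t), v_t)
    z = np.zeros(d)                # running sum of phi(k_t), used for normalization
    out = np.empty_like(v, dtype=float)
    for t in range(T):
        S += np.outer(phi_k[t], v[t])
        z += phi_k[t]
        out[t] = (phi_q[t] @ S) / (phi_q[t] @ z + eps)
    return out

def causal_kernel_attention_reference(q, k, v, eps=1e-6):
    """Same result computed the quadratic way, with an explicit T x T matrix."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = np.tril(phi_q @ phi_k.T)  # zero out attention to future positions
    return scores @ v / (scores.sum(axis=1, keepdims=True) + eps)
```

The recurrent version never builds the `T x T` score matrix: it carries a fixed-size state (`S` and `z`) from token to token, which is where the linear scaling in sequence length comes from. The reference function exists only to check that both forms agree.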

	---

	## 📊 Training Details

	- Training objective: Causal Language Modeling (next-token prediction)
	- Training tokens: ~1–2 billion
	- Architecture: Linear Attention with Kernel Feature Maps
	- Batch size: Small, due to limited compute
	- Training duration: Short, meant to verify architecture behavior and convergence
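
For readers unfamiliar with the objective named above, next-token prediction can be sketched as a cross-entropy loss over shifted targets. This is a generic NumPy illustration, not the training code used for this model; `next_token_loss` and its argument names are hypothetical.

```python
import numpy as np

def next_token_loss(logits, token_ids):
    """Mean cross-entropy for causal language modeling.

    logits: (T, V) scores produced at each position.
    token_ids: (T,) integer tokens of the input sequence.
    Position t is trained to predict token t + 1, so the last row of
    logits and the first token have no counterpart and are dropped.
    """
    pred = logits[:-1]
    target = token_ids[1:]
    # numerically stable log-softmax over the vocabulary axis
    pred = pred - pred.max(axis=1, keepdims=True)
    log_probs = pred - np.log(np.exp(pred).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(target)), target].mean()
```

With uniform logits over a vocabulary of size `V`, this loss equals `log(V)`, which is a handy sanity check for a freshly initialized model.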

	---

	## ⚠️ Limitations

	- Not trained for quality or coherence — purely experimental
	- Likely to hallucinate, generate irrelevant text, or be inconsistent
	- Do not use in production — this is a base model meant for architecture-level debugging and early development

	---

	## 🙏 Acknowledgements

	This project was made possible thanks to compute provided by [gputrader.io](https://gputrader.io).
	Their support enabled fast iteration during early-stage experimentation.

	---

	## 🔬 Research Goals

	This model is part of an ongoing effort to:

	- Replace traditional transformer attention with linear, scalable attention
	- Build more efficient foundation models under constrained resources
	- Explore custom architectures that can be trained with minimal GPU power
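
To make the "linear, scalable" goal concrete, here is a back-of-the-envelope count of multiply-adds per attention head. It is a sketch under simplifying assumptions: it ignores the feature-map cost, causal-mask savings, memory traffic, and hardware effects, and the head dimensions are hypothetical values, not Quasar's configuration.

```python
def softmax_attention_cost(T, d, d_v):
    # T x T score matrix (T*T*d) plus the weighted sum over values (T*T*d_v)
    return T * T * (d + d_v)

def linear_attention_cost(T, d, d_v):
    # per token: update the d x d_v state and read it out with the query
    # (2*d*d_v), plus updating and reading the d-dim normalizer (2*d)
    return T * (2 * d * d_v + 2 * d)

# Hypothetical head sizes, chosen only for illustration
for T in (512, 4096, 32768):
    s = softmax_attention_cost(T, 64, 64)
    l = linear_attention_cost(T, 64, 64)
    print(f"T={T:>6}  softmax={s:.2e}  linear={l:.2e}  ratio={s / l:.1f}")
```

Doubling the sequence length quadruples the softmax estimate but only doubles the linear one, so the gap widens as contexts get longer.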

	More versions (medium, scaled, improved) are expected after full validation of the Quasar pipeline.

	---

	## 📎 License

	This model is released for research and testing purposes only.