Distil PII Redaction Collection A family of small language models (SLMs) specialized for policy-aware PII redaction that can run locally. • 7 items • Updated Feb 6 • 10
Mistral Small 4 Collection A state-of-the-art open-weight model with a granular Mixture-of-Experts architecture that fuses instruct, reasoning, and agentic skills. • 3 items • Updated Mar 16 • 66
DroPE Collection Extending the Context of Pretrained LLMs by Dropping Their Positional Embedding (https://www.arxiv.org/abs/2512.12167) • 1 item • Updated Jan 11 • 3
pplx-embed Collection Diffusion-Pretrained Dense and Contextual Embeddings • 7 items • Updated Feb 26 • 96
ColBERT-Zero 🐶 Collection First large-scale fully pre-trained ColBERT model using only public data, outperforming GTE-ModernColBERT and GTE-ModernBERT • 10 items • Updated 11 days ago • 21
GLiClass-Instruct Collection Multi-task efficient zero-shot sequence classification models • 3 items • Updated Feb 17 • 5
jina-embeddings-v5-text Collection Our 5th-gen embeddings: two lightweight multilingual models with SOTA performance in retrieval, matching, clustering, and classification. • 29 items • Updated Feb 27 • 38
Article Firecracker vs Docker: The Technical Boundary Between MicroVMs and Containers • Nov 6, 2025 • 3
Article LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling • Feb 12 • 53
NeuTTS Air Collection NeuTTS Air is a speech foundation model that runs in real time on CPU, with instant voice cloning. • 3 items • Updated Feb 12 • 21
NeuTTS Nano Multilingual Collection Collection NeuTTS Nano is a TTS model, 3x smaller than NeuTTS Air, that runs in real time on CPU, now in English, Spanish, French, and German versions! • 13 items • Updated Feb 26 • 17
mmBERT: a modern multilingual encoder Collection mmBERT is trained on 3T tokens from over 1800 languages, showing SoTA scores on benchmarks and exceptional low-resource performance • 16 items • Updated Sep 9, 2025 • 53
Falcon-H1-Tiny Collection A series of extremely small yet powerful language models redefining capabilities at small scale • 19 items • Updated Mar 2 • 37
VibeVoice Collection Frontier Text-to-Speech Models https://microsoft.github.io/VibeVoice/ • 8 items • Updated Mar 2 • 229
Tarka Embed V1 Collection Efficient DFKD embeddings for language understanding • 5 items • Updated Dec 17, 2025 • 6