Mesh-v0.1-2x2 (Stage 002)
Introducing mesh
This is our first-ever model! Allow us to explain in detail how the mesh architecture works.
Neural Mesh extends the Mixture of Experts (MoE) concept by allowing bidirectional communication between experts.
The experts are arranged in a two-dimensional grid (2x2, 4x4, etc.), which lets each expert communicate with its neighbors using the "Neighbor Exchange" method.
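To make the grid layout concrete, here is a minimal sketch of which experts count as neighbors in a 2x2 mesh. It rests on assumptions that this card does not state: that neighbors are the up/down/left/right cells, that the grid does not wrap around, and that experts are numbered row by row. The `exchange_step` function is a hypothetical placeholder (a simple averaging blend with a made-up `mix` weight) and is not the actual "Neighbor Exchange" computation.

```python
def grid_neighbors(rows, cols):
    """Map each expert index to its up/down/left/right neighbors (no wraparound)."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    adjacent.append(nr * cols + nc)  # row-major expert index
            neighbors[r * cols + c] = sorted(adjacent)
    return neighbors


def exchange_step(outputs, neighbors, mix=0.5):
    """Hypothetical exchange: blend each output with the mean of its neighbors."""
    mixed = []
    for i, own in enumerate(outputs):
        around = [outputs[j] for j in neighbors[i]]
        if not around:  # a 1x1 grid has no neighbors to exchange with
            mixed.append(own)
            continue
        mixed.append((1 - mix) * own + mix * sum(around) / len(around))
    return mixed


neighbors = grid_neighbors(2, 2)
# Scalar stand-ins for the per-expert outputs, purely for illustration.
mixed = exchange_step([1.0, 2.0, 3.0, 4.0], neighbors)
```

Under these assumptions, every expert in a 2x2 grid has exactly two neighbors, while interior experts in a 4x4 grid would have four.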
Just like MoE models, Mesh models have dynamic routing, and through the `routing_k` parameter you can set the number of active parameters. For this model (2x2):
- top-1 routing: 173M active parameters
- top-2 routing: 242M active parameters (default)
- dense routing: 302M active parameters
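The routing modes above can be illustrated with a generic top-k router of the kind commonly used in MoE models. This is a sketch under stated assumptions, not this model's actual router: it assumes `routing_k` is the number of experts selected per token, that dense routing corresponds to selecting all four experts, and that the selected weights are renormalized. The function name `route_top_k` and the example logits are hypothetical.

```python
import math


def route_top_k(router_logits, routing_k):
    """Select the routing_k highest-scoring experts and renormalize their weights."""
    # Numerically stable softmax over all experts.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the routing_k most probable experts, best first.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:routing_k]
    # Renormalize so the selected weights sum to 1.
    mass = sum(probs[i] for i in chosen)
    return {i: probs[i] / mass for i in chosen}


logits = [0.1, 2.0, -1.0, 1.5]   # hypothetical router scores for the 4 experts
top1 = route_top_k(logits, 1)    # one active expert
top2 = route_top_k(logits, 2)    # two active experts
dense = route_top_k(logits, 4)   # every expert active
```

A larger `routing_k` activates more experts per token, which is consistent with the active-parameter counts rising from top-1 to dense routing.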
Here's how the mesh architecture works:
Evaluation

Disclaimer
This small language model is only a proof of concept. It paves the way for the final release, which is likely to happen in Q4 2025 and to include more models and better support from external libraries such as Transformers and Llama.cpp.