Mesh-v0.1-2x2 (Stage 002)

Introducing mesh

This is our first ever model! Allow us to explain how the mesh architecture works in detail.

  • Neural Mesh extends the concept of Mixture of Experts by allowing bidirectional expert communication.

  • The experts are laid out in a two-dimensional grid (2x2, 4x4, etc.), which lets them communicate with their neighbors using the "Neighbor Exchange" method.

  • Just like MoE models, Mesh models use dynamic routing, and the routing_k parameter sets how many parameters are active. For this model (2x2):

    • top-1 routing: 173M active parameters
    • top-2 routing: 242M active parameters (default)
    • dense routing: 302M active parameters
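To make the effect of routing_k concrete, here is a minimal sketch of generic MoE-style top-k gating over the four experts of a 2x2 grid. It is not the Mesh implementation: the function name route_top_k, the example logits, and the choice to softmax over only the selected experts are all assumptions made for illustration.

```python
import math

def route_top_k(router_logits, routing_k):
    """Pick the routing_k highest-scoring experts and normalize their weights.

    Generic MoE-style gating (an assumption), not the Mesh implementation.
    Returns a dict mapping expert index -> routing weight.
    """
    if not 1 <= routing_k <= len(router_logits):
        raise ValueError("routing_k must be between 1 and the number of experts")
    # Expert indices sorted by descending logit; keep the first routing_k.
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:routing_k]
    # Softmax over the chosen logits only (max subtracted for stability).
    top = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - top) for i in chosen]
    total = sum(exps)
    return {i: e / total for i, e in zip(chosen, exps)}

# Hypothetical router logits for one token, one per expert in a 2x2 grid.
logits = [0.1, 2.0, -1.0, 1.0]
top1 = route_top_k(logits, 1)   # one active expert
top2 = route_top_k(logits, 2)   # two active experts
dense = route_top_k(logits, 4)  # all four experts active
```

With routing_k equal to the number of experts, every expert contributes, which corresponds to the dense setting listed above.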

Here's how the mesh architecture works:

[Figure: diagram of the mesh architecture]
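This card does not spell out the Neighbor Exchange method, so the following is only a rough sketch of what communication between grid neighbors could look like. It assumes 4-connected neighbors without wrap-around, one scalar output per expert, and a simple blend with the neighbors' mean; the names grid_neighbors, neighbor_exchange, and mix are hypothetical.

```python
def grid_neighbors(row, col, rows, cols):
    """Return the 4-connected neighbors of (row, col) in a rows x cols grid, no wrap-around."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]

def neighbor_exchange(outputs, mix=0.5):
    """One hypothetical exchange step (an assumption, not the Mesh method).

    `outputs` is a rows x cols nested list with one float per expert.
    Each expert's value is blended with the mean of its neighbors' values.
    """
    rows, cols = len(outputs), len(outputs[0])
    exchanged = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nbrs = grid_neighbors(r, c, rows, cols)
            if not nbrs:
                # A 1x1 grid has no neighbors; keep the value unchanged.
                exchanged[r][c] = outputs[r][c]
                continue
            nbr_mean = sum(outputs[nr][nc] for nr, nc in nbrs) / len(nbrs)
            exchanged[r][c] = (1 - mix) * outputs[r][c] + mix * nbr_mean
    return exchanged

# Toy 2x2 grid: every expert has exactly two neighbors.
grid = [[1.0, 2.0],
        [3.0, 4.0]]
mixed = neighbor_exchange(grid)
```

Because the neighbor relation is symmetric, information flows in both directions between any pair of adjacent experts, which matches the bidirectional communication described earlier.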

Evaluation

Disclaimer

This small language model is only a proof of concept that paves the way for the final release, which is likely to happen in Q4 2025 and to include more models and better support from external libraries such as Transformers and llama.cpp.

Model size: 420M params (Safetensors, tensor type F32)
