metadata

license: apache-2.0
datasets:
  - HuggingFaceH4/ultrachat_200k
  - yahma/alpaca-cleaned
language:
  - en
pipeline_tag: text-generation
tags:
  - mesh
  - moe
  - mesh-labs
  - alpha
  - preview
  - research
  - experiment
  - routing
  - innovative
  - innovation
  - mesh-moe
  - custom_code
new_version: mesh-labs/v0.1-2x2-stage003

Mesh-v0.1-2x2 (Stage 002)

Introducing mesh

This is our first ever model! Allow us to explain how the mesh architecture works in detail.

Neural Mesh extends the concept of Mixture of Experts (MoE) by allowing bidirectional communication between experts.
The experts are laid out in a two-dimensional grid (2x2, 4x4, etc.), which lets them communicate with their neighbors using the "Neighbor Exchange" method.
Like MoE models, Mesh models use dynamic routing, and the routing_k parameter controls the number of active parameters. For this model (2x2):
- top-1 routing: 173M active parameters
- top-2 routing: 242M active parameters (default)
- dense routing: 302M active parameters
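To make the effect of routing_k more concrete, here is a minimal, standard-library sketch of generic top-k routing over the four experts of a 2x2 grid. It is not the model's actual router: the function name `route_top_k`, the gate scores, and the reading of "dense routing" as routing_k equal to the number of experts are all assumptions made for illustration.

```python
import math

def route_top_k(gate_logits, routing_k):
    """Return {expert_index: weight} for the routing_k highest-scoring experts.

    Hypothetical helper: a generic softmax top-k gate, not the Mesh router.
    """
    if not 1 <= routing_k <= len(gate_logits):
        raise ValueError("routing_k must be between 1 and the number of experts")
    # Numerically stable softmax over all experts.
    peak = max(gate_logits)
    exps = [math.exp(x - peak) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the routing_k most probable experts (ties go to the lower index).
    chosen = sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:routing_k]
    # Renormalize so the selected weights sum to 1.
    mass = sum(probs[i] for i in chosen)
    return {i: probs[i] / mass for i in chosen}

# Made-up gate scores for the 4 experts of a 2x2 grid.
logits = [0.2, 1.5, -0.3, 0.9]
top1 = route_top_k(logits, 1)   # one active expert
top2 = route_top_k(logits, 2)   # two active experts
dense = route_top_k(logits, 4)  # assumed meaning of "dense": every expert active
```

With a smaller routing_k, fewer experts run for a given input, which is why the active parameter count drops even though the total parameter count stays the same.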

Here's how the mesh architecture works:
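As a rough illustration of the grid layout and the "Neighbor Exchange" idea described above, the sketch below finds each expert's grid neighbors and blends its output with theirs. Everything in it is an assumption for illustration: the row-major indexing, the non-wrapping up/down/left/right neighborhood, the scalar outputs, the averaging rule, and the names `grid_neighbors`, `neighbor_exchange`, and `mix` are hypothetical and may differ from the actual implementation.

```python
def grid_neighbors(index, rows, cols):
    """Indices of the up/down/left/right neighbors of an expert in a row-major grid.

    Assumes the grid does not wrap around at the edges.
    """
    r, c = divmod(index, cols)
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [nr * cols + nc for nr, nc in candidates
            if 0 <= nr < rows and 0 <= nc < cols]

def neighbor_exchange(outputs, rows, cols, mix=0.5):
    """Blend each expert's scalar output with the mean of its neighbors' outputs.

    `mix` is a hypothetical blending coefficient: 0 keeps the expert's own
    output unchanged, 1 replaces it with the neighbor mean.
    """
    blended = []
    for i, own in enumerate(outputs):
        near = grid_neighbors(i, rows, cols)
        # A 1x1 grid has no neighbors; fall back to the expert's own output.
        neighbor_mean = sum(outputs[j] for j in near) / len(near) if near else own
        blended.append((1 - mix) * own + mix * neighbor_mean)
    return blended

# Expert 0 sits in the top-left corner of the 2x2 grid.
neighbors_of_0 = grid_neighbors(0, 2, 2)
# Toy scalar outputs for experts 0..3, exchanged once with their neighbors.
exchanged = neighbor_exchange([1.0, 2.0, 3.0, 4.0], 2, 2)
```

In a 2x2 grid every expert has exactly two neighbors, so under these assumptions each expert exchanges information with all but the diagonally opposite expert.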

Evaluation

Disclaimer

This small language model is only a proof of concept. It paves the way for the final release, which is likely to happen in Q4 2025 and to include more models and better support from external libraries such as Transformers and llama.cpp.