license: apache-2.0
datasets:
  - HuggingFaceH4/ultrachat_200k
  - yahma/alpaca-cleaned
language:
  - en
pipeline_tag: text-generation
tags:
  - mesh
  - moe
  - mesh-labs
  - alpha
  - preview
  - research
  - experiment
  - routing
  - innovative
  - innovation
  - mesh-moe
  - custom_code
new_version: mesh-labs/v0.1-2x2-stage003

Mesh-v0.1-2x2 (Stage 002)


Introducing mesh

This is our first-ever model! Allow us to explain how the mesh architecture works.

  • Neural Mesh extends the Mixture of Experts (MoE) concept by allowing bidirectional communication between experts.

  • The experts are arranged in a two-dimensional grid (2x2, 4x4, etc.), which lets each expert communicate with its neighbors using the "Neighbor Exchange" method.

  • Like MoE models, Mesh models use dynamic routing, and the routing_k parameter controls the number of active parameters. For this model (2x2):

    • top-1 routing: 173M active parameters
    • top-2 routing: 242M active parameters (default)
    • dense routing: 302M active parameters
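As an illustration only, the routing_k behavior listed above can be sketched as generic top-k gating in plain Python. The function name `route`, the softmax over the selected experts, and the use of `routing_k=None` for dense routing are assumptions borrowed from common MoE routers, not taken from the Mesh code.

```python
import math
from typing import List, Optional, Tuple


def route(logits: List[float], routing_k: Optional[int] = 2) -> List[Tuple[int, float]]:
    """Hypothetical top-k router (a generic MoE-style sketch, not the Mesh code).

    logits: one router score per expert (4 experts for a 2x2 grid).
    routing_k: number of experts to activate; None means dense routing.
    Returns (expert_index, weight) pairs, highest score first. Weights are a
    softmax over the selected experts only, so they sum to 1.
    """
    k = len(logits) if routing_k is None else routing_k
    if not 1 <= k <= len(logits):
        raise ValueError(f"routing_k must be between 1 and {len(logits)}, or None")
    # Indices of the k largest scores, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected experts.
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Example: router scores for the four experts of a 2x2 mesh.
scores = [0.1, 2.0, -1.0, 1.5]
print(route(scores, routing_k=1))                  # [(1, 1.0)]
print([i for i, _ in route(scores, routing_k=2)])  # [1, 3]
print(len(route(scores, routing_k=None)))          # 4
```

Only the selected experts run, which is why a smaller routing_k means fewer active parameters.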

Here's how the mesh architecture works:

[Figure: diagram of the mesh architecture]
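The exact Neighbor Exchange rule is not spelled out here, so the following is only a hypothetical sketch of grid-based neighbor communication in plain Python. It assumes 4-connected neighbors without wrap-around and a simple blend of each expert's output with the mean of its neighbors' outputs, weighted by an invented `alpha` parameter; `grid_neighbors` and `neighbor_exchange` are illustrative names, not part of the Mesh code.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Vector = List[float]


def grid_neighbors(rows: int, cols: int, r: int, c: int) -> List[Cell]:
    """4-connected neighbors of cell (r, c), no wrap-around (assumption)."""
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < rows and 0 <= j < cols]


def neighbor_exchange(outputs: Dict[Cell, Vector], rows: int, cols: int,
                      alpha: float = 0.5) -> Dict[Cell, Vector]:
    """Hypothetical exchange step: blend each expert's output with its neighbors' mean.

    Adjacency is symmetric, so information flows in both directions between
    neighboring experts. Only experts present in `outputs` (e.g. the ones the
    router activated) take part. All reads come from the input dict, so the
    result does not depend on iteration order.
    """
    mixed: Dict[Cell, Vector] = {}
    for (r, c), own in outputs.items():
        nbrs = [n for n in grid_neighbors(rows, cols, r, c) if n in outputs]
        if not nbrs:
            mixed[(r, c)] = list(own)  # no active neighbor: output unchanged
            continue
        mean = [sum(outputs[n][d] for n in nbrs) / len(nbrs) for d in range(len(own))]
        mixed[(r, c)] = [(1 - alpha) * o + alpha * m for o, m in zip(own, mean)]
    return mixed


# Example: a 2x2 mesh where each expert produced a 2-dimensional output.
outputs = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0],
           (1, 0): [0.0, 0.0], (1, 1): [1.0, 1.0]}
mixed = neighbor_exchange(outputs, rows=2, cols=2)
print(mixed[(0, 0)])  # [0.5, 0.25]
print(mixed[(1, 1)])  # [0.5, 0.75]
```

In this sketch, restricting the exchange to adjacent cells means each expert talks to at most four others regardless of grid size (2x2, 4x4, ...), instead of to every other expert.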

Evaluation

Disclaimer

This small language model is only a proof of concept that paves the way for the final release. That release is likely to happen in Q4 2025 and to include more models and better support in external libraries such as Transformers and llama.cpp.