---
license: apache-2.0
datasets:
- HuggingFaceH4/ultrachat_200k
language:
- en
pipeline_tag: text-generation
tags:
- mesh
- moe
- mesh-labs
- alpha
- preview
- research
- experiment
- routing
- innovative
- innovation
- mesh-moe
---

# Mesh-v0.1-2x2 (Stage 001)
<small>Currently, the model is only capable of generating gibberish. This will be fixed by the final release.</small>

## Introducing mesh

This is our first ever model! Allow us to explain how the `mesh` architecture works in detail.

- Neural Mesh extends the concept of Mixture of Experts by allowing bidirectional expert communication.
- The experts are arranged in a two-dimensional grid layout (2x2, 4x4, etc.), which allows them to communicate with their neighbors using the "Neighbor Exchange" method.
- Just like MoE models, Mesh models have dynamic routing, and through the `routing_k` parameter you can set the number of active parameters. For this model (2x2):
  - top-1 routing: 173M active parameters
  - top-2 routing: 242M active parameters (default)
  - dense routing: 302M active parameters

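To make the points above more concrete, here is a toy sketch in plain Python of how top-k routing over a 2x2 expert grid could be combined with an exchange between neighboring experts. Only the 2x2 layout and the `routing_k` name come from this card. Everything else is an assumption made for illustration: the 4-neighbor connectivity (up, down, left, right), the averaging rule used for the exchange, the `exchange_weight` parameter, the helper names, and the toy "experts" that simply scale their input. It is not the actual Mesh implementation.

```python
import math

GRID_ROWS, GRID_COLS = 2, 2  # the 2x2 layout of this model


def grid_neighbors(index, rows=GRID_ROWS, cols=GRID_COLS):
    """Indices of the experts directly above, below, left and right (assumed connectivity)."""
    r, c = divmod(index, cols)
    neighbors = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            neighbors.append(nr * cols + nc)
    return neighbors


def top_k_gates(router_logits, routing_k):
    """Pick the routing_k highest-scoring experts and softmax-normalise their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:routing_k]
    peak = max(router_logits[i] for i in chosen)
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def mesh_layer(x, experts, router_logits, routing_k=2, exchange_weight=0.5):
    """One hypothetical mesh layer: route, run experts, exchange with neighbors, combine."""
    gates = top_k_gates(router_logits, routing_k)

    # Step 1: every selected expert processes the input independently.
    outputs = {i: experts[i](x) for i in gates}

    # Step 2 (assumed "Neighbor Exchange"): each selected expert blends its output
    # with the mean output of its selected grid neighbors, in both directions.
    exchanged = {}
    for i, own in outputs.items():
        active_neighbors = [j for j in grid_neighbors(i) if j in outputs]
        if not active_neighbors:
            exchanged[i] = own
            continue
        mean = [sum(outputs[j][d] for j in active_neighbors) / len(active_neighbors)
                for d in range(len(own))]
        exchanged[i] = [(1 - exchange_weight) * o + exchange_weight * m
                        for o, m in zip(own, mean)]

    # Step 3: gate-weighted sum of the exchanged expert outputs.
    return [sum(gates[i] * exchanged[i][d] for i in gates) for d in range(len(x))]


# Toy experts: expert n multiplies its input by n + 1.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]

print(mesh_layer([1.0, 2.0], experts, [0.1, 2.0, -1.0, 2.0], routing_k=2))
```

In this sketch, `routing_k=1` runs a single expert with no exchange partner, while `routing_k=4` activates all four experts, which corresponds to the dense setting listed above.
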
## Here's how the mesh architecture works:

## Disclaimer
This small language model is just a proof of concept that paves the way for the final release, which is likely to happen in Q4 2025 and to include more models and better support in external libraries such as Transformers and llama.cpp.