A collection of 8B-parameter Mamba-2-based research models, trained on 3.5T tokens for comparison with Transformers.