---
language:
- en
license: mit
library_name: transformers
tags:
- pretrained
- 7B
- English
- text-generation
- base-model
- bittensor
- decentralized AI
- Web3
datasets:
- tiiuae/falcon-refinedweb
---


# 🏯 Sumo-Qyuu-7B-v0.1

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a8a4c5539e211436ef5485/RXiIpU1BwTpvUdhzv-XK9.png)

🏯 Sumo is a family of models developed by [Tensorplex](https://tensorplex.ai). "Sumo-Qyuu" is the strongest model Tensorplex has developed for Bittensor subnet 9.

## Model Details

### Model Description

- **Developed by:** [Tensorplex Labs](https://tensorplex.ai)
- **Model type:** Pretrained Foundational Language Model
- **Language(s) (NLP):** Primarily English
- **License:** MIT

### Model Sources

- **Bittensor Subnet9 Leaderboard:** [https://huggingface.co/spaces/RaoFoundation/pretraining-leaderboard](https://huggingface.co/spaces/RaoFoundation/pretraining-leaderboard)
- **Bittensor Subnet9 Repository:** [https://github.com/RaoFoundation/pretraining/tree/main](https://github.com/RaoFoundation/pretraining/tree/main)

## Usage

**This is a pretrained base model that has not been aligned. Use it with caution, or fine-tune it on downstream tasks before deployment.**

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import transformers
import torch
from transformers import AutoTokenizer

model = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"

# Load the tokenizer and build a text-generation pipeline in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
)

# Sample a single completion with nucleus sampling.
sequences = pipeline(
    "What is Yokozuna?",
    max_length=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
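If you prefer direct control over generation (device placement, explicit `generate` arguments), the model can also be loaded with `AutoModelForCausalLM`. The snippet below is a minimal sketch along those lines; the prompt and sampling settings simply mirror the pipeline example above and are not prescribed by this card, and `device_map="auto"` assumes the `accelerate` package is installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"

# Load the tokenizer and the weights in bfloat16, placing them on GPU if available.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Tokenize a prompt and sample a continuation with the same settings as above.
inputs = tokenizer("What is Yokozuna?", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```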

## Training Details

### Training Data

This model was trained on the [tiiuae/falcon-refinedweb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) dataset.
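For reference, the corpus can be inspected without downloading it in full by streaming it through the `datasets` library. This is only an illustrative sketch of what the pretraining data looks like, not the training pipeline itself; the text field name (`content`) follows the published RefinedWeb schema.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Stream the RefinedWeb corpus so nothing is downloaded up front.
dataset = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)
tokenizer = AutoTokenizer.from_pretrained("tensorplex-labs/Sumo-Qyuu-7B-v0.1")

# Peek at a few documents and their token counts under the model's tokenizer.
for i, example in enumerate(dataset):
    text = example["content"]
    print(len(tokenizer(text)["input_ids"]), text[:80].replace("\n", " "))
    if i == 2:
        break
```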

## Evaluation

|                                  |      tensorplex-labs/Sumo-Qyuu-7B-v0.1 |   NousResearch/Llama-2-7b-hf |   yahma/llama-7b-hf |   tiiuae/falcon-7b |
|----------------------------------|----------------------------------------|------------------------------|---------------------|--------------------|
| **avg**                          |                            **47.85**   |                      47.31   |               44.22 |             42.03  |
| arc_challenge (acc_norm, 0-shot) |                                47.53   |                      46.16   |               44.88 |             43.43  |
| gsm8k (exact_match, 5-shot)      |                                10.46   |                      13.27   |               10.39 |              5.23  |
| hellaswag (acc_norm, 0-shot)     |                                76.66   |                      75.97   |               76.19 |             76.33  |
| mmlu (acc, 0-shot)               |                                44.26   |                      40.78   |               29.68 |             25.72  |
| truthfulqa_mc2 (acc, 0-shot)     |                                37.29   |                      39.00   |               34.01 |             34.27  |
| winogrande (acc, 0-shot)         |                                70.88   |                      68.67   |               70.17 |             67.17  |

[LM Evaluation Harness Repository](https://github.com/EleutherAI/lm-evaluation-harness)
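The scores above were produced with the EleutherAI LM Evaluation Harness. Assuming a recent `lm-eval` release is installed, a single task can be re-run roughly as sketched below; the task name and few-shot setting mirror the table, but the exact harness version and arguments behind the reported numbers are not specified in this card.

```python
from lm_eval import simple_evaluate

# Re-run one of the reported tasks (0-shot ARC-Challenge) against the model.
results = simple_evaluate(
    model="hf",
    model_args="pretrained=tensorplex-labs/Sumo-Qyuu-7B-v0.1,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```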


## Model Architecture and Objective

The model is a 6.9B-parameter model built on the `LlamaForCausalLM` architecture, with a larger vocabulary (100k tokens) that matches the GPT-4 tokenizer.
The training objective was standard next-token prediction.
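A quick way to confirm these architectural details is to inspect the published configuration and tokenizer; the sketch below only fetches metadata, not the model weights, and the printed values are whatever the repository's `config.json` and tokenizer files declare.

```python
from transformers import AutoConfig, AutoTokenizer

repo = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"

# Fetch only the configuration and tokenizer files, not the weights.
config = AutoConfig.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

print(config.architectures)  # per this card: ['LlamaForCausalLM']
print(config.vocab_size)     # per this card: roughly 100k, matching the GPT-4 tokenizer
print(len(tokenizer))        # tokenizer vocabulary size
```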

## Model Card Authors

- [email protected]

## Model Card Contact

Should you have any inquiries, contact [email protected].