---
language:
- en
license: mit
library_name: transformers
tags:
- pretrained
- 7B
- English
- text-generation
- base-model
- bittensor
- decentralized AI
- Web3
datasets:
- tiiuae/falcon-refinedweb
---


# 🏯 Sumo-Qyuu-7B-v0.1

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a8a4c5539e211436ef5485/RXiIpU1BwTpvUdhzv-XK9.png)

🏯 Sumo is a family of models developed by [Tensorplex](https://tensorplex.ai). "Sumo-Qyuu" is the strongest model Tensorplex has developed for Bittensor subnet 9.

## Model Details

### Model Description

- **Developed by:** [Tensorplex Labs](https://tensorplex.ai)
- **Model type:** Pretrained Foundational Language Model
- **Language(s) (NLP):** Primarily English
- **License:** MIT

### Model Sources

- **Bittensor Subnet9 Leaderboard:** [https://huggingface.co/spaces/RaoFoundation/pretraining-leaderboard](https://huggingface.co/spaces/RaoFoundation/pretraining-leaderboard)
- **Bittensor Subnet9 Repository:** [https://github.com/RaoFoundation/pretraining/tree/main](https://github.com/RaoFoundation/pretraining/tree/main)

## Usage

**This is a pretrained base model that has not been aligned. Use it with caution, or fine-tune it on downstream tasks before deployment.**

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import transformers
import torch
from transformers import AutoTokenizer

model = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"

# Load the tokenizer and build a text-generation pipeline in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
)

# Sample a single completion with nucleus sampling.
sequences = pipeline(
    "What is Yokozuna?",
    max_length=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
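If you prefer direct control over generation (device placement, explicit `generate` arguments), the model can also be loaded with `AutoModelForCausalLM`. The snippet below is a minimal sketch along those lines; the prompt and sampling settings simply mirror the pipeline example above and are not prescribed by this card, and `device_map="auto"` assumes the `accelerate` package is installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"

# Load the tokenizer and the weights in bfloat16, placing them on GPU if available.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Tokenize a prompt and sample a continuation with the same settings as above.
inputs = tokenizer("What is Yokozuna?", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```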

## Training Details

### Training Data

This model was trained on the [tiiuae/falcon-refinedweb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) dataset.
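For reference, the corpus can be inspected without downloading it in full by streaming it through the `datasets` library. This is only an illustrative sketch of what the pretraining data looks like, not the training pipeline itself; the text field name (`content`) follows the published RefinedWeb schema.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Stream the RefinedWeb corpus so nothing is downloaded up front.
dataset = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)
tokenizer = AutoTokenizer.from_pretrained("tensorplex-labs/Sumo-Qyuu-7B-v0.1")

# Peek at a few documents and their token counts under the model's tokenizer.
for i, example in enumerate(dataset):
    text = example["content"]
    print(len(tokenizer(text)["input_ids"]), text[:80].replace("\n", " "))
    if i == 2:
        break
```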

## Evaluation

|                                  |      tensorplex-labs/Sumo-Qyuu-7B-v0.1 |   NousResearch/Llama-2-7b-hf |   yahma/llama-7b-hf |   tiiuae/falcon-7b |
|----------------------------------|----------------------------------------|------------------------------|---------------------|--------------------|
| **avg**                          |                            **47.85**   |                      47.31   |               44.22 |             42.03  |
| arc_challenge (acc_norm, 0-shot) |                                47.53   |                      46.16   |               44.88 |             43.43  |
| gsm8k (exact_match, 5-shot)      |                                10.46   |                      13.27   |               10.39 |              5.23  |
| hellaswag (acc_norm, 0-shot)     |                                76.66   |                      75.97   |               76.19 |             76.33  |
| mmlu (acc, 0-shot)               |                                44.26   |                      40.78   |               29.68 |             25.72  |
| truthfulqa_mc2 (acc, 0-shot)     |                                37.29   |                      39.00   |               34.01 |             34.27  |
| winogrande (acc, 0-shot)         |                                70.88   |                      68.67   |               70.17 |             67.17  |

[LM Evaluation Harness Repository](https://github.com/EleutherAI/lm-evaluation-harness)
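The scores above were produced with the EleutherAI LM Evaluation Harness. Assuming a recent `lm-eval` release is installed, a single task can be re-run roughly as sketched below; the task name and few-shot setting mirror the table, but the exact harness version and arguments behind the reported numbers are not specified in this card.

```python
from lm_eval import simple_evaluate

# Re-run one of the reported tasks (0-shot ARC-Challenge) against the model.
results = simple_evaluate(
    model="hf",
    model_args="pretrained=tensorplex-labs/Sumo-Qyuu-7B-v0.1,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```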


## Model Architecture and Objective

The model is a 6.9B-parameter model built on the `LlamaForCausalLM` architecture, with a larger vocabulary (100k tokens) that matches the GPT-4 tokenizer.
The training objective was standard next-token prediction.
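A quick way to confirm these architectural details is to inspect the published configuration and tokenizer; the sketch below only fetches metadata, not the model weights, and the printed values are whatever the repository's `config.json` and tokenizer files declare.

```python
from transformers import AutoConfig, AutoTokenizer

repo = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"

# Fetch only the configuration and tokenizer files, not the weights.
config = AutoConfig.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

print(config.architectures)  # per this card: ['LlamaForCausalLM']
print(config.vocab_size)     # per this card: roughly 100k, matching the GPT-4 tokenizer
print(len(tokenizer))        # tokenizer vocabulary size
```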

## Model Card Authors

- [email protected]

## Model Card Contact

Should you have any inquiries, contact [email protected].