|
--- |
|
language: |
|
- en |
|
license: mit |
|
library_name: transformers |
|
tags: |
|
- pretrained |
|
- 7B |
|
- English |
|
- text-generation |
|
- base-model |
|
- bittensor |
|
- decentralized AI |
|
- Web3 |
|
datasets: |
|
- tiiuae/falcon-refinedweb |
|
--- |
|
|
|
|
|
# 🏯 Sumo-Qyuu-7B-v0.1 |
|
|
|
 |
|
|
|
🏯 Sumo is a family of models developed by [Tensorplex](https://tensorplex.ai). "Sumo-Qyuu" is the name of the best-performing model in the family, developed for Bittensor Subnet 9.
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
- **Developed by:** [Tensorplex Labs](https://tensorplex.ai) |
|
- **Model type:** Pretrained Foundational Language Model |
|
- **Language(s) (NLP):** Primarily English |
|
- **License:** MIT |
|
|
|
### Model Sources |
|
|
|
- **Bittensor Subnet9 Leaderboard:** [https://huggingface.co/spaces/RaoFoundation/pretraining-leaderboard](https://huggingface.co/spaces/RaoFoundation/pretraining-leaderboard) |
|
- **Bittensor Subnet9 Repository:** [https://github.com/RaoFoundation/pretraining/tree/main](https://github.com/RaoFoundation/pretraining/tree/main) |
|
|
|
## Usage |
|
|
|
⛔ **This is a pretrained base model that has not yet been aligned. Use it with caution, or fine-tune it further on downstream tasks before deployment.**
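
If you do fine-tune, a minimal sketch using the `transformers` `Trainer` is shown below. The in-memory placeholder texts, output directory, and hyperparameters are illustrative assumptions only, not the recipe used to train this model.

```python
# Minimal causal-LM fine-tuning sketch. Replace the placeholder texts and
# hyperparameters with your own downstream task data and settings.
import torch
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padding during batching
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Placeholder corpus; substitute your own downstream data.
train_ds = Dataset.from_dict(
    {"text": ["Example downstream document one.", "Example downstream document two."]}
).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="sumo-qyuu-finetuned",
        per_device_train_batch_size=1,
        num_train_epochs=1,
        bf16=True,
        logging_steps=1,
    ),
    train_dataset=train_ds,
    # mlm=False selects the causal (next-token) objective and builds labels from input_ids.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```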
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
```python
from transformers import AutoTokenizer
import transformers
import torch

model = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"

# Build a text-generation pipeline that runs the model in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
)

# Sample a single completion for the prompt.
sequences = pipeline(
    "What is Yokozuna?",
    max_length=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
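
Alternatively, here is a sketch that loads the model directly instead of going through `transformers.pipeline`. The `device_map="auto"` argument assumes the `accelerate` package is installed and can be dropped otherwise.

```python
# Direct-loading variant of the pipeline example above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # assumes `accelerate` is installed; remove to load on a single device
)

inputs = tokenizer("What is Yokozuna?", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```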
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
This model was trained on the [tiiuae/falcon-refinedweb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) dataset.
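
To take a quick look at the corpus without downloading it in full, a minimal streaming sketch is shown below; it assumes the raw text lives in the dataset's `content` column.

```python
# Stream a handful of RefinedWeb documents for inspection.
from datasets import load_dataset

ds = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)
for i, row in enumerate(ds):
    print(row["content"][:200])  # "content" is assumed to hold the raw text
    if i >= 2:
        break
```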
|
|
|
## Evaluation |
|
|
|
| Benchmark                        | tensorplex-labs/Sumo-Qyuu-7B-v0.1 | NousResearch/Llama-2-7b-hf | yahma/llama-7b-hf | tiiuae/falcon-7b |
|----------------------------------|-----------------------------------|----------------------------|-------------------|------------------|
| **avg**                          | **47.85**                         | 47.31                      | 44.22             | 42.03            |
| arc_challenge (acc_norm, 0-shot) | 47.53                             | 46.16                      | 44.88             | 43.43            |
| gsm8k (exact_match, 5-shot)      | 10.46                             | 13.27                      | 10.39             | 5.23             |
| hellaswag (acc_norm, 0-shot)     | 76.66                             | 75.97                      | 76.19             | 76.33            |
| mmlu (acc, 0-shot)               | 44.26                             | 40.78                      | 29.68             | 25.72            |
| truthfulqa_mc2 (acc, 0-shot)     | 37.29                             | 39.00                      | 34.01             | 34.27            |
| winogrande (acc, 0-shot)         | 70.88                             | 68.67                      | 70.17             | 67.17            |
|
|
|
Benchmarks were run with the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).
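
The scores above can be approximately reproduced with the harness's Python API. The sketch below assumes a recent (v0.4.x) release; exact task names and arguments may differ between versions.

```python
# Sketch of re-running the benchmark suite with lm-evaluation-harness (v0.4.x assumed).
import lm_eval

model_args = "pretrained=tensorplex-labs/Sumo-Qyuu-7B-v0.1,dtype=bfloat16"

# Zero-shot tasks from the table above.
zero_shot = lm_eval.simple_evaluate(
    model="hf",
    model_args=model_args,
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2", "winogrande"],
    num_fewshot=0,
)

# gsm8k was evaluated with 5-shot prompting, so it runs separately.
gsm8k = lm_eval.simple_evaluate(
    model="hf",
    model_args=model_args,
    tasks=["gsm8k"],
    num_fewshot=5,
)

print(zero_shot["results"])
print(gsm8k["results"]["gsm8k"])
```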
|
|
|
|
|
## Model Architecture and Objective |
|
|
|
The model has 6.9B parameters and follows the `LlamaForCausalLM` architecture, with a larger vocabulary (~100k tokens) that matches the GPT-4 tokenizer.

The training objective was standard next-token prediction.
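
As a small illustration of that objective (not the actual training code), passing the input ids back as labels makes `transformers` compute the shifted next-token cross-entropy:

```python
# Next-token prediction: with labels == input_ids, the model internally shifts the
# labels by one position and returns the average cross-entropy over next tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

batch = tokenizer("The wrestler entered the ring.", return_tensors="pt")
with torch.no_grad():
    out = model(**batch, labels=batch["input_ids"])
print(float(out.loss))  # average next-token cross-entropy for this sequence
```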
|
|
|
## Model Card Authors |
|
|
|
- [email protected] |
|
|
|
## Model Card Contact |
|
|
|
Should you have any inquiries, contact [email protected]. |
|
|
|
|
|
|