---
language:
- en
license: mit
library_name: transformers
tags:
- pretrained
- 7B
- English
- text-generation
- base-model
- bittensor
- decentralized AI
- Web3
datasets:
- tiiuae/falcon-refinedweb
---
# 🏯 Sumo-Qyuu-7B-v0.1

🏯 Sumo is a family of models developed by [Tensorplex](https://tensorplex.ai). Specifically, "Sumo-Qyuu" is the top-performing model developed for Bittensor Subnet 9.
## Model Details
### Model Description
- **Developed by:** [Tensorplex Labs](https://tensorplex.ai)
- **Model type:** Pretrained Foundational Language Model
- **Language(s) (NLP):** Primarily English
- **License:** MIT
### Model Sources
- **Bittensor Subnet9 Leaderboard:** [https://huggingface.co/spaces/RaoFoundation/pretraining-leaderboard](https://huggingface.co/spaces/RaoFoundation/pretraining-leaderboard)
- **Bittensor Subnet9 Repository:** [https://github.com/RaoFoundation/pretraining/tree/main](https://github.com/RaoFoundation/pretraining/tree/main)
## Usage
⛔ **This is a pretrained base model that has not been aligned. Use it with caution, or fine-tune it on downstream tasks before deployment.**
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoTokenizer
import transformers
import torch

model = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model)

# Build a text-generation pipeline in bfloat16.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
)

# Sample a completion from the base model.
sequences = pipeline(
    "What is Yokozuna?",
    max_length=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
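Alternatively, you can load the weights directly with `AutoModelForCausalLM` and call `generate()` yourself. The snippet below is a minimal sketch: the sampling parameters mirror the pipeline example above, while `device_map="auto"` (which requires the `accelerate` package) is an assumption you may want to replace with explicit device placement.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# device_map="auto" needs `accelerate`; drop it to load on a single device.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("What is Yokozuna?", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```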
## Training Details
### Training Data
This model was trained on the [tiiuae/falcon-refinedweb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) dataset.
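If you want to inspect the training corpus, it can be streamed from the Hugging Face Hub without downloading it in full. A minimal sketch, assuming the public `train` split and its `content` text field:

```python
from datasets import load_dataset

# Stream the corpus instead of downloading it in full.
ds = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)

for i, example in enumerate(ds):
    print(example["content"][:200])  # "content" holds the raw text of each document
    if i == 2:
        break
```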
## Evaluation
| Benchmark                        | tensorplex-labs/Sumo-Qyuu-7B-v0.1 | NousResearch/Llama-2-7b-hf | yahma/llama-7b-hf | tiiuae/falcon-7b |
|----------------------------------|-----------------------------------|----------------------------|-------------------|------------------|
| **avg**                          | **47.85**                         | 47.31                      | 44.22             | 42.03            |
| arc_challenge (acc_norm, 0-shot) | 47.53                             | 46.16                      | 44.88             | 43.43            |
| gsm8k (exact_match, 5-shot)      | 10.46                             | 13.27                      | 10.39             | 5.23             |
| hellaswag (acc_norm, 0-shot)     | 76.66                             | 75.97                      | 76.19             | 76.33            |
| mmlu (acc, 0-shot)               | 44.26                             | 40.78                      | 29.68             | 25.72            |
| truthfulqa_mc2 (acc, 0-shot)     | 37.29                             | 39.00                      | 34.01             | 34.27            |
| winogrande (acc, 0-shot)         | 70.88                             | 68.67                      | 70.17             | 67.17            |
All benchmarks were evaluated with the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).
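For reference, a single task can be reproduced through the harness's Python API. This is a sketch rather than the exact command used to produce the table; the task name and few-shot setting follow the rows above, and `lm_eval.simple_evaluate` is assumed to be available as in recent harness releases (`pip install lm-eval`).

```python
import lm_eval

# Evaluate one 0-shot task as an example; repeat per row of the table above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tensorplex-labs/Sumo-Qyuu-7B-v0.1,dtype=bfloat16",
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"]["hellaswag"])
```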
## Model Architecture and Objective
The model has 6.9B parameters and is based on the `LlamaForCausalLM` architecture, with a larger vocabulary (~100k tokens) that matches the GPT-4 tokenizer.
The training objective was standard next-token prediction.
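For clarity, the loss is the usual causal language-modeling cross-entropy; `transformers` computes it when the labels are the input ids, shifting them by one position internally. A minimal illustration:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

batch = tokenizer("Sumo wrestling originated in Japan.", return_tensors="pt")
with torch.no_grad():
    # Passing labels=input_ids yields the mean next-token cross-entropy loss.
    out = model(**batch, labels=batch["input_ids"])
print(f"next-token prediction loss: {out.loss.item():.3f}")
```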
## Model Card Authors
- [email protected]
## Model Card Contact
Should you have any inquiries, contact [email protected].