---
language:
- en
license: mit
library_name: transformers
tags:
- pretrained
- 7B
- English
- text-generation
- base-model
- bittensor
- decentralized AI
- Web3
datasets:
- tiiuae/falcon-refinedweb
---
# 🏯 Sumo-Qyuu-7B-v0.1
🏯 Sumo is a family of models developed by Tensorplex Labs; "Sumo-Qyuu" designates the best model developed for Bittensor Subnet 9.
## Model Details

### Model Description
- Developed by: Tensorplex Labs
- Model type: Pretrained Foundational Language Model
- Language(s) (NLP): Primarily English
- License: MIT
### Model Sources
- Bittensor Subnet9 Leaderboard: https://huggingface.co/spaces/RaoFoundation/pretraining-leaderboard
- Bittensor Subnet9 Repository: https://github.com/RaoFoundation/pretraining/tree/main
## Usage
⛔ This is a pretrained base model that has not been aligned. Use it with caution, or fine-tune it further on downstream tasks before deployment.
### How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoTokenizer
import transformers
import torch

model = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"

# Load the tokenizer and build a text-generation pipeline in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
)

# Sample a single completion for the prompt.
sequences = pipeline(
    "What is Yokozuna?",
    max_length=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
## Training Details

### Training Data
This model was trained on the [tiiuae/falcon-refinedweb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) dataset.
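For a quick look at the pretraining data, the corpus can be streamed with the `datasets` library rather than downloaded in full. This is a minimal sketch; the text field name (`content`) reflects the published falcon-refinedweb schema.

```python
from datasets import load_dataset

# Stream falcon-refinedweb instead of downloading the multi-terabyte corpus.
ds = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)

for i, record in enumerate(ds):
    print(record["content"][:200])  # "content" holds the raw web text
    if i == 2:
        break
```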
## Evaluation
|  | tensorplex-labs/Sumo-Qyuu-7B-v0.1 | NousResearch/Llama-2-7b-hf | yahma/llama-7b-hf | tiiuae/falcon-7b |
|---|---|---|---|---|
| avg | 47.85 | 47.31 | 44.22 | 42.03 |
| arc_challenge (acc_norm, 0-shot) | 47.53 | 46.16 | 44.88 | 43.43 |
| gsm8k (exact_match, 5-shot) | 10.46 | 13.27 | 10.39 | 5.23 |
| hellaswag (acc_norm, 0-shot) | 76.66 | 75.97 | 76.19 | 76.33 |
| mmlu (acc, 0-shot) | 44.26 | 40.78 | 29.68 | 25.72 |
| truthfulqa_mc2 (acc, 0-shot) | 37.29 | 39.00 | 34.01 | 34.27 |
| winogrande (acc, 0-shot) | 70.88 | 68.67 | 70.17 | 67.17 |
Evaluations were run with the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).
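The 0-shot scores above can in principle be reproduced through the harness's Python API. The sketch below assumes lm-evaluation-harness v0.4+ (which exposes `lm_eval.simple_evaluate`); gsm8k is reported 5-shot in the table and would need a separate run with `num_fewshot=5`.

```python
import lm_eval

# Assumed lm-evaluation-harness >= 0.4 Python API; adjust batch_size to your GPU.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tensorplex-labs/Sumo-Qyuu-7B-v0.1,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])  # per-task metrics
```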
## Model Architecture and Objective
The model has 6.9B parameters and is based on the LlamaForCausalLM architecture, with a larger vocabulary (roughly 100k tokens) that matches the GPT-4 tokenizer. The training objective was standard next-token prediction.
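The architecture and vocabulary size can be checked directly from the published configuration; a minimal sketch using the standard `transformers` API:

```python
from transformers import AutoConfig, AutoTokenizer

repo = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"
config = AutoConfig.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

print(config.architectures)  # expected: ['LlamaForCausalLM']
print(config.vocab_size)     # ~100k, matching the GPT-4-style tokenizer
print(len(tokenizer))        # tokenizer vocabulary size
```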
## Model Card Authors

- Tensorplex Labs

## Model Card Contact
Should you have any inquiries, contact [email protected].