---
language:
- en
license: mit
library_name: transformers
tags:
- pretrained
- 7B
- English
- text-generation
- base-model
- bittensor
- decentralized AI
- Web3
datasets:
- tiiuae/falcon-refinedweb
---
# 🏯 Sumo-Qyuu-7B-v0.1
🏯 Sumo is a family of models developed by Tensorplex Labs; "Sumo-Qyuu" designates the best model developed for Bittensor Subnet 9.
## Model Details

### Model Description
- Developed by: Tensorplex Labs
- Model type: Pretrained Foundational Language Model
- Language(s) (NLP): Primarily English
- License: MIT
### Model Sources
- Bittensor Subnet9 Leaderboard: https://huggingface.co/spaces/RaoFoundation/pretraining-leaderboard
- Bittensor Subnet9 Repository: https://github.com/RaoFoundation/pretraining/tree/main
## Usage
⛔ This is a pretrained base model that has not been aligned. Use it with caution, or fine-tune it further on downstream tasks before deployment.
### How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoTokenizer
import transformers
import torch

model = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"

# Load the tokenizer and build a text-generation pipeline in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
)

# Sample a single completion for the prompt.
sequences = pipeline(
    "What is Yokozuna?",
    max_length=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
## Training Details

### Training Data
This model was trained on the [tiiuae/falcon-refinedweb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) dataset.
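For a quick look at the pretraining data, the corpus can be streamed with the `datasets` library rather than downloaded in full. This is a minimal sketch; the text field name (`content`) reflects the published falcon-refinedweb schema.

```python
from datasets import load_dataset

# Stream falcon-refinedweb instead of downloading the multi-terabyte corpus.
ds = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)

for i, record in enumerate(ds):
    print(record["content"][:200])  # "content" holds the raw web text
    if i == 2:
        break
```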
## Evaluation
|  | tensorplex-labs/Sumo-Qyuu-7B-v0.1 | NousResearch/Llama-2-7b-hf | yahma/llama-7b-hf | tiiuae/falcon-7b |
|---|---|---|---|---|
| avg | 47.85 | 47.31 | 44.22 | 42.03 |
| arc_challenge (acc_norm, 0-shot) | 47.53 | 46.16 | 44.88 | 43.43 |
| gsm8k (exact_match, 5-shot) | 10.46 | 13.27 | 10.39 | 5.23 |
| hellaswag (acc_norm, 0-shot) | 76.66 | 75.97 | 76.19 | 76.33 |
| mmlu (acc, 0-shot) | 44.26 | 40.78 | 29.68 | 25.72 |
| truthfulqa_mc2 (acc, 0-shot) | 37.29 | 39.00 | 34.01 | 34.27 |
| winogrande (acc, 0-shot) | 70.88 | 68.67 | 70.17 | 67.17 |
Evaluations were run with the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).
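The 0-shot scores above can in principle be reproduced through the harness's Python API. The sketch below assumes lm-evaluation-harness v0.4+ (which exposes `lm_eval.simple_evaluate`); gsm8k is reported 5-shot in the table and would need a separate run with `num_fewshot=5`.

```python
import lm_eval

# Assumed lm-evaluation-harness >= 0.4 Python API; adjust batch_size to your GPU.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tensorplex-labs/Sumo-Qyuu-7B-v0.1,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])  # per-task metrics
```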
## Model Architecture and Objective
The model has 6.9B parameters and is based on the LlamaForCausalLM architecture, with a larger vocabulary (roughly 100k tokens) that matches the GPT-4 tokenizer. The training objective was standard next-token prediction.
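The architecture and vocabulary size can be checked directly from the published configuration; a minimal sketch using the standard `transformers` API:

```python
from transformers import AutoConfig, AutoTokenizer

repo = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"
config = AutoConfig.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

print(config.architectures)  # expected: ['LlamaForCausalLM']
print(config.vocab_size)     # ~100k, matching the GPT-4-style tokenizer
print(len(tokenizer))        # tokenizer vocabulary size
```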
## Model Card Authors

- Tensorplex Labs

## Model Card Contact
Should you have any inquiries, contact [email protected].