---
language:
  - en
license: mit
library_name: transformers
tags:
  - pretrained
  - 7B
  - English
  - text-generation
  - base-model
  - bittensor
  - decentralized AI
  - Web3
datasets:
  - tiiuae/falcon-refinedweb
---

# 🏯 Sumo-Qyuu-7B-v0.1


🏯 Sumo is a family of models developed by Tensorplex. Specifically, "Sumo-Qyuu" is the best-performing model developed for Bittensor Subnet 9.

## Model Details

### Model Description

- **Developed by:** Tensorplex Labs
- **Model type:** Pretrained foundational language model
- **Language(s) (NLP):** Primarily English
- **License:** MIT

### Model Sources

## Usage

This is a pretrained base model that has not been aligned. Use it with caution, or fine-tune it further on downstream tasks before deployment.
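
To illustrate the "fine-tune further" recommendation, here is a minimal continued-training sketch using the Hugging Face `Trainer`. The `wikitext` dataset, hyperparameters, and output directory are illustrative placeholders rather than part of this release, and a 7B model will require substantial GPU memory (or parameter-efficient methods) in practice.

```python
# Minimal continued-training sketch (illustrative only).
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    # Base-model tokenizers often ship without a pad token; reuse EOS for padding.
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Placeholder corpus: swap in your own downstream dataset with a "text" column.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="sumo-finetuned",  # placeholder output path
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```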

### How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
import transformers
from transformers import AutoTokenizer

model = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"

# Load the tokenizer and build a text-generation pipeline in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
)

# Sample a completion from the base model.
sequences = pipeline(
    "What is Yokozuna?",
    max_length=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
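
If you prefer direct control over generation instead of the pipeline wrapper, the sketch below loads the weights with `AutoModelForCausalLM` and calls `generate` directly. The `device_map="auto"` setting is only a suggestion and assumes `accelerate` is installed.

```python
# Direct generation without the pipeline wrapper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # assumes accelerate is installed; adjust for your setup
)

inputs = tokenizer("What is Yokozuna?", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```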

## Training Details

### Training Data

This model has been trained on the tiiuae/falcon-refinedweb dataset.
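
As a quick way to inspect the pretraining corpus without downloading it in full, the snippet below streams a few records with the `datasets` library; it assumes the text field is named `content`, as listed on the RefinedWeb dataset card.

```python
# Stream a few RefinedWeb records without downloading the full corpus.
from datasets import load_dataset

refinedweb = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)
for i, example in enumerate(refinedweb):
    print(example["content"][:200])  # "content" holds the raw document text
    if i == 2:
        break
```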

## Evaluation

| Benchmark | tensorplex-labs/Sumo-Qyuu-7B-v0.1 | NousResearch/Llama-2-7b-hf | yahma/llama-7b-hf | tiiuae/falcon-7b |
|---|---|---|---|---|
| avg | 47.85 | 47.31 | 44.22 | 42.03 |
| arc_challenge (acc_norm, 0-shot) | 47.53 | 46.16 | 44.88 | 43.43 |
| gsm8k (exact_match, 5-shot) | 10.46 | 13.27 | 10.39 | 05.23 |
| hellaswag (acc_norm, 0-shot) | 76.66 | 75.97 | 76.19 | 76.33 |
| mmlu (acc, 0-shot) | 44.26 | 40.78 | 29.68 | 25.72 |
| truthfulqa_mc2 (acc, 0-shot) | 37.29 | 39.00 | 34.01 | 34.27 |
| winogrande (acc, 0-shot) | 70.88 | 68.67 | 70.17 | 67.17 |

Evaluations were run with the EleutherAI [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).
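
The sketch below shows an assumed way to reproduce one of the 0-shot numbers with the harness's Python API (lm-eval v0.4+, installed via `pip install lm_eval`); exact task names and returned metric keys can vary across harness versions.

```python
# Assumed usage of the lm-eval Python API (v0.4+); metric keys may vary by version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tensorplex-labs/Sumo-Qyuu-7B-v0.1,dtype=bfloat16",
    tasks=["hellaswag"],
    num_fewshot=0,
)
print(results["results"]["hellaswag"])
```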

## Model Architecture and Objective

The model has 6.9B parameters and is based on the LlamaForCausalLM architecture, with a larger vocabulary (~100k tokens) that matches the GPT-4 tokenizer. The training objective was standard next-token prediction.
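
The architecture and vocabulary size can be verified from the published config; the snippet below is a simple check using `AutoConfig`, not anything specific to this release.

```python
# Quick sanity check of the published architecture and vocabulary size.
from transformers import AutoConfig, AutoTokenizer

repo = "tensorplex-labs/Sumo-Qyuu-7B-v0.1"
config = AutoConfig.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

print(config.architectures)  # expected: ["LlamaForCausalLM"]
print(config.vocab_size)     # ~100k vocabulary, matching the GPT-4 tokenizer
print(len(tokenizer))        # tokenizer size should roughly agree with vocab_size
```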

## Model Card Authors

## Model Card Contact

Should you have any inquiries, contact [email protected].