---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
model-index:
- name: ibm/PowerLM-3b
  results:
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: ARC
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 57.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: BoolQ
    metrics:
    - name: accuracy
      type: accuracy
      value: 75
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: Hellaswag
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 74.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: OpenBookQA
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 41.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: PIQA
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 79.9
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: Winogrande
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 66.3
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: MMLU
    metrics:
    - name: accuracy
      type: accuracy
      value: 44.3
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: GSM8k (5 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 35.9
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: math (4 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 14
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode-eval
      name: humaneval
    metrics:
    - name: pass@1
      type: pass@1
      value: 21.9
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode-eval
      name: MBPP
    metrics:
    - name: pass@1
      type: pass@1
      value: 28
      verified: false
---
# Granite-8B-Code-Instruct-128K

## Model Summary
Granite-8B-Code-Instruct-128K is an 8B-parameter long-context instruct model fine-tuned from Granite-8B-Code-Base-128K on a combination of permissively licensed data used in training the original Granite code instruct models, together with synthetically generated code instruction datasets tailored to long-context problems. By exposing the model to both short and long context data, we aim to enhance its long-context capability without sacrificing code generation performance at short input contexts.

- **Developers:** IBM Research
- **GitHub Repository:** ibm-granite/granite-code-models
- **Paper:** Scaling Granite Code Models to 128K Context
- **Release Date:** July 18th, 2024
- **License:** Apache 2.0
## Usage

### Intended use

### Generation
This is a simple example of how to use the Granite-8B-Code-Instruct-128K model.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # or "cpu"
model_path = "ibm-granite/granite-8B-Code-instruct-128k"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
chat = [
    {"role": "user", "content": "Write a code to find the maximum value in a list of numbers."},
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt")
# transfer tokenized inputs to the device
for i in input_tokens:
    input_tokens[i] = input_tokens[i].to(device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print; in this example the batch size is 1
for i in output:
    print(i)
```
