metadata
license: unknown
library_name: peft
tags:
  - mistral
datasets:
  - ehartford/dolphin
  - garage-bAInd/Open-Platypus
inference: false
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1

mistral-7b-instruct-v0.1

General instruction-following LLM fine-tuned from mistralai/Mistral-7B-v0.1.

Model Details

Model Description

This instruction-following LLM was built via parameter-efficient QLoRA fine-tuning of mistralai/Mistral-7B-v0.1 on the first 5k rows of ehartford/dolphin. Fine-tuning ran on 1x A100 (40 GB SXM) for roughly 1 hour on Google Colab. Only the PEFT adapter weights are included in this model repo, alongside the tokenizer.

  • Developed by: Daniel Furman
  • Model type: Decoder-only
  • Language(s) (NLP): English
  • License: unknown (as declared in the metadata; the base model mistralai/Mistral-7B-v0.1 is released under Apache 2.0)
  • Finetuned from model: mistralai/Mistral-7B-v0.1

Model Sources

Evaluation

| Metric              | Value  |
|:--------------------|:------:|
| MMLU (5-shot)       | Coming |
| ARC (25-shot)       | Coming |
| HellaSwag (10-shot) | Coming |
| TruthfulQA (0-shot) | Coming |
| Avg.                | Coming |

We use EleutherAI's Language Model Evaluation Harness to run the benchmarks above, at the same version used by Hugging Face's Open LLM Leaderboard.

Training

It took ~1 hour to train 1 epoch on 1x A100.

Prompt format: This model (and all my future releases) uses the ChatML prompt format.

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
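
As an illustration of the template above, here is a minimal sketch of a helper that fills it in. The function name `format_chatml` and the trailing newline after the `assistant` tag are assumptions made for this example; they are not taken from the training code.

```python
DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant."


def format_chatml(prompt: str, system: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Wrap a user prompt in the ChatML template shown in this card.

    The returned string ends with the opening assistant tag so that the
    model's completion becomes the assistant turn.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        # Assumption: a newline follows the assistant tag, mirroring the
        # system and user turns.
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    print(format_chatml("Name three uses for a paperclip."))
```

Generation would normally be stopped at the first `<|im_end|>` that the model emits.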

### Training Hyperparameters

We use the [`SFTTrainer`](https://huggingface.co/docs/trl/main/en/sft_trainer) from 🤗's TRL package to fine-tune LLMs on instruction-following datasets.

The following `TrainingArguments` config was used:

- num_train_epochs = 1
- auto_find_batch_size = True
- gradient_accumulation_steps = 1
- optim = "paged_adamw_32bit"
- save_strategy = "epoch"
- learning_rate = 3e-4
- lr_scheduler_type = "cosine"
- warmup_ratio = 0.03
- logging_strategy = "steps"
- logging_steps = 25
- bf16 = True
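
For reference, the settings listed above could be expressed in code roughly as follows. This is a sketch, not the original training script: the names `TRAINING_KWARGS` and `build_training_arguments` and the `output_dir` argument are hypothetical, and the `transformers` import is deferred so the dictionary can be inspected without the library installed.

```python
# Values copied verbatim from the list above.
TRAINING_KWARGS = {
    "num_train_epochs": 1,
    "auto_find_batch_size": True,
    "gradient_accumulation_steps": 1,
    "optim": "paged_adamw_32bit",
    "save_strategy": "epoch",
    "learning_rate": 3e-4,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.03,
    "logging_strategy": "steps",
    "logging_steps": 25,
    "bf16": True,
}


def build_training_arguments(output_dir: str):
    """Build a `TrainingArguments` object from the values above.

    `output_dir` is not stated in this card, so the caller must supply it.
    Requires the `transformers` package.
    """
    from transformers import TrainingArguments  # deferred, optional dependency

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

The resulting object would then be passed to `SFTTrainer` through its `args` parameter.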

The following `bitsandbytes` quantization config was used:

- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: bfloat16
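
The quantization settings above map onto `transformers.BitsAndBytesConfig` roughly as sketched below. The names `QUANT_KWARGS` and `build_quant_config` are hypothetical. `quant_method` is left out because it is a field written when the config is serialized, not something the caller normally sets.

```python
# Values copied from the list above, minus the serialized `quant_method` field.
QUANT_KWARGS = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": False,
    # A string dtype name is accepted and resolved to the torch dtype.
    "bnb_4bit_compute_dtype": "bfloat16",
}


def build_quant_config():
    """Build the 4-bit NF4 quantization config described in this card.

    Requires `transformers`, `torch`, and `bitsandbytes` to be installed.
    """
    from transformers import BitsAndBytesConfig  # deferred, optional dependency

    return BitsAndBytesConfig(**QUANT_KWARGS)
```

The config would be handed to `AutoModelForCausalLM.from_pretrained` through its `quantization_config` argument when loading the base model.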

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
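
Until the placeholder above is filled in, the following is a minimal sketch of one way to load the adapter on top of the base model. Several things here are assumptions rather than facts from this card: the adapter repo id `dfurman/mistral-7b-instruct-v0.1` is inferred from the author handle and model name, the function names are hypothetical, fp16 loading is chosen only to match the speed table in this card, and `device_map="auto"` additionally requires the `accelerate` package.

```python
BASE_MODEL_ID = "mistralai/Mistral-7B-v0.1"
# Assumption: repo id inferred from the author handle and model name.
ADAPTER_REPO_ID = "dfurman/mistral-7b-instruct-v0.1"


def load_model_and_tokenizer(adapter_id: str = ADAPTER_REPO_ID,
                             base_id: str = BASE_MODEL_ID):
    """Load the base model in fp16 and attach the PEFT adapter weights."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The card states that the tokenizer ships alongside the adapter.
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return model, tokenizer


def generate(model, tokenizer, prompt_text: str, max_new_tokens: int = 50) -> str:
    """Generate a completion for an already formatted ChatML prompt."""
    import torch

    inputs = tokenizer(prompt_text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

`prompt_text` should already follow the ChatML layout described in the Training section. If the tokenizer in the repo adds ChatML special tokens, the base model's embedding matrix may need resizing before the adapter is attached; this card does not say whether that applies.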

### Speeds, Sizes, Times 

| runtime / 50 tokens (sec) | GPU             | attn | torch dtype | VRAM (GB) |
|:-----------------------------:|:----------------------:|:---------------------:|:-------------:|:-----------------------:|
| 3.1                        | 1x A100 (40 GB SXM)  | torch               | fp16    | 13                    |
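
The table row can be converted into throughput with a little arithmetic, sketched below. The parameter count of roughly 7.24 billion is an assumption about the base model, not a number from this card; it is used only as a sanity check that fp16 weights account for most of the reported VRAM.

```python
# Figures taken from the table above.
RUNTIME_SECONDS = 3.1
TOKENS_GENERATED = 50

tokens_per_second = TOKENS_GENERATED / RUNTIME_SECONDS
milliseconds_per_token = 1000 * RUNTIME_SECONDS / TOKENS_GENERATED

# Assumption: the base model has about 7.24 billion parameters.
ASSUMED_PARAM_COUNT = 7.24e9
BYTES_PER_FP16_PARAM = 2
fp16_weights_gib = ASSUMED_PARAM_COUNT * BYTES_PER_FP16_PARAM / 2**30

print(f"{tokens_per_second:.1f} tokens/s")            # 16.1 tokens/s
print(f"{milliseconds_per_token:.0f} ms per token")   # 62 ms per token
print(f"~{fp16_weights_gib:.1f} GiB of fp16 weights") # ~13.5 GiB
```

Under that assumption, the fp16 weights alone come to about 13.5 GiB, which is in line with the 13 GB reported in the table.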


## Model Card Contact

dryanfurman at gmail


## Framework versions

- PEFT 0.6.0.dev0