---
license: apache-2.0
library_name: peft
tags:
- mistral
datasets:
- ehartford/dolphin
- garage-bAInd/Open-Platypus
inference: false
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
---


# mistral-7b-instruct-v0.1

A general instruction-following LLM fine-tuned from [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1).

## Model Details

### Model Description

This instruction-following LLM was built via parameter-efficient QLoRA fine-tuning of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the first 5k rows of [ehartford/dolphin](https://huggingface.co/datasets/ehartford/dolphin). Fine-tuning was executed on 1x A100 (40 GB SXM) for roughly 1 hour on Google Colab. **Only** the `peft` adapter weights are included in this model repo, alongside the tokenizer.

- **Developed by:** Daniel Furman
- **Model type:** Decoder-only
- **Language(s) (NLP):** English
- **License:** Apache 2.0 (inherited from the base model)
- **Finetuned from model:** [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)

### Model Sources 

- **Repository:** [github.com/daniel-furman/sft-demos](https://github.com/daniel-furman/sft-demos/blob/main/src/sft/one_gpu/mistral/sft-mistral-7b-instruct-peft.ipynb)

### Evaluation

| Metric                | Value |
|-----------------------|-------|
| MMLU (5-shot)         | Coming |
| ARC (25-shot)         | Coming |
| HellaSwag (10-shot)   | Coming |
| TruthfulQA (0-shot)   | Coming |
| Avg.                  | Coming |

We use EleutherAI's [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, pinned to the same version used by Hugging Face's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
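
For reference, a hypothetical harness invocation might look like the following (flag names and the `peft` model argument vary across harness versions, so treat this as a sketch rather than the exact command used):

```
lm_eval --model hf \
    --model_args pretrained=mistralai/Mistral-7B-v0.1,peft=dfurman/mistral-7b-instruct-v0.1 \
    --tasks hellaswag \
    --num_fewshot 10 \
    --batch_size 8
```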

## Training

It took ~1 hour to train 1 epoch on 1x A100.

Prompt format: This model (and all my future releases) uses the [ChatML](https://huggingface.co/docs/transformers/chat_templating#what-template-should-i-use) format:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

### Training Hyperparameters

We use the [`SFTTrainer`](https://huggingface.co/docs/trl/main/en/sft_trainer) from 🤗's TRL package to easily fine-tune LLMs on instruction-following datasets.

The following `TrainingArguments` config was used (see the code sketch after this list):

- num_train_epochs = 1
- auto_find_batch_size = True
- gradient_accumulation_steps = 1
- optim = "paged_adamw_32bit"
- save_strategy = "epoch"
- learning_rate = 3e-4
- lr_scheduler_type = "cosine"
- warmup_ratio = 0.03
- logging_strategy = "steps"
- logging_steps = 25
- bf16 = True
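
As a minimal sketch, the list above maps onto a `TrainingArguments` object roughly as follows (the `output_dir` is a hypothetical placeholder; the notebook linked under Model Sources has the full training script):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",  # hypothetical placeholder path
    num_train_epochs=1,
    auto_find_batch_size=True,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_strategy="epoch",
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    logging_strategy="steps",
    logging_steps=25,
    bf16=True,
)
```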

The following `bitsandbytes` quantization config was used (see the sketch after this list):

- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: bfloat16
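
In code, the 4-bit settings above correspond to a `BitsAndBytesConfig` along these lines (a sketch; the `llm_int8_*` fields are left at their defaults, which match the values listed):

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```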

## How to Get Started with the Model

Use the code below to get started with the model.

```python
!pip install -q -U transformers peft torch accelerate bitsandbytes einops sentencepiece

import torch
from peft import PeftModel, PeftConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
```

```python
peft_model_id = "dfurman/mistral-7b-instruct-v0.1"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model in bfloat16, sharded across available devices.
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(
    config.base_model_name_or_path,
    use_fast=True, 
    trust_remote_code=True,
)

# Attach the fine-tuned LoRA adapter to the base model.
model = PeftModel.from_pretrained(model, peft_model_id)

format_template = "You are a helpful assistant. Write a response that appropriately completes the request. {query}\n"
```
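
Note that the quickstart above uses a simple instruction template rather than ChatML. If you want to match the ChatML format described in the Training section, a minimal sketch of building such a prompt by hand (the `chatml_prompt` helper is our own, not part of this repo):

```python
def chatml_prompt(query: str) -> str:
    # Mirrors the ChatML template shown in the Training section.
    return (
        "<|im_start|>system\n"
        "You are a helpful assistant.<|im_end|>\n"
        f"<|im_start|>user\n{query}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```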

```python
query = "Write a short email inviting my friends to a dinner party on Friday. Respond succinctly."
prompt = format_template.format(query=query)

# Tokenize the prompt and move the input ids to the GPU.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
with torch.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.1,
        return_dict_in_generate=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        repetition_penalty=1.2,
        no_repeat_ngram_size=5,
    )

print("\n\n*** Generate:")
print(tokenizer.decode(output["sequences"][0][len(input_ids[0]):], skip_special_tokens=True))
```

<details>

<summary>Output</summary>

**Prompt**: Write a short email inviting my friends to a dinner party on Friday. Respond succinctly.

**Generation**: The invitation should be brief and to-the-point, so it's best to use simple language and avoid unnecessary details or long explanations. Here is an example of a concise invitation:

Dear Friends,

I hope you can join me for a fun evening at my place this Friday! We'll have delicious food, great conversation, and maybe even some games if we feel like it. Please RSVP by Wednesday night so I know who will be there. 

Looking forward to seeing you all soon!

Best regards,
Your Name

This message clearly communicates the essential information about the event while maintaining a friendly tone. It also includes a specific date (Friday) and timeframe (evening), as well as a clear call to action (RSVP). The closing line adds a personal touch and expresses excitement for the gathering. Overall, this invitation strikes a good balance between being informative and engaging without overwhelming the reader with too much text.

Remember, when writing emails, always keep in mind your audience and their preferences. If they prefer more detailed information or additional context, adjust accordingly. However, try not to make the invitation overly complicated or lengthy – simplicity often makes for a better experience. Happy emailing!

</details>

### Speeds, Sizes, Times 

| runtime / 50 tokens (sec) | GPU                 | attn  | torch dtype | VRAM (GB) |
|:-------------------------:|:-------------------:|:-----:|:-----------:|:---------:|
| 3.1                       | 1x A100 (40 GB SXM) | torch | fp16        | 13        |
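
A minimal sketch of how a runtime measurement like the one above might be taken, reusing the `model`, `tokenizer`, and `prompt` objects from the quickstart (our own assumption, not the exact benchmarking script):

```python
import time

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
torch.cuda.synchronize()  # make sure pending GPU work doesn't skew the timer
start = time.time()
output = model.generate(
    input_ids=input_ids,
    max_new_tokens=50,
    min_new_tokens=50,  # force exactly 50 tokens for a fair timing
    do_sample=False,
)
torch.cuda.synchronize()
print(f"runtime / 50 tokens: {time.time() - start:.1f} sec")
```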


## Model Card Contact

dryanfurman at gmail


## Framework versions

- PEFT 0.6.0.dev0