Nick Doiron committed
Commit c57ec09 · Parent(s): 53eff1c

quantize-then-dequantize

Browse files:
- .gitignore +1 -0
- README.md +4 -14
- pytorch_model-00001-of-00002.bin +2 -2
- pytorch_model-00002-of-00002.bin +2 -2
.gitignore ADDED

```diff
@@ -0,0 +1 @@
+.DS_Store
```
README.md CHANGED

```diff
@@ -22,7 +22,7 @@ Essentials:
 - Based on LLaMa2-7b-hf (version 2, 7B params)
 - Used [QLoRA](https://github.com/artidoro/qlora/blob/main/qlora.py) to fine-tune on [13k rows of /r/AskNYC](https://huggingface.co/datasets/monsoon-nlp/asknyc-chatassistant-format) formatted as Human/Assistant exchanges
 - Released [the adapter weights](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b)
-- Merged LLaMa2 and the adapter weights
+- Merged [quantized-then-dequantized LLaMa2](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930) and the adapter weights to produce this full-sized model
 
 ## Prompt options
 
@@ -100,19 +100,9 @@ python3 qlora.py \
 
 What you get in the `output_dir` is an adapter model. [Here's ours](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter/). Cool, but not as easy to drop into their script.
 
-
-
-```
-m = AutoModelForCausalLM.from_pretrained(
-    model_name,
-    #load_in_4bit=True,
-    torch_dtype=torch.bfloat16,
-    #device_map={"": 0},
-)
-m = PeftModel.from_pretrained(m, adapters_name)
-m = m.merge_and_unload()
-m.save_pretrained("nyc-savvy")
-```
+Two options for merging:
+- The included `peftmerger.py` script merges the adapter and saves the model.
+- Chris Hayduk produced a script to [quantize then de-quantize](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930) the base model before merging a QLoRA adapter. This requires bitsandbytes and a GPU.
 
 ## Testing that the model is NYC-savvy
 
```
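For readers following along, the snippet removed above can be completed into a small standalone merge script, roughly the job `peftmerger.py` does. This is a minimal sketch, not the repo's actual script; the base-model id and adapter id below are assumptions. It runs on CPU with plain `transformers` + `peft`:

```python
# Sketch of a plain adapter merge (assumed model ids; not the repo's peftmerger.py).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_name = "meta-llama/Llama-2-7b-hf"                         # assumed base-model id
adapters_name = "monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter"  # released adapter

# Load the base model in bf16; no GPU or bitsandbytes needed for a plain merge.
m = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
)

# Attach the LoRA adapter, then fold its weights into the base Linear layers.
m = PeftModel.from_pretrained(m, adapters_name)
m = m.merge_and_unload()

# Save the merged, full-sized model plus tokenizer for easy reloading.
m.save_pretrained("nyc-savvy")
AutoTokenizer.from_pretrained(model_name).save_pretrained("nyc-savvy")
```

The result in `nyc-savvy/` is a full-sized bf16 model that loads with a plain `AutoModelForCausalLM.from_pretrained`, no PEFT required.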
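Why quantize then dequantize at all? QLoRA trains the adapter against a 4-bit NF4 view of the base weights, so merging into the original full-precision weights introduces a small mismatch; round-tripping the base model through NF4 first makes the merged weights match what the adapter actually saw during training. Below is a rough sketch of that idea, inspired by (not copied from) the gist linked above; the model ids are assumptions, and it needs a CUDA GPU with bitsandbytes installed:

```python
# Sketch of quantize-then-dequantize before merging (assumed model ids;
# the idea follows the linked gist, the code here is illustrative).
import torch
import bitsandbytes.functional as bnbf
from transformers import AutoModelForCausalLM
from peft import PeftModel

model_name = "meta-llama/Llama-2-7b-hf"                         # assumed base-model id
adapters_name = "monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter"  # released adapter

# Load the base model in bf16 on CPU; weights visit the GPU one layer at a time.
m = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

with torch.no_grad():
    for name, module in m.named_modules():
        # QLoRA quantizes the Linear layers; lm_head (like the norms and
        # embeddings) stays in full precision, so leave it untouched.
        if isinstance(module, torch.nn.Linear) and "lm_head" not in name:
            w = module.weight.data.to("cuda")
            q, state = bnbf.quantize_4bit(w, quant_type="nf4")         # bf16 -> NF4
            module.weight.data = bnbf.dequantize_4bit(q, state).cpu()  # NF4 -> bf16

# Merge the adapter into the round-tripped weights and save the full-sized model.
m = PeftModel.from_pretrained(m, adapters_name)
m = m.merge_and_unload()
m.save_pretrained("nyc-savvy")
```

Skipping `lm_head` mirrors what `load_in_4bit` does, since that projection is left in full precision when the adapter is trained.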
pytorch_model-00001-of-00002.bin CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:6875060db94711a55e3aefe325355c28b260fb3bd5795add8707cfe8fe8340b8
+size 9976623130
```
pytorch_model-00002-of-00002.bin CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:35f9ab7de991127d8aee80f8f6fea00e73385f303121ac995c1afd51fd2551ba
+size 3500311811
```