Nick Doiron committed
Commit c57ec09 · Parent(s): 53eff1c

quantize-then-dequantize

Browse files:
- .gitignore +1 -0
- README.md +4 -14
- pytorch_model-00001-of-00002.bin +2 -2
- pytorch_model-00002-of-00002.bin +2 -2
.gitignore ADDED

```diff
@@ -0,0 +1 @@
+.DS_Store
```
README.md CHANGED

```diff
@@ -22,7 +22,7 @@ Essentials:
 - Based on LLaMa2-7b-hf (version 2, 7B params)
 - Used [QLoRA](https://github.com/artidoro/qlora/blob/main/qlora.py) to fine-tune on [13k rows of /r/AskNYC](https://huggingface.co/datasets/monsoon-nlp/asknyc-chatassistant-format) formatted as Human/Assistant exchanges
 - Released [the adapter weights](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b)
-- Merged LLaMa2 and the adapter weights
+- Merged [quantized-then-dequantized LLaMa2](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930) and the adapter weights to produce this full-sized model
 
 ## Prompt options
 
@@ -100,19 +100,9 @@ python3 qlora.py \
 
 What you get in the `output_dir` is an adapter model. [Here's ours](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter/). Cool, but not as easy to drop into their script.
 
-
-
-```
-m = AutoModelForCausalLM.from_pretrained(
-    model_name,
-    #load_in_4bit=True,
-    torch_dtype=torch.bfloat16,
-    #device_map={"": 0},
-)
-m = PeftModel.from_pretrained(m, adapters_name)
-m = m.merge_and_unload()
-m.save_pretrained("nyc-savvy")
-```
+Two options for merging:
+- The included `peftmerger.py` script merges the adapter and saves the model.
+- Chris Hayduk produced a script to [quantize then de-quantize](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930) the base model before merging a QLoRA adapter. This requires bitsandbytes and a GPU.
 
 ## Testing that the model is NYC-savvy
 
```
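For readers following along, the snippet removed above can be completed into a small standalone merge script, roughly the job `peftmerger.py` does. This is a minimal sketch, not the repo's actual script; the base-model id and adapter id below are assumptions. It runs on CPU with plain `transformers` + `peft`:

```python
# Sketch of a plain adapter merge (assumed model ids; not the repo's peftmerger.py).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_name = "meta-llama/Llama-2-7b-hf"                         # assumed base-model id
adapters_name = "monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter"  # released adapter

# Load the base model in bf16; no GPU or bitsandbytes needed for a plain merge.
m = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
)

# Attach the LoRA adapter, then fold its weights into the base Linear layers.
m = PeftModel.from_pretrained(m, adapters_name)
m = m.merge_and_unload()

# Save the merged, full-sized model plus tokenizer for easy reloading.
m.save_pretrained("nyc-savvy")
AutoTokenizer.from_pretrained(model_name).save_pretrained("nyc-savvy")
```

The result in `nyc-savvy/` is a full-sized bf16 model that loads with a plain `AutoModelForCausalLM.from_pretrained`, no PEFT required.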
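Why quantize then dequantize at all? QLoRA trains the adapter against a 4-bit NF4 view of the base weights, so merging into the original full-precision weights introduces a small mismatch; round-tripping the base model through NF4 first makes the merged weights match what the adapter actually saw during training. Below is a rough sketch of that idea, inspired by (not copied from) the gist linked above; the model ids are assumptions, and it needs a CUDA GPU with bitsandbytes installed:

```python
# Sketch of quantize-then-dequantize before merging (assumed model ids;
# the idea follows the linked gist, the code here is illustrative).
import torch
import bitsandbytes.functional as bnbf
from transformers import AutoModelForCausalLM
from peft import PeftModel

model_name = "meta-llama/Llama-2-7b-hf"                         # assumed base-model id
adapters_name = "monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter"  # released adapter

# Load the base model in bf16 on CPU; weights visit the GPU one layer at a time.
m = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

with torch.no_grad():
    for name, module in m.named_modules():
        # QLoRA quantizes the Linear layers; lm_head (like the norms and
        # embeddings) stays in full precision, so leave it untouched.
        if isinstance(module, torch.nn.Linear) and "lm_head" not in name:
            w = module.weight.data.to("cuda")
            q, state = bnbf.quantize_4bit(w, quant_type="nf4")         # bf16 -> NF4
            module.weight.data = bnbf.dequantize_4bit(q, state).cpu()  # NF4 -> bf16

# Merge the adapter into the round-tripped weights and save the full-sized model.
m = PeftModel.from_pretrained(m, adapters_name)
m = m.merge_and_unload()
m.save_pretrained("nyc-savvy")
```

Skipping `lm_head` mirrors what `load_in_4bit` does, since that projection is left in full precision when the adapter is trained.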
pytorch_model-00001-of-00002.bin CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:6875060db94711a55e3aefe325355c28b260fb3bd5795add8707cfe8fe8340b8
+size 9976623130
```
pytorch_model-00002-of-00002.bin CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:35f9ab7de991127d8aee80f8f6fea00e73385f303121ac995c1afd51fd2551ba
+size 3500311811
```