Update README.md

README.md CHANGED

````diff
@@ -44,6 +44,7 @@ pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory
 ```
 You can load the model directly from the Hugging Face model hub using
 ```python
+import torch
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
 tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Llama-2-7B-32K-Instruct")
@@ -51,7 +52,7 @@ model = AutoModelForCausalLM.from_pretrained("togethercomputer/Llama-2-7B-32K-In
 trust_remote_code=True, torch_dtype=torch.float16)
 input_ids = tokenizer.encode("[INST]\nWrite a poem about cats\n[/INST]\n\n", return_tensors="pt")
 output = model.generate(input_ids, max_length=128,
-temperature=0.7,
+temperature=0.7, repetition_penalty=1.1, top_p=0.7, top_k=50)
 output_text = tokenizer.decode(output[0], skip_special_tokens=True)
 ```
 
````
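For reference, the two hunks above assemble into the self-contained snippet below. The `model = AutoModelForCausalLM.from_pretrained(...)` call is only partially visible in the hunk header, so its arguments are inferred from that truncated context, and the `.to("cuda")` move and `do_sample=True` flag are additions made here so the sketch actually runs; they are not part of the committed README.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Llama-2-7B-32K-Instruct")
# The from_pretrained() arguments below are inferred from the truncated hunk header.
model = AutoModelForCausalLM.from_pretrained("togethercomputer/Llama-2-7B-32K-Instruct",
                                             trust_remote_code=True, torch_dtype=torch.float16)
model = model.to("cuda")  # not in the diff: float16 weights are assumed to run on a GPU

# Prompts use the [INST] ... [/INST] instruction format shown elsewhere in the README.
input_ids = tokenizer.encode("[INST]\nWrite a poem about cats\n[/INST]\n\n",
                             return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_length=128,
                        do_sample=True,  # not in the diff: lets the sampling settings below apply
                        temperature=0.7, repetition_penalty=1.1, top_p=0.7, top_k=50)
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_text)
```

Without `do_sample=True`, `generate` falls back to greedy decoding and the `temperature`, `top_p` and `top_k` settings are ignored.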
````diff
@@ -103,7 +104,9 @@ This poem captures the essence of cats, highlighting their beauty, independence,
 We evaluate the model from three aspects: 1) [Alpaca Eval](https://tatsu-lab.github.io/alpaca_eval/);
 2) [Rouge score over BookSum](https://together.ai/blog/Llama-2-7B-32K); and
 3) [Accuracy over Multi-document Question Answering (MQA)](https://together.ai/blog/Llama-2-7B-32K).
-We compare with models including
+We compare with models including
+[GPT-3.5-Turbo-16K](https://platform.openai.com/docs/models/gpt-3-5),
+[https://huggingface.co/meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf),
 [Longchat-7b-16k](https://huggingface.co/lmsys/longchat-7b-16k)
 and [Longchat-7b-v1.5-32k](https://huggingface.co/lmsys/longchat-7b-v1.5-32k).
 We summarize the results below:
@@ -126,6 +129,7 @@ We summarize the results below:
 | Llama-2-7B-Chat-hf | 0.055 | 0.008 | 0.046 |
 | Longchat-7b-16k | 0.303 | 0.055 | 0.160 |
 | Longchat-7b-v1.5-32k | 0.308 | 0.057 | 0.163 |
+| GPT-3.5-Turbo-16K | 0.324 | 0.066 | 0.178 |
 | Llama-2-7B-32K-Instruct (ours) | 0.336 | 0.076 | 0.184 |
 
 * Accuracy over MQA
````
````diff
@@ -134,10 +138,9 @@ We summarize the results below:
 | Llama-2-7B-Chat-hf | 0.384 | 0.375 | 0.313 |
 | Longchat-7b-16k | 0.510 | 0.473 | 0.428 |
 | Longchat-7b-v1.5-32k | 0.534 | 0.516 | 0.479 |
+| GPT-3.5-Turbo-16K | 0.622 | 0.609 | 0.577 |
 | Llama-2-7B-32K-Instruct (ours) | 0.622 | 0.604 | 0.589 |
 
-We observe that our finetuned Llama-2-7B-32K-Instruct consistently outperforms other baseline models including Llama-2-7b-chat, Longchat-7b-16k and Longchat-7b-v1.5-32k.
-
 ## Limitations and Bias
 
 As with all language models, Llama-2-7B-32K-Instruct may generate incorrect or biased content. It's important to keep this in mind when using the model.
````
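The middle table above reports Rouge scores over BookSum (the three columns are presumably ROUGE-1, ROUGE-2 and ROUGE-L; the exact evaluation pipeline lives in the linked Together blog post, not in this diff). As a rough, hypothetical sketch of how such numbers are produced, generated summaries can be scored against reference summaries with the Hugging Face `evaluate` package; the example pairs below are made up.

```python
# Hypothetical sketch: scoring model summaries against references with ROUGE.
# Requires: pip install evaluate rouge_score
import evaluate

rouge = evaluate.load("rouge")

# Placeholder pairs; a real run would use BookSum documents summarized by the model
# via the [INST] ... [/INST] prompt format shown above.
references = ["The chapter introduces the narrator and the quiet coastal town he returns to."]
predictions = ["The narrator returns to a quiet coastal town, which the chapter introduces."]

scores = rouge.compute(predictions=predictions, references=references)
print(scores["rouge1"], scores["rouge2"], scores["rougeL"])
```

By default `rouge.compute` returns F-measures averaged over all prediction/reference pairs, which is the kind of single per-model number the table reports.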