Update query_rewrite_lora/README.md
query_rewrite_lora/README.md
CHANGED
@@ -43,6 +43,7 @@ As a result of the expansion, the query becomes a standalone query, still equiva
 We provide the query to rewrite in a separate role for clearer delineation.
 
 The simplest way to invoke the LoRA adapter for query rewrite is through the granite.io framework (https://github.com/ibm-granite/granite-io), where the LoRA adapter is wrapped in a QueryRewriteIOProcessor, which runs on top of vLLM and abstracts away the lower-level details of calling the adapter. See the following quickstart example code.
+Before running the script, set the `lora_model_name` parameter to the path of the directory to which you downloaded the LoRA adapter. The download process is explained [here](https://huggingface.co/ibm-granite/granite-3.3-8b-rag-agent-lib#quickstart-example).
 
 ## Quickstart Example Using [Granite IO](https://github.com/ibm-granite/granite-io)
 ```python
@@ -56,7 +57,7 @@ from granite_io import make_backend
 
 # Constants go here
 base_model_name = "ibm-granite/granite-3.3-8b-instruct"
-lora_model_name = "
+lora_model_name = "PATH_TO_DOWNLOADED_DIRECTORY"
 run_server = True
 
 if run_server:
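For orientation, a minimal sketch of the two steps the new text and the constants above describe: fetching the adapter directory and wrapping it in the IO processor. Only `make_backend` and the repo id appear in the diff itself; the `snapshot_download` call is the standard Hugging Face Hub API, while the `QueryRewriteIOProcessor` import path and the per-adapter subdirectory layout are assumptions to verify against the granite-io repo and the linked download instructions.

```python
from huggingface_hub import snapshot_download
from granite_io import make_backend  # imported in the quickstart this diff touches
# Assumed import path for the processor class this README names; verify in granite-io.
from granite_io.io.query_rewrite import QueryRewriteIOProcessor

# Fetch the adapter files locally; the repo id comes from the link above, and the
# "query_rewrite_lora" subdirectory layout is an assumption based on this file's path.
local_dir = snapshot_download(repo_id="ibm-granite/granite-3.3-8b-rag-agent-lib")
lora_model_name = f"{local_dir}/query_rewrite_lora"

# Point an OpenAI-compatible backend at a server that has the LoRA loaded, then
# let the IO processor handle prompt construction and output parsing.
backend = make_backend("openai", {"model_name": lora_model_name})
io_proc = QueryRewriteIOProcessor(backend)
```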
@@ -159,7 +160,7 @@ The exact format is:
 
 **Model output**: When prompted with the above format, the model generates a JSON object, which contains a field with the actual rewritten question.
 
-Use the code below to get started with the model.
+Use the code below to get started with the model. Before running the script, set the `LORA_NAME` parameter to the path of the directory to which you downloaded the LoRA adapter. The download process is explained [here](https://huggingface.co/ibm-granite/granite-3.3-8b-rag-agent-lib#quickstart-example).
 
 ```python
 import torch
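The diff cuts off before the generation and parsing code, so as a hedged illustration of consuming that output: the model's reply is a JSON string, and the field name used below (`rewritten_question`) is an assumption, since the hunk does not show the README's exact schema.

```python
import json

# Example of the output shape described above: a JSON object whose field carries
# the rewritten query. The field name here is an assumption, not the README's spec.
raw_output = '{"rewritten_question": "What is the expense limit for business travel to Rome?"}'

try:
    rewritten = json.loads(raw_output)["rewritten_question"]
except (json.JSONDecodeError, KeyError):
    rewritten = None  # fall back to the original user query on malformed output
```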
@@ -181,7 +182,7 @@ REWRITE_PROMPT = "<|start_of_role|>rewrite: " + INSTRUCTION_TEXT + JSON + "<|end
 device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
 
 BASE_NAME = "ibm-granite/granite-3.3-8b-instruct"
-LORA_NAME = "
+LORA_NAME = "PATH_TO_DOWNLOADED_DIRECTORY"
 
 tokenizer = AutoTokenizer.from_pretrained(BASE_NAME, padding_side='left', trust_remote_code=True)
 model_base = AutoModelForCausalLM.from_pretrained(BASE_NAME, device_map='auto')
|