Update query_rewrite_lora/README.md
query_rewrite_lora/README.md
CHANGED
@@ -43,6 +43,7 @@ As a result of the expansion, the query becomes a standalone query, still equiva
 We provide the query to rewrite in a separate role for clearer delineation.
 
 The simplest way to invoke the LoRA adapter for query rewrite is through the granite.io framework (https://github.com/ibm-granite/granite-io), where the LoRA adapter is wrapped in a QueryRewriteIOProcessor, which runs on top of vLLM and abstracts away the lower-level details of calling the adapter. See the following quickstart example code.
+Before running the script, set the `lora_model_name` parameter to the path of the directory to which you downloaded the LoRA adapter. The download process is explained [here](https://huggingface.co/ibm-granite/granite-3.3-8b-rag-agent-lib#quickstart-example).
 
 ## Quickstart Example Using [Granite IO](https://github.com/ibm-granite/granite-io)
 ```python
@@ -56,7 +57,7 @@ from granite_io import make_backend
 
 # Constants go here
 base_model_name = "ibm-granite/granite-3.3-8b-instruct"
-lora_model_name = "
+lora_model_name = "PATH_TO_DOWNLOADED_DIRECTORY"
 run_server = True
 
 if run_server:
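For orientation, a minimal sketch of the two steps the new text and the constants above describe: fetching the adapter directory and wrapping it in the IO processor. Only `make_backend` and the repo id appear in the diff itself; the `snapshot_download` call is the standard Hugging Face Hub API, while the `QueryRewriteIOProcessor` import path and the per-adapter subdirectory layout are assumptions to verify against the granite-io repo and the linked download instructions.

```python
from huggingface_hub import snapshot_download
from granite_io import make_backend  # imported in the quickstart this diff touches
# Assumed import path for the processor class this README names; verify in granite-io.
from granite_io.io.query_rewrite import QueryRewriteIOProcessor

# Fetch the adapter files locally; the repo id comes from the link above, and the
# "query_rewrite_lora" subdirectory layout is an assumption based on this file's path.
local_dir = snapshot_download(repo_id="ibm-granite/granite-3.3-8b-rag-agent-lib")
lora_model_name = f"{local_dir}/query_rewrite_lora"

# Point an OpenAI-compatible backend at a server that has the LoRA loaded, then
# let the IO processor handle prompt construction and output parsing.
backend = make_backend("openai", {"model_name": lora_model_name})
io_proc = QueryRewriteIOProcessor(backend)
```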
@@ -159,7 +160,7 @@ The exact format is:
 
 **Model output**: When prompted with the above format, the model generates a JSON object, which contains a field with the actual rewritten question.
 
-Use the code below to get started with the model.
+Use the code below to get started with the model. Before running the script, set the `LORA_NAME` parameter to the path of the directory to which you downloaded the LoRA adapter. The download process is explained [here](https://huggingface.co/ibm-granite/granite-3.3-8b-rag-agent-lib#quickstart-example).
 
 ```python
 import torch
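The diff cuts off before the generation and parsing code, so as a hedged illustration of consuming that output: the model's reply is a JSON string, and the field name used below (`rewritten_question`) is an assumption, since the hunk does not show the README's exact schema.

```python
import json

# Example of the output shape described above: a JSON object whose field carries
# the rewritten query. The field name here is an assumption, not the README's spec.
raw_output = '{"rewritten_question": "What is the expense limit for business travel to Rome?"}'

try:
    rewritten = json.loads(raw_output)["rewritten_question"]
except (json.JSONDecodeError, KeyError):
    rewritten = None  # fall back to the original user query on malformed output
```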
@@ -181,7 +182,7 @@ REWRITE_PROMPT = "<|start_of_role|>rewrite: " + INSTRUCTION_TEXT + JSON + "<|end
 device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
 
 BASE_NAME = "ibm-granite/granite-3.3-8b-instruct"
-LORA_NAME = "
+LORA_NAME = "PATH_TO_DOWNLOADED_DIRECTORY"
 
 tokenizer = AutoTokenizer.from_pretrained(BASE_NAME, padding_side='left', trust_remote_code=True)
 model_base = AutoModelForCausalLM.from_pretrained(BASE_NAME, device_map='auto')
|