kgreenewald committed on
Commit e82cd2d · verified · 1 Parent(s): d32463c

Upload 10 files
citation_generation_lora/README.md ADDED
@@ -0,0 +1,187 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ pipeline_tag: text-generation
6
+ base_model: ibm-granite/granite-3.3-8b-instruct
7
+ library_name: peft
9
+ ---
10
+
11
+ # LoRA Adapter for Citation Generation
12
+ Welcome to Granite Experiments!
13
+
14
+ Think of Experiments as a preview of what's to come. These projects are still under development, but we wanted to let the open-source community take them for a spin! Use them, break them, and help us build what's next for Granite – we'll keep an eye out for feedback and questions. Happy exploring!
15
+
16
+ Just a heads-up: Experiments are forever evolving, so we can't commit to ongoing support or guarantee performance.
17
+
18
+ # Model Summary
19
+
20
+ This is a RAG-specific LoRA adapter for [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct) that is fine-tuned for the citation generation task. Given a multi-turn conversation between a user and an AI assistant that ends with an assistant response, together with a set of documents/passages on which that response is supposed to be based, the adapter generates citations for the last assistant response from the provided documents/passages. The LoRA adapter has the following features:
21
+ 1. **Fine-grained citations:** The adapter generates citations for each sentence in the assistant response (when available). Moreover, each citation consists of a set of sentences from the documents/passages that support the corresponding sentence in the assistant response.
22
+ 2. **Post-hoc citation generation:** Since the adapter takes the assistant response as input, it can generate citations for responses generated by any LLM. Pick your favorite LLM and use the adapter to generate post-hoc citations!
23
+
24
+ <br/>
25
+
26
+ - **Developer:** IBM Research
27
+ - **Model type:** LoRA adapter for [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
28
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
29
+
30
+ ## Intended use
31
+ This is a LoRA adapter that gives the ability to generate citations for the last assistant response in a multi-turn RAG conversation based on a set of provided documents/passages. It can be used to generate post-hoc citations for assistant responses generated by any LLM in a RAG setting.
32
+
33
+ > [!TIP]
34
+ > Note: While you can invoke the adapter directly, as outlined below, we highly recommend calling it through [granite-io](https://github.com/ibm-granite/granite-io), which wraps the model with a tailored I/O processor that provides a friendlier development interface. The I/O processor takes care of several data transformation and validation tasks that would otherwise be required, including splitting the input documents and assistant response into sentences before calling the adapter, validating the adapter's output, and transforming the returned sentence IDs into spans over the documents and the response.
35
+
36
+ **Model input**: The input to the model is conceptually a list of conversational turns ending with an assistant response and a list of documents converted to a string using the `apply_chat_template` function. For the adapter to work, the last assistant response as well as the documents should be pre-split into sentences. In more detail, the primary inputs are the following three items, each represented in JSON:
37
+
38
+ - **conversation**: A list of conversational turns between the user and the assistant, where each item in the list is a dictionary with fields `role` and `content`. The `role` is either `user` or `assistant`, denoting user and assistant turns, respectively, while the `content` field contains the corresponding user/assistant utterance. The conversation should end with an assistant turn, and the `content` field of that turn should contain the assistant utterance with each sentence prefixed with a response sentence ID of the form `<rI>`, where `I` is an integer (e.g., `<r0> First sentence. <r1> Second sentence.`). The numbering should start from 0 (for the first sentence) and be incremented by one for each subsequent sentence in the last assistant turn. Note that only the last assistant turn should be split into sentences as described above; earlier assistant turns (as well as all user turns) should be kept in their original form.
39
+ - **instruction**: A task instruction, encoded as a dictionary with fields `role` and `content`, where `role` is `system` and `content` is the following string describing the citation generation task: `Split the last assistant response into individual sentences. For each sentence in the response, identify the statement IDs from the documents that it references. Ensure that your output includes all response sentence IDs, and for each response sentence ID, provide the corresponding referring document sentence IDs.`
40
+ - **documents**: A list of documents, where each item in the list is a dictionary with fields `doc_id` and `text`. The `text` field contains the text of the corresponding document with each sentence prefixed with a context sentence ID of the form `<cI>`, where `I` is an integer. The context sentence ID numbers should start from 0 (for the first sentence of the first document) and be incremented by one for each subsequent sentence. The numbers should continue to be incremented across documents to ensure that each context sentence ID appears exactly once across the entire list of documents. For instance, if the last sentence of the first document has context sentence ID `<cn>`, then the first sentence of the second document is expected to have ID `<cn+1>`.
41
+
42
+ To prompt the LoRA adapter, we combine the above components as follows: We first append the **instruction** to the end of the **conversation** to generate an **augmented_conversation** list. Then we invoke the `apply_chat_template` function with parameters: conversation = **augmented_conversation** and documents = **documents**.
43
+
44
+ **Model output**: When prompted with the above input, the model generates citations for each sentence of the last assistant response in the form of a JSON array. The array is of the form `[{"r": 0, "c": [...]}, {"r": 1, "c": [...]}, ...]`, where an object `{"r": k, "c": [l, m]}` (with `k`, `l`, `m` integers) denotes that the response sentence with ID `<rk>` is supported by the context sentences with IDs `<cl>` and `<cm>`.
45
+
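+ For example, an output of `[{"r": 0, "c": [0, 2]}, {"r": 1, "c": [3]}]` (illustrative IDs only) states that response sentence `<r0>` is supported by context sentences `<c0>` and `<c2>`, and that response sentence `<r1>` is supported by context sentence `<c3>`.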
46
+
47
+ ## Quickstart Example
48
+
49
+ As explained above, it is highly recommended to use the LoRA adapter through [granite-io](https://github.com/ibm-granite/granite-io).
50
+
51
+ However, if you prefer to invoke the LoRA adapter directly, you can use the following code. Note that the code assumes that the documents and the last assistant response have already been split into sentences; a sketch of one way to perform this splitting is shown after the example.
52
+
53
+ ```python
54
+ import torch
55
+ from transformers import AutoTokenizer, AutoModelForCausalLM
56
+ from peft import PeftModel, PeftConfig
57
+ import json
58
+
59
+ BASE_NAME = "ibm-granite/granite-3.3-8b-instruct"
60
+ LORA_NAME = "ibm-granite/granite-3.3-8b-lora-rag-citation-generation"
61
+ device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
62
+
63
+ tokenizer = AutoTokenizer.from_pretrained(BASE_NAME, padding_side='left', trust_remote_code=True)
64
+ model_base = AutoModelForCausalLM.from_pretrained(BASE_NAME, device_map="auto")
65
+ model_citation = PeftModel.from_pretrained(model_base, LORA_NAME)
66
+
67
+ conversation = [
68
+ {"role": "user", "content": "What is the visibility level of Git Repos and Issue Tracking projects?"},
69
+ {"role": "assistant", "content": "<r0> Git Repos and Issue Tracking projects can have one of the following visibility levels: private, internal, or public. <r1> Private projects are visible only to project members, internal projects are visible to all users that are logged in to IBM Cloud, and public projects are visible to anyone. <r2> By default, new projects are set to private visibility level, which is the most secure for your data."}]
70
+
71
+ documents = [
72
+ {"doc_id": 0, "text": "<c0> Git Repos and Issue Tracking is an IBM-hosted component of the Continuous Delivery service. <c1> All of the data that you provide to Git Repos and Issue Tracking, including but not limited to source files, issues, pull requests, and project configuration properties, is managed securely within Continuous Delivery. <c2> However, Git Repos and Issue Tracking supports various mechanisms for exporting, sending, or otherwise sharing data to users and third parties. <c3> The ability of Git Repos and Issue Tracking to share information is typical of many social coding platforms. <c4> However, such sharing might conflict with regulatory controls that apply to your business. <c5> After you create a project in Git Repos and Issue Tracking, but before you entrust any files, issues, records, or other data with the project, review the project settings and change any settings that you deem necessary to protect your data. <c6> Settings to review include visibility levels, email notifications, integrations, web hooks, access tokens, deploy tokens, and deploy keys. <c7> Project visibility levels \n\nGit Repos and Issue Tracking projects can have one of the following visibility levels: private, internal, or public. <c8> * Private projects are visible only to project members. <c9> This setting is the default visibility level for new projects, and is the most secure visibility level for your data. <c10> * Internal projects are visible to all users that are logged in to IBM Cloud. <c11> * Public projects are visible to anyone. <c12> To limit project access to only project members, complete the following steps:\n\n\n\n1. <c13> From the project sidebar, click Settings > General. <c14> 2. <c15> On the General Settings page, click Visibility > project features > permissions. <c16> 3. <c17> Locate the Project visibility setting. <c18> 4. <c19> Select Private, if it is not already selected. <c20> 5. <c21> Click Save changes. <c22> Project membership \n\nGit Repos and Issue Tracking is a cloud hosted social coding environment that is available to all Continuous Delivery users. <c23> If you are a Git Repos and Issue Tracking project Maintainer or Owner, you can invite any user and group members to the project. <c24> IBM Cloud places no restrictions on who you can invite to a project."},
73
+ {"doc_id": 1, "text": "<c25> After you create a project in Git Repos and Issue Tracking, but before you entrust any files, issues, records, or other data with the project, review the project settings and change any settings that are necessary to protect your data. <c26> Settings to review include visibility levels, email notifications, integrations, web hooks, access tokens, deploy tokens, and deploy keys. <c27> Project visibility levels \n\nGit Repos and Issue Tracking projects can have one of the following visibility levels: private, internal, or public. <c28> * Private projects are visible only to project members. <c29> This setting is the default visibility level for new projects, and is the most secure visibility level for your data. <c30> * Internal projects are visible to all users that are logged in to IBM Cloud. <c31> * Public projects are visible to anyone. <c32> To limit project access to only project members, complete the following steps:\n\n\n\n1. <c33> From the project sidebar, click Settings > General. <c34> 2. <c35> On the General Settings page, click Visibility > project features > permissions. <c36> 3. <c37> Locate the Project visibility setting. <c38> 4. <c39> Select Private, if it is not already selected. <c40> 5. <c41> Click Save changes. <c42> Project email settings \n\nBy default, Git Repos and Issue Tracking notifies project members by way of email about project activities. <c43> These emails typically include customer-owned data that was provided to Git Repos and Issue Tracking by users. <c44> For example, if a user posts a comment to an issue, Git Repos and Issue Tracking sends an email to all subscribers. <c45> The email includes information such as a copy of the comment, the user who posted it, and when the comment was posted. <c46> To turn off all email notifications for your project, complete the following steps:\n\n\n\n1. <c47> From the project sidebar, click Settings > General. <c48> 2. <c49> On the **General Settings **page, click Visibility > project features > permissions. <c50> 3. <c51> Select the Disable email notifications checkbox. <c52> 4. <c53> Click Save changes. <c54> Project integrations and webhooks"}]
74
+
75
+ # Add system prompt
76
+ citation_sys_prompt = "Split the last assistant response into individual sentences. For each sentence in the response, identify the statement IDs from the documents that it references. Ensure that your output includes all response sentence IDs, and for each response sentence ID, provide the list of corresponding referring document sentence IDs. The output must be a json structure."
77
+ conversation.append({"role": "system", "content": citation_sys_prompt})
78
+
79
+ # Generate answer
80
+ input_text = tokenizer.apply_chat_template(conversation=conversation, documents=documents, tokenize=False)
81
+ inputs = tokenizer(input_text, return_tensors="pt")
82
+ output = model_citation.generate(inputs["input_ids"].to(device), attention_mask=inputs["attention_mask"].to(device), max_new_tokens=500)
83
+ output_text = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
84
+ print("Output: ")
85
+ print(json.loads(output_text))
86
+ ```
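+
+ If your documents and assistant response are not already split into sentences, the snippet below is a minimal sketch of one way to produce the required `<cI>`/`<rI>` prefixes using NLTK (the sentence splitter we use in our evaluations). The helper names `add_context_ids` and `add_response_ids` are illustrative and not part of the adapter.
+
+ ```python
+ import nltk
+ from nltk.tokenize import sent_tokenize
+
+ nltk.download("punkt")  # one-time download of the sentence tokenizer model
+
+ def add_context_ids(raw_documents):
+     """Prefix every sentence across all documents with a running <cI> ID."""
+     documents, idx = [], 0
+     for doc_id, raw_text in enumerate(raw_documents):
+         pieces = []
+         for sentence in sent_tokenize(raw_text):
+             pieces.append(f"<c{idx}> {sentence}")
+             idx += 1
+         documents.append({"doc_id": doc_id, "text": " ".join(pieces)})
+     return documents
+
+ def add_response_ids(raw_response):
+     """Prefix every sentence of the last assistant response with an <rI> ID."""
+     return " ".join(f"<r{i}> {s}" for i, s in enumerate(sent_tokenize(raw_response)))
+ ```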
87
+
88
+
89
+ ## Training Details
90
+
91
+ The LoRA adapter was trained on synthetically-generated citation datasets. The process of generating the training data consisted of two main steps:
92
+ - **Multi-turn RAG conversation generation:** Starting from publicly available document corpora, we generated a set of multi-turn RAG data, consisting of multi-turn conversations grounded on passages retrieved from the corpora. For details on the RAG conversation generation process please refer to the [Granite Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf) and [Lee, Young-Suk, et al.](https://arxiv.org/pdf/2409.11500).
93
+ - **Citation generation:** For each turn of the multi-turn RAG conversations from the previous step, we used a multi-step synthetic citation generation pipeline to generate citations for the assistant response.
94
+
95
+ The resulting data instances were used to train the LoRA adapter.
96
+
97
+ ### Training Data
98
+
99
+ The following public datasets were used as seed datasets for the multi-turn RAG conversation generation process:
100
+ - [CoQA](https://stanfordnlp.github.io/coqa/) - Wikipedia passages
101
+ - [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
102
+ - [QuAC](https://huggingface.co/datasets/allenai/quac)
103
+
104
+ #### Training Hyperparameters
105
+
106
+ The LoRA adapter was fine-tuned using PEFT under the following regime: LoRA rank = 16, learning rate = 1e-5, and a 90/10 split between training and validation data.
107
+
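+ For reference, the sketch below shows a `peft.LoraConfig` that mirrors the shipped `adapter_config.json` (target modules, rank, alpha, dropout); it is provided for illustration only and is not the exact training code.
+
+ ```python
+ from peft import LoraConfig
+
+ # Values taken from the adapter_config.json distributed with this adapter.
+ lora_config = LoraConfig(
+     r=16,
+     lora_alpha=16,
+     lora_dropout=0.1,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "down_proj", "gate_proj"],
+     task_type="CAUSAL_LM",
+ )
+ ```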
108
+
109
+ ## Evaluation
110
+
111
+ We evaluate the LoRA adapter on two citation benchmarks:
112
+ - [ALCE](https://aclanthology.org/2023.emnlp-main.398/): Evaluates the ability of models to produce document/passage-level citations (i.e., identify the documents/passages that support a statement in the response).
113
+ - [LongBench-Cite](https://arxiv.org/abs/2409.02897): Evaluates the ability of models to produce fine-grained span-level citations (i.e., identify the spans within the input documents/passages that support a statement in the response) with a focus on long contexts.
114
+
115
+ Since the LoRA adapter is a post-hoc citation generation approach, its performance on the two benchmarks depends on the assistant responses for which it is asked to generate citations. To facilitate an apples-to-apples comparison, for each experiment, we keep the assistant responses the same and change the model that is used to generate the citations. In particular, we prompt an LLM to create an assistant response together with citations and evaluate the generated citations on the corresponding benchmark. Then, we compute and evaluate the citations generated for the same LLM response by the LoRA adapter.
116
+
117
+ ### Evaluation on ALCE
118
+
119
+ For the ALCE evaluation, we prompt Llama-3.1-70B-Instruct and Mixtral-8x22B-Instruct to generate both the assistant response and corresponding passage-level citations. We first calculate the performance of the citations generated by these models on ALCE. Subsequently, we feed the responses of these models (leaving out the citations) to the LoRA adapter and evaluate its generated citations. The results are shown in the table below:
120
+
121
+ | Model used to generate response | Model used to generate citations | Recall | Precision | F1 |
122
+ |--------------| ----------------------------- | --------------- | ----------------- | --------- |
123
+ | Llama-3.1-70B-Instruct | Llama-3.1-70B-Instruct | 61.4 | 58.1 | 59.7 |
124
+ | Llama-3.1-70B-Instruct | Granite-3.3-8B LoRA citations | 55.4 | 65.0 | 59.8 |
125
+ | Mixtral-8x22B-Instruct | Mixtral-8x22B-Instruct | 62.2 | 62.5 | 62.3 |
126
+ | Mixtral-8x22B-Instruct | Granite-3.3-8B LoRA citations | 55.6 | 69.0 | 61.6 |
127
+
128
+ We observe that the LoRA adapter performs on par with much bigger models when those are prompted to create passage-level citations. It is interesting to note that while the adapter's F1 performance is similar to the baselines, it exhibits a different precision-recall trade-off, trading lower recall for higher precision.
129
+
130
+ Notes:
131
+ - All results are reported on the ELI5 dataset using the ORACLE (5-psg) setting.
132
+ - To prompt Llama and Mixtral, we employ a setting similar to the one proposed in the ALCE paper; in particular, we use a two-shot prompt comprising two of the ICL examples from ALCE as well as a slightly modified version of the instruction from the paper.
133
+ - Sentence splitting of context/response is performed using NLTK.
134
+ - Finally, since ALCE expects passage-level citations, we elevate the finer-grained citations produced by the LoRA adapter to the passage level before running the ALCE evaluation.
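+
+ For the last point, the following is a minimal sketch (illustrative only, not the exact evaluation code) of how sentence-level citations can be elevated to passage-level ones, assuming a `sentence_to_doc` dictionary that maps each context sentence ID (the integer `I` in `<cI>`) to the `doc_id` of the passage it came from:
+
+ ```python
+ def to_passage_level(citations, sentence_to_doc):
+     """Collapse sentence-level citations (the adapter's output) to passage-level ones.
+
+     citations: adapter output, e.g. [{"r": 0, "c": [0, 2]}, {"r": 1, "c": [3]}]
+     sentence_to_doc: mapping from context sentence ID to the doc_id it belongs to
+     """
+     passage_citations = []
+     for item in citations:
+         doc_ids = sorted({sentence_to_doc[c] for c in item["c"]})
+         passage_citations.append({"r": item["r"], "docs": doc_ids})
+     return passage_citations
+ ```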
135
+
136
+
137
+ ### Evaluation on LongBench-Cite
138
+
139
+ For the LongBench-Cite evaluation, we prompt Llama-3.1-70B-Instruct to generate both the assistant response and the corresponding citations. Then we evaluate the citations generated by Llama as well as the post-hoc citations generated by the LoRA adapter when invoked on the Llama responses. The results are shown in the table below:
140
+
141
+ <table>
142
+ <tr>
143
+ <th>Model used to generate response</th>
144
+ <th>Model used to generate citations</th>
145
+ <th colspan="3">Longbench-Chat (en)</th>
146
+ <th colspan="3">MultifieldQA (en)</th>
147
+ <th colspan="3">HotpotQA</th>
148
+ <th colspan="3">GovReport</th>
149
+ </tr>
150
+ <tr>
151
+ <th></th>
152
+ <th></th>
153
+ <th>R</th><th>P</th><th>F1</th>
154
+ <th>R</th><th>P</th><th>F1</th>
155
+ <th>R</th><th>P</th><th>F1</th>
156
+ <th>R</th><th>P</th><th>F1</th>
157
+ </tr>
158
+ <tr>
159
+ <td>Llama-3.1-70B-Instruct</td>
160
+ <td>Llama-3.1-70B-Instruct</td>
161
+ <td>27.0</td><td>34.4</td><td>26.1</td>
162
+ <td>46.1</td><td>63.3</td><td>49.7</td>
163
+ <td>34.0</td><td>39.4</td><td>30.2</td>
164
+ <td>55.0</td><td>77.5</td><td>62.0</td>
165
+ </tr>
166
+ <tr>
167
+ <td>Llama-3.1-70B-Instruct</td>
168
+ <td>Granite-3.3-8B LoRA citations</td>
169
+ <td>57.6</td><td>60.3</td><td>58.4</td>
170
+ <td>71.5</td><td>82.5</td><td>75.0</td>
171
+ <td>65.3</td><td>71.3</td><td>63.8</td>
172
+ <td>72.8</td><td>83.5</td><td>77.2</td>
173
+ </tr>
174
+ </table>
175
+
176
+ We observe that the LoRA adapter performs significantly better across the board than Llama-3.1-70B-Instruct when the latter is prompted to create span-level citations. This demonstrates the value of the adapter for creating post-hoc citations, even for assistant responses generated by much larger LLMs.
177
+
178
+ Notes:
179
+ - The evaluation results are reported on the English subset of LongBench-Cite (i.e., restricted to instances whose `language` field equals `en`).
180
+ - The results for the LoRA adapter do not include 4 of the 585 tasks, which encountered out-of-memory errors.
181
+ - To prompt Llama to generate a response with citations, we use the one-shot prompt described in the paper.
182
+ - For the LoRA adapter, sentence splitting of the context is performed using NLTK. For the response, we reuse the splitting in Llama's output (since the LongBench-Cite prompt instructs the model to output a response split into sentences/statements).
183
+
184
+ ## Model Card Authors
185
+
186
+ [Yannis Katsis](mailto:[email protected])<br/>
187
+ [Chulaka Gunasekara](mailto:[email protected])
citation_generation_lora/adapter_config.json ADDED
@@ -0,0 +1,34 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "ibm-granite/granite-3.3-8b-instruct",
5
+ "bias": "none",
6
+ "fan_in_fan_out": false,
7
+ "inference_mode": true,
8
+ "init_lora_weights": true,
9
+ "layer_replication": null,
10
+ "layers_pattern": null,
11
+ "layers_to_transform": null,
12
+ "loftq_config": {},
13
+ "lora_alpha": 16,
14
+ "lora_dropout": 0.1,
15
+ "megatron_config": null,
16
+ "megatron_core": "megatron.core",
17
+ "modules_to_save": null,
18
+ "peft_type": "LORA",
19
+ "r": 16,
20
+ "rank_pattern": {},
21
+ "revision": null,
22
+ "target_modules": [
23
+ "q_proj",
24
+ "k_proj",
25
+ "v_proj",
26
+ "o_proj",
27
+ "up_proj",
28
+ "down_proj",
29
+ "gate_proj"
30
+ ],
31
+ "task_type": "CAUSAL_LM",
32
+ "use_dora": false,
33
+ "use_rslora": false
34
+ }
citation_generation_lora/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3069aac1eccbaa522e921319a016a63a5d86370ce227f5bbe9912b7e74c45e2d
3
+ size 99034536
citation_generation_lora/added_tokens.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "<|end_of_cite|>": 49156,
3
+ "<|end_of_plugin|>": 49158,
4
+ "<|end_of_role|>": 49153,
5
+ "<|start_of_cite|>": 49155,
6
+ "<|start_of_plugin|>": 49157,
7
+ "<|start_of_role|>": 49152,
8
+ "<|tool_call|>": 49154
9
+ }
citation_generation_lora/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
citation_generation_lora/special_tokens_map.json ADDED
@@ -0,0 +1,39 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|start_of_role|>",
4
+ "<|end_of_role|>",
5
+ "<|tool_call|>",
6
+ "<|start_of_cite|>",
7
+ "<|end_of_cite|>",
8
+ "<|start_of_plugin|>",
9
+ "<|end_of_plugin|>"
10
+ ],
11
+ "bos_token": {
12
+ "content": "<|end_of_text|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false
17
+ },
18
+ "eos_token": {
19
+ "content": "<|end_of_text|>",
20
+ "lstrip": false,
21
+ "normalized": false,
22
+ "rstrip": false,
23
+ "single_word": false
24
+ },
25
+ "pad_token": {
26
+ "content": "<|end_of_text|>",
27
+ "lstrip": false,
28
+ "normalized": false,
29
+ "rstrip": false,
30
+ "single_word": false
31
+ },
32
+ "unk_token": {
33
+ "content": "<|end_of_text|>",
34
+ "lstrip": false,
35
+ "normalized": false,
36
+ "rstrip": false,
37
+ "single_word": false
38
+ }
39
+ }
citation_generation_lora/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
citation_generation_lora/tokenizer_config.json ADDED
@@ -0,0 +1,235 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "0": {
6
+ "content": "<|end_of_text|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "1": {
14
+ "content": "<fim_prefix>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "2": {
22
+ "content": "<fim_middle>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "3": {
30
+ "content": "<fim_suffix>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "4": {
38
+ "content": "<fim_pad>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "5": {
46
+ "content": "<filename>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "6": {
54
+ "content": "<gh_stars>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "7": {
62
+ "content": "<issue_start>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "8": {
70
+ "content": "<issue_comment>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "9": {
78
+ "content": "<issue_closed>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "10": {
86
+ "content": "<jupyter_start>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "11": {
94
+ "content": "<jupyter_text>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "12": {
102
+ "content": "<jupyter_code>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "13": {
110
+ "content": "<jupyter_output>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "14": {
118
+ "content": "<empty_output>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": true
124
+ },
125
+ "15": {
126
+ "content": "<commit_before>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": true
132
+ },
133
+ "16": {
134
+ "content": "<commit_msg>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": true
140
+ },
141
+ "17": {
142
+ "content": "<commit_after>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": true
148
+ },
149
+ "18": {
150
+ "content": "<reponame>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": true
156
+ },
157
+ "49152": {
158
+ "content": "<|start_of_role|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": true
164
+ },
165
+ "49153": {
166
+ "content": "<|end_of_role|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": true
172
+ },
173
+ "49154": {
174
+ "content": "<|tool_call|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": true
180
+ },
181
+ "49155": {
182
+ "content": "<|start_of_cite|>",
183
+ "lstrip": false,
184
+ "normalized": false,
185
+ "rstrip": false,
186
+ "single_word": false,
187
+ "special": true
188
+ },
189
+ "49156": {
190
+ "content": "<|end_of_cite|>",
191
+ "lstrip": false,
192
+ "normalized": false,
193
+ "rstrip": false,
194
+ "single_word": false,
195
+ "special": true
196
+ },
197
+ "49157": {
198
+ "content": "<|start_of_plugin|>",
199
+ "lstrip": false,
200
+ "normalized": false,
201
+ "rstrip": false,
202
+ "single_word": false,
203
+ "special": true
204
+ },
205
+ "49158": {
206
+ "content": "<|end_of_plugin|>",
207
+ "lstrip": false,
208
+ "normalized": false,
209
+ "rstrip": false,
210
+ "single_word": false,
211
+ "special": true
212
+ }
213
+ },
214
+ "additional_special_tokens": [
215
+ "<|start_of_role|>",
216
+ "<|end_of_role|>",
217
+ "<|tool_call|>",
218
+ "<|start_of_cite|>",
219
+ "<|end_of_cite|>",
220
+ "<|start_of_plugin|>",
221
+ "<|end_of_plugin|>"
222
+ ],
223
+ "bos_token": "<|end_of_text|>",
224
+ "chat_template": "{# Alias tools -> available_tools #}\n{%- if tools and not available_tools -%}\n {%- set available_tools = tools -%}\n{%- endif -%}\n{%- if messages[0]['role'] == 'system' %}\n {%- set system_message = messages[0]['content'] %}\n {%- set loop_messages = messages[1:] %}\n {%- else %}\n {%- set system_message = \"Knowledge Cutoff Date: April 2024.\nToday's Date: \" + strftime_now('%B %d, %Y') + \".\nYou are Granite, developed by IBM.\" %}\n {%- if available_tools and documents %}\n {%- set system_message = system_message + \" You are a helpful assistant with access to the following tools. When a tool is required to answer the user's query, respond only with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\nWrite the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n {%- elif available_tools %}\n {%- set system_message = system_message + \" You are a helpful assistant with access to the following tools. When a tool is required to answer the user's query, respond only with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\" %}\n {%- elif documents %}\n {%- set system_message = system_message + \" Write the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n {%- elif thinking %}\n {%- set system_message = system_message + \" You are a helpful AI assistant.\nRespond to every user query in a comprehensive and detailed way. You can write down your thoughts and reasoning process before responding. In the thought process, engage in a comprehensive cycle of analysis, summarization, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. In the response section, based on various attempts, explorations, and reflections from the thoughts section, systematically present the final solution that you deem correct. The response should summarize the thought process. Write your thoughts between <think></think> and write your response between <response></response> for each user query.\" %}\n {%- else %}\n {%- set system_message = system_message + \" You are a helpful AI assistant.\" %}\n {%- endif %}\n {%- if 'citations' in controls and documents %}\n {%- set system_message = system_message + '\nUse the symbols <|start_of_cite|> and <|end_of_cite|> to indicate when a fact comes from a document in the search result, e.g <|start_of_cite|> {document_id: 1}my fact <|end_of_cite|> for a fact from document 1. Afterwards, list all the citations with their corresponding documents in an ordered list.' %}\n {%- endif %}\n {%- if 'hallucinations' in controls and documents %}\n {%- set system_message = system_message + '\nFinally, after the response is written, include a numbered list of sentences from the response with a corresponding risk value that are hallucinated and not based in the documents.' 
%}\n {%- endif %}\n {%- set loop_messages = messages %}\n {%- endif %}\n {{- '<|start_of_role|>system<|end_of_role|>' + system_message + '<|end_of_text|>\n' }}\n {%- if available_tools %}\n {{- '<|start_of_role|>available_tools<|end_of_role|>' }}\n {{- available_tools | tojson(indent=4) }}\n {{- '<|end_of_text|>\n' }}\n {%- endif %}\n {%- if documents %}\n {%- for document in documents %}\n {{- '<|start_of_role|>document {\"document_id\": \"' + document['doc_id'] | string + '\"}<|end_of_role|>\n' }}\n {{- document['text'] }}\n {{- '<|end_of_text|>\n' }}\n {%- endfor %}\n {%- endif %}\n {%- for message in loop_messages %}\n {{- '<|start_of_role|>' + message['role'] + '<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}\n {%- if loop.last and add_generation_prompt %}\n {{- '<|start_of_role|>assistant' }}\n {%- if controls %}\n {{- ' ' + controls | tojson()}}\n {%- endif %}\n {{- '<|end_of_role|>' }}\n {%- endif %}\n {%- endfor %}",
225
+ "clean_up_tokenization_spaces": true,
226
+ "eos_token": "<|end_of_text|>",
227
+ "errors": "replace",
228
+ "extra_special_tokens": {},
229
+ "model_max_length": 9223372036854775807,
230
+ "pad_token": "<|end_of_text|>",
231
+ "padding_side": "left",
232
+ "tokenizer_class": "GPT2Tokenizer",
233
+ "unk_token": "<|end_of_text|>",
234
+ "vocab_size": 49152
235
+ }
citation_generation_lora/training_config.json ADDED
@@ -0,0 +1,93 @@
1
+ {
2
+ "rl_model_name": null,
3
+ "rl_model_class": null,
4
+ "num_rollouts": 64,
5
+ "chunk_size": 16,
6
+ "ppo_epochs": 4,
7
+ "init_kl_coef": 0.05,
8
+ "num_layers_unfrozen": 2,
9
+ "rm_bits": 8,
10
+ "rl_lora_rank": 8,
11
+ "rl_lora_alpha": 32,
12
+ "rl_lora_dropout": 0.1,
13
+ "use_qlora_in_rl": false,
14
+ "use_rl_peft_config": true,
15
+ "lora_compute_dtype": "bfloat16",
16
+ "steps_per_print": 10,
17
+ "logdir": "aim-repo",
18
+ "aim_repo": null,
19
+ "experiment_name": "granite_3.3_prompted_citations_response_id_default_prompt",
20
+ "stage": 2,
21
+ "overlap_comm": false,
22
+ "contiguous_gradients": false,
23
+ "cpu_offload": false,
24
+ "optimizer": {
25
+ "optimizer_class": "FusedAdam",
26
+ "lr": 1e-05,
27
+ "weight_decay": 0.1,
28
+ "betas": [
29
+ 0.9,
30
+ 0.95
31
+ ],
32
+ "eps": 1e-10
33
+ },
34
+ "lr_schedule": "linear",
35
+ "warmup_steps": 200,
36
+ "datasets": [
37
+ {
38
+ "data_class": "JSONLinesDatasetRAGChat",
39
+ "data_name": "simulator-citations-chat",
40
+ "data_path": "data/citations/prompting_appraoch_3_3",
41
+ "data_sampling_proportion": 1,
42
+ "max_input_tokens": 5000,
43
+ "max_output_tokens": 512
44
+ }
45
+ ],
46
+ "seed": 42,
47
+ "training_inference_type": "lora_finetuning",
48
+ "prompt_tuning_init": null,
49
+ "prompt_tuning_init_text": null,
50
+ "num_virtual_tokens": null,
51
+ "load_path": null,
52
+ "peft_num_layers": null,
53
+ "peft_num_attention_heads": null,
54
+ "model_name": "ibm-granite/granite-3.3-8b-instruct",
55
+ "tokenizer_name": null,
56
+ "model_class": "AutoModelForCausalLM",
57
+ "gma_model_class": "Model",
58
+ "dtype": "bfloat16",
59
+ "trust_remote_code": false,
60
+ "padding_side": null,
61
+ "lora_rank": 16,
62
+ "lora_alpha": 16,
63
+ "lora_dropout": 0.1,
64
+ "lora_target_modules": [
65
+ "q_proj",
66
+ "k_proj",
67
+ "v_proj",
68
+ "o_proj",
69
+ "up_proj",
70
+ "down_proj",
71
+ "gate_proj"
72
+ ],
73
+ "save_huggingface_checkpoint": true,
74
+ "quantization_method": "fp4",
75
+ "bnb_4bit_use_double_quant": false,
76
+ "use_quantization_for_inference": false,
77
+ "max_seq_len": 2048,
78
+ "attention_implementation": "flash_attention_2",
79
+ "num_labels": 1,
80
+ "use_sdpa_attention": false,
81
+ "save_path": "checkpoints/granite_3.3_prompted_citations_default_prompt",
82
+ "ignore_sampling_proportion_for_validation": false,
83
+ "num_training_steps": 200000,
84
+ "gradient_accumulation_steps": 1,
85
+ "eval_interval": 20000,
86
+ "save_interval": 20000,
87
+ "batch_size_per_gpu": 1,
88
+ "coeff": 1.0,
89
+ "eval_during_training": true,
90
+ "smart_token_allocation": false,
91
+ "max_new_tokens": 0,
92
+ "gradient_checkpointing": true
93
+ }
citation_generation_lora/vocab.json ADDED
The diff for this file is too large to render. See raw diff