---
base_model: openbmb/MiniCPM3-4B
library_name: peft
license: apache-2.0
language:
- zh
- en
---

## MiniCPM3-RAG-LoRA

**MiniCPM3-RAG-LoRA**, developed by ModelBest Inc. and the Tsinghua University Natural Language Processing Lab (THUNLP), uses the Direct Preference Optimization (DPO) method to fine-tune [MiniCPM3](https://huggingface.co/openbmb/MiniCPM3-4B) with LoRA. Trained on just over 20,000 open-source examples from open-domain question answering and logical reasoning tasks, the model achieves an average performance improvement of 13% on general benchmark datasets.

We also invite you to explore MiniCPM3 and the RAG toolkit series:

- Generation model: [MiniCPM3](https://huggingface.co/openbmb/MiniCPM3-4B)
- Retrieval model: [RankCPM-E](https://huggingface.co/openbmb/RankCPM-E)
- Re-ranking model: [RankCPM-R](https://huggingface.co/openbmb/RankCPM-R)
- LoRA plugin for RAG scenarios: [MiniCPM3-RAG-LoRA](https://huggingface.co/openbmb/MiniCPM3-RAG-LoRA)

## Model Information

- Model size: 4B

## Usage

### Input Format

MiniCPM3-RAG-LoRA expects instructions in the following format:

```
Background:
{{ passages }}

Query:
{{ query }}
```

For example:

```
Background:
["In the novel 'The Silent Watcher,' the lead character is named Alex Carter. Alex is a private detective who uncovers a series of mysterious events in a small town.",
"Set in a quiet town, 'The Silent Watcher' follows Alex Carter, a former police officer turned private investigator, as he unravels the town's dark secrets.",
"'The Silent Watcher' revolves around Alex Carter's journey as he confronts his past while solving complex cases in his hometown."]

Query:
"What is the name of the lead character in the novel 'The Silent Watcher'?"
```
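
Assembling this input with a small helper keeps the format in one place. This is an illustrative sketch, not part of the released code: `format_rag_input` is a hypothetical name, and it mirrors the demo script's convention of serializing the whole passage list with `str(...)`:

```python
def format_rag_input(passages, query):
    """Build the Background/Query prompt body expected by MiniCPM3-RAG-LoRA.

    `passages` is a list of retrieved context strings; the demo script below
    serializes the entire list with str(), so this helper does the same.
    """
    return 'Background:\n' + str(passages) + '\n\n' + 'Query:\n' + str(query) + '\n\n'


passages = ["'The Silent Watcher' revolves around Alex Carter's journey as he confronts his past while solving complex cases in his hometown."]
query = "What is the name of the lead character in the novel 'The Silent Watcher'?"

body = format_rag_input(passages, query)
print(body.startswith('Background:\n'))  # True
print('Query:\n' + query in body)        # True
```

The resulting string is then passed as the user turn of a chat conversation, as shown in the demo script.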

### Requirements

```
transformers>=4.36.0
```

### Demo

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

torch.manual_seed(0)

path = 'openbmb/MiniCPM3-RAG-LoRA'
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

passages = ["In the novel 'The Silent Watcher,' the lead character is named Alex Carter. Alex is a private detective who uncovers a series of mysterious events in a small town.",
            "Set in a quiet town, 'The Silent Watcher' follows Alex Carter, a former police officer turned private investigator, as he unravels the town's dark secrets.",
            "'The Silent Watcher' revolves around Alex Carter's journey as he confronts his past while solving complex cases in his hometown."]
query = "What is the name of the lead character in the novel 'The Silent Watcher'?"

input_text = 'Background:\n' + str(passages) + '\n\n' + 'Query:\n' + str(query) + '\n\n'

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": input_text},
]

# Apply the chat template once, then generate from the templated prompt directly;
# passing an already-templated prompt back through a chat helper would wrap it
# in the chat format a second time.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.8, top_p=0.8)
response = tokenizer.decode(output_ids[0][inputs['input_ids'].shape[-1]:], skip_special_tokens=True)
print(response)  # The lead character in the novel 'The Silent Watcher' is named Alex Carter.
```

## Evaluation Results

After LoRA fine-tuning for RAG scenarios, MiniCPM3-RAG-LoRA outperforms strong industry models such as Llama3-8B and Baichuan2-13B across a range of tasks, including open-domain question answering (NQ, TQA, MARCO), multi-hop question answering (HotpotQA), dialogue (WoW), fact checking (FEVER), and slot filling (T-REx).

| Model | NQ (Acc) | TQA (Acc) | MARCO (ROUGE) | HotpotQA (Acc) | WoW (F1) | FEVER (Acc) | T-REx (Acc) |
| :---------------: | :-----: | :------: | :----------: | :-----------: | :-----: | :--------: | :--------: |
| Llama3-8B | _45.36_ | **83.15** | _20.81_ | _28.52_ | 10.96 | 78.08 | 26.62 |
| Baichuan2-13B | 43.36 | 77.76 | 14.28 | 27.59 | 13.34 | 31.37 | _27.46_ |
| MiniCPM3 | 43.21 | 80.77 | 16.06 | 26.00 | _14.60_ | **87.22** | 26.26 |
| MiniCPM3-RAG-LoRA | **48.36** | _82.40_ | **27.68** | **31.61** | **16.29** | _85.81_ | **40.76** |

Bold marks the best result in each column; italics mark the second best.

## License

- The code in this repo is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
- The usage of the MiniCPM3-RAG-LoRA model weights must strictly follow the [MiniCPM Model License](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).
- The MiniCPM3-RAG-LoRA model weights are completely free for academic research. After filling out a [questionnaire](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, they are also available for free commercial use.

<!-- ### Test Set Overview:

- **Natural Questions (NQ, Accuracy):**
  - **Description:** Natural Questions is an open-domain QA dataset consisting of real user questions issued to Google Search. Each question is paired with a long document as context and includes both a short and a long answer.
  - **Metric:** Accuracy measures whether the model correctly identifies the short answer relevant to the question.
- **TriviaQA (TQA, Accuracy):**
  - **Description:** TriviaQA is a QA dataset covering a wide range of topics, with questions and answers collected from trivia websites and encyclopedias.
  - **Metric:** Accuracy measures whether the model answers these questions correctly.
- **MS MARCO (ROUGE):**
  - **Description:** MS MARCO is a large-scale open-domain QA dataset built mainly from Bing search queries and their answers. It contains short answers and relevant passages and is widely used for retrieval and generation tasks. Because MS MARCO is very large, we sampled 3,000 examples for this evaluation.
  - **Metric:** ROUGE evaluates the overlap between generated and reference answers, measuring the quality of the generated answer.
- **HotpotQA (Accuracy):**
  - **Description:** HotpotQA is a multi-hop QA dataset that requires reasoning across multiple documents to answer complex questions. It tests both answer generation and the interpretability of the reasoning process.
  - **Metric:** Accuracy measures whether the model correctly answers questions requiring multi-hop reasoning.
- **Wizard of Wikipedia (WoW, F1 Score):**
  - **Description:** Wizard of Wikipedia is a dialogue dataset focused on knowledge-grounded conversation, requiring the model to produce topically relevant, information-rich responses; each dialogue turn is supported by corresponding knowledge-base entries.
  - **Metric:** F1 measures word-level overlap between the model's response and the reference, assessing accuracy and completeness.
- **FEVER (Accuracy):**
  - **Description:** FEVER is a fact-checking dataset containing a large number of claims; the model must judge, based on the given documents, whether each claim is true or false.
  - **Metric:** Accuracy evaluates the model's performance at verifying claims.
- **T-REx (Accuracy):**
  - **Description:** T-REx is a knowledge-base slot-filling dataset containing entity-relation pairs extracted from Wikipedia. The model must fill in missing slot values from context, testing its understanding of knowledge-base relations.
  - **Metric:** Accuracy measures the model's performance at correctly filling in missing slot values. -->