Support fine-tuning (#52)
Support fine-tuning (b552258c803a26b02e8bc80811458a80938f9ab0)
Co-authored-by: tastelikefeet <[email protected]>
README.md
CHANGED
@@ -5685,6 +5685,46 @@ In addition to the open-source [GTE](https://huggingface.co/collections/Alibaba-
Note that the models behind the commercial APIs are not entirely identical to the open-source models.

## Community support

### Fine-tuning

GTE models can be fine-tuned with the third-party framework SWIFT:

```shell
pip install ms-swift -U
```

```shell
# See: https://swift.readthedocs.io/en/latest/BestPractices/Embedding.html
nproc_per_node=8
NPROC_PER_NODE=$nproc_per_node \
USE_HF=1 \
swift sft \
    --model Alibaba-NLP/gte-Qwen2-7B-instruct \
    --train_type lora \
    --dataset 'sentence-transformers/stsb' \
    --torch_dtype bfloat16 \
    --num_train_epochs 10 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps $(expr 64 / $nproc_per_node) \
    --eval_steps 100 \
    --save_steps 100 \
    --eval_strategy steps \
    --use_chat_template false \
    --save_total_limit 5 \
    --logging_steps 5 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --learning_rate 5e-6 \
    --deepspeed zero3 \
    --dataloader_num_workers 4 \
    --task_type embedding \
    --loss_type cosine_similarity \
    --dataloader_drop_last true
```
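The `--loss_type cosine_similarity` objective trains the model so that the cosine similarity between two sentence embeddings regresses toward a gold similarity label, the setup the `sentence-transformers/stsb` dataset is commonly used for. A minimal sketch of that loss in plain Python (the function name and toy vectors are illustrative, not part of SWIFT):

```python
import math

def cosine_similarity_loss(emb_a, emb_b, gold_score):
    """Squared error between the embeddings' cosine similarity and a gold score.

    STS-style gold scores are assumed rescaled to [0, 1].
    """
    dot = sum(x * y for x, y in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(x * x for x in emb_a))
    norm_b = math.sqrt(sum(x * x for x in emb_b))
    cos = dot / (norm_a * norm_b)
    return (cos - gold_score) ** 2

# Identical vectors have cosine similarity 1.0, so a gold score of 1.0 gives zero loss.
print(cosine_similarity_loss([3.0, 4.0], [3.0, 4.0], 1.0))  # 0.0
# Orthogonal vectors labeled as dissimilar (gold 0.0) also give zero loss.
print(cosine_similarity_loss([1.0, 0.0], [0.0, 1.0], 0.0))  # 0.0
```

Note also that with the flags above, each optimizer step aggregates 2 × 8 × (64 / 8) = 128 training pairs (per-device batch size × processes × gradient-accumulation steps).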

## Citation

If you find our paper or models helpful, please consider citing: