SageLite/SageLite-s

Dejiao Z committed on Dec 4, 2024

Commit ba0f032 · 1 Parent(s): fcf2699

updated readme

Files changed (1)

README.md +3 -3

@@ -2,11 +2,12 @@
 license: apache-2.0
 datasets:
 - bigcode/the-stack-v2
--tiiuae/falcon-refinedweb
+- tiiuae/falcon-refinedweb
 library_name: transformers
 language:
 - code
+- text
 ---
 ## SageLite-s
@@ -66,9 +67,8 @@ SageLite is a new family of open embedding models with an encoder architecture t
 ### Training Data
-This checkpoint is trained on both [The-Stack-v2](https://huggingface.co/datasets/bigcode/the-stack-v2) and [Falcon-refinedweb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb).
-Stack data (https://huggingface.co/datasets/bigcode/the-stack-dedup). Supported languages (15 in total) are as follows: english (for text-only task), c, c-sharp, go, java, javascript, typescript, php, python, ruby.
+This checkpoint is trained on both [The-Stack-v2](https://huggingface.co/datasets/bigcode/the-stack-v2) and [Falcon-refinedweb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb). Supported languages (15 in total) are as follows: english (for text-only task), c, c-sharp, go, java, javascript, typescript, php, python, ruby.
 ### Training procedure
 This checkpoint is first trained on code data via masked language modeling (MLM), followed by two-stage contrastive learning -- constrastive pre-finetuning on large amount of positive pairs mined from the internet and constrastive finetuning on a small amount of synthetic data.
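
The "Training procedure" line in the README mentions two-stage contrastive learning on positive pairs but does not name the objective. As a rough illustration only, the sketch below shows one common choice for this kind of training: an InfoNCE-style loss with in-batch negatives. The function names, the cosine similarity, and the temperature of 0.05 are all assumptions made for this example and are not taken from the SageLite README or training code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(queries, positives, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives (hypothetical helper).

    positives[i] is the positive for queries[i]; every other entry in
    positives acts as a negative for that query.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in positives]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy of the softmax over the batch, target index i.
        total += log_denom - logits[i]
    return total / len(queries)


# Toy embeddings: in `aligned`, each query sits next to its own positive;
# in `swapped`, the positives are exchanged between the two queries.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]

aligned_loss = info_nce_loss(queries, aligned)
swapped_loss = info_nce_loss(queries, swapped)
print(aligned_loss < swapped_loss)
```

In this toy setup the loss is lower when each query is closest to its own positive, which is the behaviour a contrastive objective is meant to reward.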