Update README.md
README.md CHANGED

@@ -1,6 +1,7 @@
 ---
 base_model:
 - Snowflake/snowflake-arctic-embed-m-long
+library_name: sentence-transformers
 ---
 
 
@@ -60,5 +61,4 @@ print(code_embeddings)
 
 
 ## Training
-We use a bi-encoder architecture for `CodeRankEmbed`, with weights shared between the text and code encoder. The retriever is contrastively fine-tuned with InfoNCE loss on a 21 million example high-quality dataset we curated called [CoRNStack](https://gangiswag.github.io/cornstack/). Our encoder is initialized with [Arctic-Embed-M-Long](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-long), a 137M parameter text encoder supporting an extended context length of 8,192 tokens.
-
+We use a bi-encoder architecture for `CodeRankEmbed`, with weights shared between the text and code encoder. The retriever is contrastively fine-tuned with InfoNCE loss on a 21 million example high-quality dataset we curated called [CoRNStack](https://gangiswag.github.io/cornstack/). Our encoder is initialized with [Arctic-Embed-M-Long](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-long), a 137M parameter text encoder supporting an extended context length of 8,192 tokens.
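The first hunk adds `library_name: sentence-transformers` to the front matter, which tells the Hub to surface sentence-transformers loading snippets for this model. A minimal sketch of what that enables is below; the repo id `nomic-ai/CodeRankEmbed`, the `trust_remote_code=True` flag (the Arctic-Embed-M-Long backbone ships custom modeling code), and the sample inputs are assumptions, not taken from this diff.

```python
# Minimal sketch: loading the model through sentence-transformers,
# which the new `library_name` tag advertises on the Hub.
from sentence_transformers import SentenceTransformer

# Repo id and trust_remote_code are assumptions for illustration.
model = SentenceTransformer("nomic-ai/CodeRankEmbed", trust_remote_code=True)

# Text and code pass through the same shared encoder (bi-encoder with tied weights).
query_embeddings = model.encode(["how to sort a list in python"])
code_embeddings = model.encode(["def sort_list(xs):\n    return sorted(xs)"])
print(code_embeddings)
```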
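The Training paragraph carried over in the second hunk describes contrastive fine-tuning with the InfoNCE loss. As a point of reference, here is a minimal sketch of that objective using in-batch negatives; the temperature value, batch size, and embedding dimension are illustrative assumptions, not details from the model card.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor, code_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE with in-batch negatives: row i of query_emb pairs with
    row i of code_emb; every other row in the batch acts as a negative.
    The 0.05 temperature is an illustrative choice, not from the card."""
    q = F.normalize(query_emb, dim=-1)
    c = F.normalize(code_emb, dim=-1)
    logits = q @ c.T / temperature                      # (batch, batch) similarities
    labels = torch.arange(q.size(0), device=q.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Example: a batch of 4 query/code pairs with 768-dim embeddings.
loss = info_nce_loss(torch.randn(4, 768), torch.randn(4, 768))
print(loss)
```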