Add new SentenceTransformer model
README.md
CHANGED
|
@@ -15,6 +15,45 @@ tags:
|
|
| 15 |
- dataset_size:1451941
|
| 16 |
- loss:MultipleNegativesRankingLoss
|
| 17 |
base_model: Alibaba-NLP/gte-modernbert-base
|
| 18 |
datasets:
|
| 19 |
- redis/langcache-sentencepairs-v1
|
| 20 |
pipeline_tag: sentence-similarity
|
|
@@ -106,9 +145,9 @@ from sentence_transformers import SentenceTransformer
|
|
| 106 |
model = SentenceTransformer("redis/langcache-embed-v3")
|
| 107 |
# Run inference
|
| 108 |
sentences = [
|
| 109 |
-
'
|
| 110 |
-
|
| 111 |
-
'
|
| 112 |
]
|
| 113 |
embeddings = model.encode(sentences)
|
| 114 |
print(embeddings.shape)
|
|
@@ -117,9 +156,9 @@ print(embeddings.shape)
|
|
| 117 |
# Get the similarity scores for the embeddings
|
| 118 |
similarities = model.similarity(embeddings, embeddings)
|
| 119 |
print(similarities)
|
| 120 |
-
# tensor([[0.
|
| 121 |
-
# [0.
|
| 122 |
-
# [0.
|
| 123 |
```
|
| 124 |
|
| 125 |
<!--
|
|
@@ -184,18 +223,18 @@ You can finetune this model on your own dataset.
|
|
| 184 |
|
| 185 |
* Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
|
| 186 |
* Size: 109,885 training samples
|
| 187 |
-
* Columns: <code>
|
| 188 |
* Approximate statistics based on the first 1000 samples:
|
| 189 |
-
| |
|
| 190 |
-
|
| 191 |
-
| type |
|
| 192 |
-
| details | <ul><li>min:
|
| 193 |
* Samples:
|
| 194 |
-
|
|
| 195 |
-
|
| 196 |
-
| <code>
|
| 197 |
-
| <code>
|
| 198 |
-
| <code>
|
| 199 |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
|
| 200 |
```json
|
| 201 |
{
|
|
@@ -211,18 +250,18 @@ You can finetune this model on your own dataset.
|
|
| 211 |
|
| 212 |
* Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
|
| 213 |
* Size: 109,885 evaluation samples
|
| 214 |
-
* Columns: <code>
|
| 215 |
* Approximate statistics based on the first 1000 samples:
|
| 216 |
-
| |
|
| 217 |
-
|
| 218 |
-
| type |
|
| 219 |
-
| details | <ul><li>min:
|
| 220 |
* Samples:
|
| 221 |
-
|
|
| 222 |
-
|
| 223 |
-
| <code>
|
| 224 |
-
| <code>
|
| 225 |
-
| <code>
|
| 226 |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
|
| 227 |
```json
|
| 228 |
{
|
|
|
|
 - dataset_size:1451941
 - loss:MultipleNegativesRankingLoss
 base_model: Alibaba-NLP/gte-modernbert-base
+widget:
+- source_sentence: Gocharya ji authored Krishna Cahrit Manas in the poetic form describing
+    about the full life of Lord Krishna ( from birth to Nirvana ) .
+  sentences:
+  - 'Q: Can I buy coverage for prescription drugs right away?'
+  - Krishna Cahrit Manas in poetic form , describing the full life of Lord Krishna
+    ( from birth to nirvana ) , wrote Gocharya ji .
+  - Baron played actress Violet Carson who portrayed Ena Sharples in the soap .
+- source_sentence: The Kilkenny line only reached Maryborough in 1867 .
+  sentences:
+  - It was also known formerly as ' Crotto ' .
+  - The line from Maryborough only reached Kilkenny in 1867 .
+  - The line from Kilkenny only reached Maryborough in 1867 .
+- source_sentence: Tokelau International Netball Team represents Tokelau in the national
+    netball .
+  sentences:
+  - Ernest Dewey Albinson ( 1898 in Minneapolis , Minnesota - 1971 in Mexico ) was
+    an American artist .
+  - The Tokelau national netball team represents Tokelau in international netball
+    .
+  - The Tokelau international netball team represents Tokelau in national netball
+    .
+- source_sentence: The real number is called the `` imaginary part `` of the real
+    number ; the real number is called the `` complex part `` of .
+  sentences:
+  - The school board consists of Robbie Sanders , Bryan Richards , Linda Fullingim
+    , Lori Lambert , & Kelly Teague .
+  - Which web design company has the best templates?
+  - The real number is called the `` imaginary part `` of the real number , the real
+    number of `` complex part `` of .
+- source_sentence: All For You was the third and last single of Kate Ryan 's third
+    album `` Alive `` .
+  sentences:
+  - According to John Keay , he was `` country bred `` ( born and educated in India
+    ) .
+  - All For You was the third single of the third and last album `` Alive `` by Kate
+    Ryan .
+  - All For You was the third and last single of the third album of Kate Ryan `` Alive
+    `` .
 datasets:
 - redis/langcache-sentencepairs-v1
 pipeline_tag: sentence-similarity
|
|
|
|
 model = SentenceTransformer("redis/langcache-embed-v3")
 # Run inference
 sentences = [
+    "All For You was the third and last single of Kate Ryan 's third album `` Alive `` .",
+    'All For You was the third and last single of the third album of Kate Ryan `` Alive `` .',
+    'All For You was the third single of the third and last album `` Alive `` by Kate Ryan .',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
|
|
|
|
 # Get the similarity scores for the embeddings
 similarities = model.similarity(embeddings, embeddings)
 print(similarities)
+# tensor([[0.9961, 0.9922, 0.9961],
+#         [0.9922, 1.0000, 0.9922],
+#         [0.9961, 0.9922, 1.0078]], dtype=torch.bfloat16)
|
| 162 |
```
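The diagonal of the similarity matrix above is 0.9961, 1.0000 and 1.0078 rather than exactly 1 for each sentence compared with itself. Given the `dtype=torch.bfloat16` in the printed tensor, a likely explanation is rounding: bfloat16 keeps only 8 significant bits, so the closest representable values around 1.0 are 1 - 2^-8 and 1 + 2^-7. The sketch below illustrates that spacing using only the Python standard library. `to_bfloat16` is a hypothetical helper written for this note (it is not part of sentence-transformers or PyTorch), and it ignores NaN and infinity handling.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even). Hypothetical helper."""
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits; the bias makes truncation round to nearest even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    rounded = ((bits + bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", rounded & 0xFFFFFFFF))[0]


# Representable neighbours of 1.0 in bfloat16.
below_one = 1.0 - 2.0 ** -8  # 0.99609375, printed as 0.9961
above_one = 1.0 + 2.0 ** -7  # 1.0078125, printed as 1.0078

for value in (0.997, 1.0, 1.005):
    print(value, "->", to_bfloat16(value))
```

Under these assumptions, 0.997 rounds to 0.99609375 and 1.005 rounds to 1.0078125, so self-similarity scores slightly off 1.0 in bfloat16 output are consistent with precision limits rather than with a difference between the sentences.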
|
| 163 |
|
| 164 |
<!--
|
|
|
|
| 223 |
|
 * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
 * Size: 109,885 training samples
+* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
 * Approximate statistics based on the first 1000 samples:
+  | | anchor | positive | negative |
+  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+  | type | string | string | string |
+  | details | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 49 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 48 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 26.47 tokens</li><li>max: 61 tokens</li></ul> |
 * Samples:
+  | anchor | positive | negative |
+  |:--------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|
+  | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>how can I get financial freedom as soon as possible?</code> |
+  | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The older Punts are still very much in existence today and race in the same fleets as the newer boats .</code> |
+  | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .</code> |
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
|
| 239 |
```json
|
| 240 |
{
|
|
|
|
| 250 |
|
 * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
 * Size: 109,885 evaluation samples
+* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
 * Approximate statistics based on the first 1000 samples:
+  | | anchor | positive | negative |
+  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+  | type | string | string | string |
+  | details | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 49 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 48 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 26.47 tokens</li><li>max: 61 tokens</li></ul> |
 * Samples:
+  | anchor | positive | negative |
+  |:--------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|
+  | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>how can I get financial freedom as soon as possible?</code> |
+  | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The older Punts are still very much in existence today and race in the same fleets as the newer boats .</code> |
+  | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .</code> |
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
|
| 266 |
```json
|
| 267 |
{
|