radoslavralev committed on
Commit a7ab361 · verified · 1 Parent(s): 6e71c42

Add new SentenceTransformer model

Files changed (1)
  1. README.md +65 -26
README.md CHANGED
@@ -15,6 +15,45 @@ tags:
  - dataset_size:1451941
  - loss:MultipleNegativesRankingLoss
  base_model: Alibaba-NLP/gte-modernbert-base
  datasets:
  - redis/langcache-sentencepairs-v1
  pipeline_tag: sentence-similarity
@@ -106,9 +145,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("redis/langcache-embed-v3")
  # Run inference
  sentences = [
-     'The weather is lovely today.',
-     "It's so sunny outside!",
-     'He drove to the stadium.',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
@@ -117,9 +156,9 @@ print(embeddings.shape)
  # Get the similarity scores for the embeddings
  similarities = model.similarity(embeddings, embeddings)
  print(similarities)
- # tensor([[0.9922, 0.7891, 0.4629],
- #         [0.7891, 1.0000, 0.5117],
- #         [0.4629, 0.5117, 1.0000]], dtype=torch.bfloat16)
  ```

  <!--
@@ -184,18 +223,18 @@ You can finetune this model on your own dataset.
184
 
185
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
186
  * Size: 109,885 training samples
187
- * Columns: <code>texts</code>
188
  * Approximate statistics based on the first 1000 samples:
189
- | | texts |
190
- |:--------|:--------------------------------------------------------------------------------------|
191
- | type | list |
192
- | details | <ul><li>min: 3 elements</li><li>mean: 3.50 elements</li><li>max: 4 elements</li></ul> |
193
  * Samples:
194
- | texts |
195
- |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
196
- | <code>['The newer Punts are still very much in existence today and race in the same fleets as the older boats .', 'The newer punts are still very much in existence today and run in the same fleets as the older boats .', 'how can I get financial freedom as soon as possible?']</code> |
197
- | <code>['The newer punts are still very much in existence today and run in the same fleets as the older boats .', 'The newer Punts are still very much in existence today and race in the same fleets as the older boats .', 'The older Punts are still very much in existence today and race in the same fleets as the newer boats .']</code> |
198
- | <code>['Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .', 'Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada .', 'Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .']</code> |
199
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
200
  ```json
201
  {
@@ -211,18 +250,18 @@ You can finetune this model on your own dataset.
211
 
212
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
213
  * Size: 109,885 evaluation samples
214
- * Columns: <code>texts</code>
215
  * Approximate statistics based on the first 1000 samples:
216
- | | texts |
217
- |:--------|:--------------------------------------------------------------------------------------|
218
- | type | list |
219
- | details | <ul><li>min: 3 elements</li><li>mean: 3.50 elements</li><li>max: 4 elements</li></ul> |
220
  * Samples:
221
- | texts |
222
- |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
223
- | <code>['The newer Punts are still very much in existence today and race in the same fleets as the older boats .', 'The newer punts are still very much in existence today and run in the same fleets as the older boats .', 'how can I get financial freedom as soon as possible?']</code> |
224
- | <code>['The newer punts are still very much in existence today and run in the same fleets as the older boats .', 'The newer Punts are still very much in existence today and race in the same fleets as the older boats .', 'The older Punts are still very much in existence today and race in the same fleets as the newer boats .']</code> |
225
- | <code>['Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .', 'Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada .', 'Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .']</code> |
226
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
227
  ```json
228
  {
 
  - dataset_size:1451941
  - loss:MultipleNegativesRankingLoss
  base_model: Alibaba-NLP/gte-modernbert-base
+ widget:
+ - source_sentence: Gocharya ji authored Krishna Cahrit Manas in the poetic form describing
+     about the full life of Lord Krishna ( from birth to Nirvana ) .
+   sentences:
+   - 'Q: Can I buy coverage for prescription drugs right away?'
+   - Krishna Cahrit Manas in poetic form , describing the full life of Lord Krishna
+     ( from birth to nirvana ) , wrote Gocharya ji .
+   - Baron played actress Violet Carson who portrayed Ena Sharples in the soap .
+ - source_sentence: The Kilkenny line only reached Maryborough in 1867 .
+   sentences:
+   - It was also known formerly as ' Crotto ' .
+   - The line from Maryborough only reached Kilkenny in 1867 .
+   - The line from Kilkenny only reached Maryborough in 1867 .
+ - source_sentence: Tokelau International Netball Team represents Tokelau in the national
+     netball .
+   sentences:
+   - Ernest Dewey Albinson ( 1898 in Minneapolis , Minnesota - 1971 in Mexico ) was
+     an American artist .
+   - The Tokelau national netball team represents Tokelau in international netball
+     .
+   - The Tokelau international netball team represents Tokelau in national netball
+     .
+ - source_sentence: The real number is called the `` imaginary part `` of the real
+     number ; the real number is called the `` complex part `` of .
+   sentences:
+   - The school board consists of Robbie Sanders , Bryan Richards , Linda Fullingim
+     , Lori Lambert , & Kelly Teague .
+   - Which web design company has the best templates?
+   - The real number is called the `` imaginary part `` of the real number , the real
+     number of `` complex part `` of .
+ - source_sentence: All For You was the third and last single of Kate Ryan 's third
+     album `` Alive `` .
+   sentences:
+   - According to John Keay , he was `` country bred `` ( born and educated in India
+     ) .
+   - All For You was the third single of the third and last album `` Alive `` by Kate
+     Ryan .
+   - All For You was the third and last single of the third album of Kate Ryan `` Alive
+     `` .
  datasets:
  - redis/langcache-sentencepairs-v1
  pipeline_tag: sentence-similarity
 
  model = SentenceTransformer("redis/langcache-embed-v3")
  # Run inference
  sentences = [
+     "All For You was the third and last single of Kate Ryan 's third album `` Alive `` .",
+     'All For You was the third and last single of the third album of Kate Ryan `` Alive `` .',
+     'All For You was the third single of the third and last album `` Alive `` by Kate Ryan .',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
 
  # Get the similarity scores for the embeddings
  similarities = model.similarity(embeddings, embeddings)
  print(similarities)
+ # tensor([[0.9961, 0.9922, 0.9961],
+ #         [0.9922, 1.0000, 0.9922],
+ #         [0.9961, 0.9922, 1.0078]], dtype=torch.bfloat16)
  ```

  <!--
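
Note on the similarity hunk above: `model.similarity` is assumed here to compute pairwise cosine similarity, which is the sentence-transformers default; that setting is not visible in this diff. The following is a minimal sketch of that computation in plain Python. The vectors are hypothetical toy values, not real embeddings from `redis/langcache-embed-v3`, and `cosine_similarity_matrix` is an illustrative helper, not a library function.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine similarity matrix for equal-length vectors."""
    # Precompute the Euclidean norm of every vector once.
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    matrix = []
    for i, u in enumerate(vectors):
        row = []
        for j, v in enumerate(vectors):
            dot = sum(a * b for a, b in zip(u, v))
            row.append(dot / (norms[i] * norms[j]))
        matrix.append(row)
    return matrix


# Hypothetical 3-dimensional stand-ins for real sentence embeddings.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [3.0, 4.0, 0.0],
]
sims = cosine_similarity_matrix(toy_embeddings)
```

In exact arithmetic every diagonal entry is 1. The `1.0078` on the diagonal of the new expected output is one bfloat16 step (2^-7, about 0.0078) above 1, so it most likely reflects low-precision rounding rather than a true similarity greater than 1.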
 

  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
  * Size: 109,885 training samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
  * Approximate statistics based on the first 1000 samples:
+ | | anchor | positive | negative |
+ |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+ | type | string | string | string |
+ | details | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 49 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 48 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 26.47 tokens</li><li>max: 61 tokens</li></ul> |
  * Samples:
+ | anchor | positive | negative |
+ |:--------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|
+ | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>how can I get financial freedom as soon as possible?</code> |
+ | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The older Punts are still very much in existence today and race in the same fleets as the newer boats .</code> |
+ | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .</code> |
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
 

  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
  * Size: 109,885 evaluation samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
  * Approximate statistics based on the first 1000 samples:
+ | | anchor | positive | negative |
+ |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+ | type | string | string | string |
+ | details | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 49 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 48 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 26.47 tokens</li><li>max: 61 tokens</li></ul> |
  * Samples:
+ | anchor | positive | negative |
+ |:--------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|
+ | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>how can I get financial freedom as soon as possible?</code> |
+ | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The older Punts are still very much in existence today and race in the same fleets as the newer boats .</code> |
+ | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .</code> |
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
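
Note on the dataset hunks: the commit replaces a single `texts` list column with separate `anchor`, `positive`, and `negative` string columns, and the sample rows suggest the list order maps directly onto those three names. The sketch below shows one way such a mapping could look. The helper name `texts_to_triplet` is hypothetical, and keeping only the first negative for four-element rows is an assumption; the diff does not show how the commit handled those rows.

```python
def texts_to_triplet(texts):
    """Map a [anchor, positive, negative, ...] list to named triplet columns."""
    if len(texts) < 3:
        raise ValueError("expected at least an anchor, a positive, and a negative")
    # Assumption: any element after the third is an extra negative and is dropped.
    anchor, positive, negative = texts[0], texts[1], texts[2]
    return {"anchor": anchor, "positive": positive, "negative": negative}


# First sample row from the old `texts` column in the diff.
old_row = [
    "The newer Punts are still very much in existence today and race in the same fleets as the older boats .",
    "The newer punts are still very much in existence today and run in the same fleets as the older boats .",
    "how can I get financial freedom as soon as possible?",
]
triplet = texts_to_triplet(old_row)
```

Applied to the first old sample row, this yields the same anchor, positive, and negative values shown in the first row of the new table.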