radoslavralev committed on
Commit 6f339ce · verified · 1 Parent(s): bf31766

Add new SentenceTransformer model

Files changed (2)
  1. README.md +65 -76
  2. model.safetensors +1 -1
README.md CHANGED
@@ -12,51 +12,54 @@ tags:
12
  - retrieval
13
  - reranking
14
  - generated_from_trainer
15
- - dataset_size:3119809
16
- - loss:AdaFaceInBatchLoss
17
  base_model: Alibaba-NLP/gte-modernbert-base
18
  widget:
19
- - source_sentence: Hayley Vaughan portrayed Ripa on the ABC daytime soap opera , ``
20
- All My Children `` , between 1990 and 2002 .
 
21
  sentences:
22
- - Traxxpad is a music application for Sony 's PlayStation Portable published by
23
- Definitive Studios and developed by Eidos Interactive .
24
- - Between 1990 and 2002 , Hayley Vaughan Ripa portrayed in the ABC soap opera ``
25
- All My Children `` .
26
- - Between 1990 and 2002 , Ripa Hayley portrayed Vaughan in the ABC soap opera ``
27
- All My Children `` .
28
- - source_sentence: Olivella monilifera is a species of dwarf sea snail , small gastropod
29
- mollusk in the family Olivellidae , the marine olives .
 
30
  sentences:
31
- - Olivella monilifera is a species of the dwarf - sea snail , small gastropod mollusk
32
- in the Olivellidae family , the marine olives .
33
- - He was cut by the Browns after being signed by the Bills in 2013 . He was later
34
- released .
35
- - Olivella monilifera is a kind of sea snail , marine gastropod mollusk in the Olivellidae
36
- family , the dwarf olives .
37
- - source_sentence: Hayashi said that Mackey `` is a sort of `` of the original model
38
- for Tenchi .
39
  sentences:
40
- - In the summer of 2009 , Ellick shot a documentary about Malala Yousafzai .
41
- - Hayashi said that Mackey is `` sort of `` the original model for Tenchi .
42
- - Mackey said that Hayashi is `` sort of `` the original model for Tenchi .
43
- - source_sentence: Much of the film was shot on location in Los Angeles and in nearby
44
- Burbank and Glendale .
 
 
45
  sentences:
46
- - Much of the film was shot on location in Los Angeles and in nearby Burbank and
47
- Glendale .
48
- - Much of the film was shot on site in Burbank and Glendale and in the nearby Los
49
- Angeles .
50
- - Traxxpad is a music application for the Sony PlayStation Portable developed by
51
- the Definitive Studios and published by Eidos Interactive .
52
- - source_sentence: According to him , the earth is the carrier of his artistic work
53
- , which is only integrated into the creative process by minimal changes .
54
  sentences:
55
- - National players are Bold players .
56
- - According to him , earth is the carrier of his artistic work being integrated
57
- into the creative process only by minimal changes .
58
- - According to him , earth is the carrier of his creative work being integrated
59
- into the artistic process only by minimal changes .
60
  datasets:
61
  - redis/langcache-sentencepairs-v2
62
  pipeline_tag: sentence-similarity
@@ -148,9 +151,9 @@ from sentence_transformers import SentenceTransformer
148
  model = SentenceTransformer("redis/langcache-embed-v3")
149
  # Run inference
150
  sentences = [
151
- 'According to him , the earth is the carrier of his artistic work , which is only integrated into the creative process by minimal changes .',
152
- 'According to him , earth is the carrier of his artistic work being integrated into the creative process only by minimal changes .',
153
- 'According to him , earth is the carrier of his creative work being integrated into the artistic process only by minimal changes .',
154
  ]
155
  embeddings = model.encode(sentences)
156
  print(embeddings.shape)
@@ -159,9 +162,9 @@ print(embeddings.shape)
159
  # Get the similarity scores for the embeddings
160
  similarities = model.similarity(embeddings, embeddings)
161
  print(similarities)
162
- # tensor([[1.0000, 0.9961, 0.9922],
163
- # [0.9961, 1.0000, 0.9961],
164
- # [0.9922, 0.9961, 0.9961]], dtype=torch.bfloat16)
165
  ```
166
 
167
  <!--
@@ -225,54 +228,40 @@ You can finetune this model on your own dataset.
225
  #### LangCache Sentence Pairs (all)
226
 
227
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
228
- * Size: 126,938 training samples
229
  * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
230
- * Approximate statistics based on the first 1000 samples:
231
  | | anchor | positive | negative |
232
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
233
  | type | string | string | string |
234
- | details | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 49 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 48 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 26.54 tokens</li><li>max: 61 tokens</li></ul> |
235
  * Samples:
236
- | anchor | positive | negative |
237
- |:--------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|
238
- | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>how can I get financial freedom as soon as possible?</code> |
239
- | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The older Punts are still very much in existence today and race in the same fleets as the newer boats .</code> |
240
- | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .</code> |
241
- * Loss: <code>losses.AdaFaceInBatchLoss</code> with these parameters:
242
- ```json
243
- {
244
- "scale": 20.0,
245
- "similarity_fct": "cos_sim",
246
- "gather_across_devices": false
247
- }
248
- ```
249
 
250
  ### Evaluation Dataset
251
 
252
  #### LangCache Sentence Pairs (all)
253
 
254
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
255
- * Size: 126,938 evaluation samples
256
  * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
257
- * Approximate statistics based on the first 1000 samples:
258
  | | anchor | positive | negative |
259
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
260
  | type | string | string | string |
261
- | details | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 49 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 48 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 26.54 tokens</li><li>max: 61 tokens</li></ul> |
262
  * Samples:
263
- | anchor | positive | negative |
264
- |:--------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|
265
- | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>how can I get financial freedom as soon as possible?</code> |
266
- | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The older Punts are still very much in existence today and race in the same fleets as the newer boats .</code> |
267
- | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .</code> |
268
- * Loss: <code>losses.AdaFaceInBatchLoss</code> with these parameters:
269
- ```json
270
- {
271
- "scale": 20.0,
272
- "similarity_fct": "cos_sim",
273
- "gather_across_devices": false
274
- }
275
- ```
276
 
277
  ### Training Logs
278
  | Epoch | Step | test_cosine_ndcg@10 |
 
12
  - retrieval
13
  - reranking
14
  - generated_from_trainer
15
+ - dataset_size:400
16
+ - loss:CustomBCELoss
17
  base_model: Alibaba-NLP/gte-modernbert-base
18
  widget:
19
+ - source_sentence: The aversive or evitative case ( abbreviated ) is a grammatical
20
+ case that is found in Australian Aboriginal languages and indicates that the marked
21
+ noun is avoided or feared .
22
  sentences:
23
+ - The aversive or evitative case ( abbreviated ) is a grammatical case that is found
24
+ in Australian Aboriginal languages and indicates that the marked noun is avoided
25
+ or feared .
26
+ - He was born in Ryno , Johannesburg , died in North West .
27
+ - The aversive or evitative case ( abbreviated ) is a marked case found in Australian
28
+ Aboriginal languages that indicates that the grammatical noun is avoided or feared
29
+ .
30
+ - source_sentence: Three ships of the Royal Australian Navy ( RAN ) were named after
31
+ Perth , the capital city of Western Australia , as HMAS `` Perth `` .
32
  sentences:
33
+ - Three ships of the Royal Australian Navy ( RAN ) have been named HMAS `` Western
34
+ Australia `` after Perth , the capital city of Perth .
35
+ - Three ships of the Royal Australian Navy ( RAN ) were named after Perth , the
36
+ capital city of Western Australia , as HMAS `` Perth `` .
37
+ - He lost the title to Rees after Iestyn Rees purchased his title shot at PWE Jingle
38
+ All The Galloway .
39
+ - source_sentence: Oxynoe azuropunctata is a kind of small sea snail or sea snail
40
+ , a bubble snail , a marine gastropod mollusk in the Oxynoidae family .
41
  sentences:
42
+ - It is located at Ellison Bay , in the town of Liberty Grove , Wisconsin .
43
+ - Oxynoe azuropunctata is a kind of small sea snail or sea snail , a bubble snail
44
+ , a marine gastropod mollusk in the Oxynoidae family .
45
+ - Oxynoe azuropunctata is a species of marine sea snail or sea slug , a bubble snail
46
+ , a small gastropod mollusk in the family Oxynoidae .
47
+ - source_sentence: It included the original six tracks , re-worked with new vocals
48
+ and live drums , three remixes , and new two tracks .
49
  sentences:
50
+ - It included the original six tracks , overhauled with new vocals and live drums
51
+ , three remixes and two new tracks .
52
+ - Punish Lichfield , garrison Birmingham , and clear the country as far as possible
53
+ .
54
+ - Antoninus , or known as Antoninus , was a Roman who lived in the 1st century .
55
+ - source_sentence: It is known from Australia , including South Australia , Tasmania
56
+ , Queensland , New South Wales and Victoria .
 
57
  sentences:
58
+ - It is famous from Australia , including South Australia , Tasmania , Queensland
59
+ , New South Wales and Victoria .
60
+ - In 1995 Franz married Joanie Zeck , whom he met in 1982 .
61
+ - In 1792 , the family moved to Kingston in Toronto and then York ( later renamed
62
+ Upper Canada ) .
63
  datasets:
64
  - redis/langcache-sentencepairs-v2
65
  pipeline_tag: sentence-similarity
 
151
  model = SentenceTransformer("redis/langcache-embed-v3")
152
  # Run inference
153
  sentences = [
154
+ 'It is known from Australia , including South Australia , Tasmania , Queensland , New South Wales and Victoria .',
155
+ 'It is famous from Australia , including South Australia , Tasmania , Queensland , New South Wales and Victoria .',
156
+ 'In 1792 , the family moved to Kingston in Toronto and then York ( later renamed Upper Canada ) .',
157
  ]
158
  embeddings = model.encode(sentences)
159
  print(embeddings.shape)
 
162
  # Get the similarity scores for the embeddings
163
  similarities = model.similarity(embeddings, embeddings)
164
  print(similarities)
165
+ # tensor([[1.0078, 0.9531, 0.5898],
166
+ # [0.9531, 0.9961, 0.5898],
167
+ # [0.5898, 0.5898, 0.9922]], dtype=torch.bfloat16)
168
  ```
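
Note on the similarity matrix above: `model.similarity` computes cosine similarity by default in Sentence Transformers, and the printed tensor is `torch.bfloat16`. The reduced precision of that dtype is the likely reason the diagonal reads 1.0078, 0.9961, and 0.9922 rather than exactly 1. The sketch below uses only the standard library and is not the library's implementation; `cosine_similarity_matrix` and `to_bfloat16` are hypothetical helper names introduced for illustration.

```python
import math
import struct


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity for a list of equal-length, non-zero vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]


def to_bfloat16(x):
    """Round a finite float to the nearest bfloat16 value (ties to even).

    bfloat16 keeps the top 16 bits of the float32 encoding, so the result
    is returned as an ordinary Python float lying on the bfloat16 grid.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000                   # drop the low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Toy vectors (made up for illustration, not real model embeddings).
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in cosine_similarity_matrix(toy):
    print([round(value, 4) for value in row])

# The neighbours of 1.0 on the bfloat16 grid are 1 - 2**-8 and 1 + 2**-7,
# which print as 0.9961 and 1.0078 when rounded to four decimals.
print(to_bfloat16(0.9961), to_bfloat16(1.0078))
```

On this reading, 0.9961 and 1.0078 are the nearest representable values on either side of 1.0, so a diagonal entry off by one rounding step in either direction would show up exactly as printed in the README.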
169
 
170
  <!--
 
228
  #### LangCache Sentence Pairs (all)
229
 
230
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
231
+ * Size: 199 training samples
232
  * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
233
+ * Approximate statistics based on the first 199 samples:
234
  | | anchor | positive | negative |
235
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
236
  | type | string | string | string |
237
+ | details | <ul><li>min: 9 tokens</li><li>mean: 26.47 tokens</li><li>max: 42 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 26.47 tokens</li><li>max: 42 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 25.93 tokens</li><li>max: 42 tokens</li></ul> |
238
  * Samples:
239
+ | anchor | positive | negative |
240
+ |:--------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------|
241
+ | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>He was at the Westminster School under Richard Busby and studied at Christ Church , Oxford with Henry Aldrich .</code> |
242
+ | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>Richard Cheyne was the son and heir of Robert Cralle of Shurland and Margery , daughter and coheiress of Cheyne of Cralle , Sussex .</code> |
243
+ | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .</code> |
244
+ * Loss: <code>losses.CustomBCELoss</code>
245
 
246
  ### Evaluation Dataset
247
 
248
  #### LangCache Sentence Pairs (all)
249
 
250
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
251
+ * Size: 199 evaluation samples
252
  * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
253
+ * Approximate statistics based on the first 199 samples:
254
  | | anchor | positive | negative |
255
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
256
  | type | string | string | string |
257
+ | details | <ul><li>min: 9 tokens</li><li>mean: 26.47 tokens</li><li>max: 42 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 26.47 tokens</li><li>max: 42 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 25.93 tokens</li><li>max: 42 tokens</li></ul> |
258
  * Samples:
259
+ | anchor | positive | negative |
260
+ |:--------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------|
261
+ | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>He was at the Westminster School under Richard Busby and studied at Christ Church , Oxford with Henry Aldrich .</code> |
262
+ | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>Richard Cheyne was the son and heir of Robert Cralle of Shurland and Margery , daughter and coheiress of Cheyne of Cralle , Sussex .</code> |
263
+ | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .</code> |
264
+ * Loss: <code>losses.CustomBCELoss</code>
265
 
266
  ### Training Logs
267
  | Epoch | Step | test_cosine_ndcg@10 |
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:975841a27a72813ef45a69724c95df0c0fc6f8cc33484c016d0ae75202515551
3
  size 298041696
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:95d02211c4cca89113f9f3e93ed91f5176bf50170faa2cb835f7bfea15bb9dd2
3
  size 298041696
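
Note on the `model.safetensors` change: the diff only touches the Git LFS pointer, and the size is identical (298041696 bytes) on both sides, so the `oid` is the only evidence that the weights changed. The sketch below shows one way to parse a pointer and check a downloaded file's bytes against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the code assumes the three-line `version` / `oid` / `size` layout shown in this diff.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash algorithm, digest, and size."""
    fields = dict(line.strip().split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and digest recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


# Pointer text copied from the new side of the diff.
new_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:95d02211c4cca89113f9f3e93ed91f5176bf50170faa2cb835f7bfea15bb9dd2\n"
    "size 298041696\n"
)
print(new_pointer["algo"], new_pointer["size"])
```

For a file of this size, hashing in chunks from disk would be preferable to loading everything into memory; the in-memory version is kept here for brevity.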