radoslavralev committed on
Commit 1068642 · verified · 1 Parent(s): a968d6e

Add new SentenceTransformer model

Files changed (2)
  1. README.md +65 -68
  2. model.safetensors +1 -1
README.md CHANGED
@@ -16,53 +16,50 @@ tags:
  - loss:CoSENTLoss
  base_model: Alibaba-NLP/gte-modernbert-base
  widget:
- - source_sentence: 'See Precambrian time scale # Proposed Geologic timeline for another
- set of periods 4600 -- 541 MYA .'
  sentences:
- - In 2014 election , Biju Janata Dal candidate Tathagat Satapathy Bharatiya Janata
- party candidate Rudra Narayan Pany defeated with a margin of 1.37,340 votes .
- - In Scotland , the Strathclyde Partnership for Transport , formerly known as Strathclyde
- Passenger Transport Executive , comprises the former Strathclyde region , which
- includes the urban area around Glasgow .
- - 'See Precambrian Time Scale # Proposed Geological Timeline for another set of
- periods of 4600 -- 541 MYA .'
- - source_sentence: It is also 5 kilometers northeast of Tamaqua , 27 miles south of
- Allentown and 9 miles northwest of Hazleton .
  sentences:
- - In 1948 he moved to Massachusetts , and eventually settled in Vermont .
- - Suddenly I remembered that I was a New Zealander , I caught the first plane home
- and came back .
- - It is also 5 miles northeast of Tamaqua , 27 miles south of Allentown , and 9
- miles northwest of Hazleton .
- - source_sentence: The party has a Member of Parliament , a member of the House of
- Lords , three members of the London Assembly and two Members of the European Parliament
- .
  sentences:
- - The party has one Member of Parliament , one member of the House of Lords , three
- Members of the London Assembly and two Members of the European Parliament .
- - Grapsid crabs dominate in Australia , Malaysia and Panama , while gastropods Cerithidea
- scalariformis and Melampus coeffeus are important seed predators in Florida mangroves
- .
- - Music Story is a music service website and international music data provider that
- curates , aggregates and analyses metadata for digital music services .
- - source_sentence: 'The play received two 1969 Tony Award nominations : Best Actress
- in a Play ( Michael Annals ) and Best Costume Design ( Charlotte Rae ) .'
  sentences:
- - Ravishanker is a fellow of the International Statistical Institute and an elected
- member of the American Statistical Association .
- - 'In 1969 , the play received two Tony - Award nominations : Best Actress in a
- Theatre Play ( Michael Annals ) and Best Costume Design ( Charlotte Rae ) .'
- - AMD and Nvidia both have proprietary methods of scaling , CrossFireX for AMD ,
- and SLI for Nvidia .
- - source_sentence: He was a close friend of Ángel Cabrera and is a cousin of golfer
- Tony Croatto .
  sentences:
- - He was a close friend of Ángel Cabrera , and is a cousin of golfer Tony Croatto
- .
- - Eugenijus Bartulis ( born December 7 , 1949 in Kaunas ) is a Lithuanian Roman
- Catholic priest , and Bishop of Šiauliai .
- - UWIRE also distributes its members content to professional media outlets , including
- Yahoo , CNN and CBS News .
  datasets:
  - redis/langcache-sentencepairs-v1
  pipeline_tag: sentence-similarity
@@ -154,9 +151,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("redis/langcache-embed-v3")
  # Run inference
  sentences = [
- 'He was a close friend of Ángel Cabrera and is a cousin of golfer Tony Croatto .',
- 'He was a close friend of Ángel Cabrera , and is a cousin of golfer Tony Croatto .',
- 'UWIRE also distributes its members content to professional media outlets , including Yahoo , CNN and CBS News .',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
@@ -165,9 +162,9 @@ print(embeddings.shape)
  # Get the similarity scores for the embeddings
  similarities = model.similarity(embeddings, embeddings)
  print(similarities)
- # tensor([[0.9922, 0.9922, 0.5352],
- # [0.9922, 0.9961, 0.5391],
- # [0.5352, 0.5391, 1.0000]], dtype=torch.bfloat16)
  ```
 
  <!--
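Both similarity printouts in this diff use `dtype=torch.bfloat16`, and that precision most likely accounts for self-similarities of 0.9922 or 0.9961 instead of 1.0000. bfloat16 keeps 8 significant bits, so just below 1.0 its representable values are 2^-8 apart: the printed 0.9961 and 0.9922 correspond to 1 - 2^-8 = 0.99609375 and 1 - 2^-7 = 0.9921875. The sketch below rounds a number to the nearest bfloat16 value with only the standard library, assuming the usual round-to-nearest-even float32 conversion; `to_bfloat16` is a hypothetical helper, not a PyTorch function.

```python
import struct

def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round half to even).

    Works on the float32 bit pattern: bfloat16 is the upper 16 bits of a float32.
    Intended for finite values; NaN/inf handling is omitted in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1                       # last bit that bfloat16 keeps
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000    # round, then drop the low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Just below 1.0, neighbouring bfloat16 values differ by 2**-8.
for value in (0.9999, 0.9962, 0.9921):
    print(f"{value} -> {to_bfloat16(value)}")
```

A cosine similarity that would be 0.9999 in float32 therefore prints as 1.0000 in bfloat16, while accumulated rounding during the computation can push a self-similarity one or two steps lower.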
@@ -231,19 +228,19 @@ You can finetune this model on your own dataset.
  #### LangCache Sentence Pairs (all)
 
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
- * Size: 26,850 training samples
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
  * Approximate statistics based on the first 1000 samples:
- | | sentence1 | sentence2 | label |
- |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:-----------------------------|
- | type | string | string | int |
- | details | <ul><li>min: 8 tokens</li><li>mean: 27.35 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 52 tokens</li></ul> | <ul><li>1: 100.00%</li></ul> |
  * Samples:
- | sentence1 | sentence2 | label |
- |:----------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------|:---------------|
- | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>1</code> |
- | <code>After losing his second election , he resigned as opposition leader and was replaced by Geoff Pearsall .</code> | <code>Max Bingham resigned as opposition leader after losing his second election , and was replaced by Geoff Pearsall .</code> | <code>1</code> |
- | <code>The 12F was officially homologated on August 21 , 1929 and exhibited at the Paris Salon in 1930 .</code> | <code>The 12F was officially homologated on 21 August 1929 and displayed at the 1930 Paris Salon .</code> | <code>1</code> |
  * Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
  ```json
  {
@@ -257,19 +254,19 @@ You can finetune this model on your own dataset.
  #### LangCache Sentence Pairs (all)
 
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
- * Size: 26,850 evaluation samples
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
  * Approximate statistics based on the first 1000 samples:
- | | sentence1 | sentence2 | label |
- |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:-----------------------------|
- | type | string | string | int |
- | details | <ul><li>min: 8 tokens</li><li>mean: 27.35 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 52 tokens</li></ul> | <ul><li>1: 100.00%</li></ul> |
  * Samples:
- | sentence1 | sentence2 | label |
- |:----------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------|:---------------|
- | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>1</code> |
- | <code>After losing his second election , he resigned as opposition leader and was replaced by Geoff Pearsall .</code> | <code>Max Bingham resigned as opposition leader after losing his second election , and was replaced by Geoff Pearsall .</code> | <code>1</code> |
- | <code>The 12F was officially homologated on August 21 , 1929 and exhibited at the Paris Salon in 1930 .</code> | <code>The 12F was officially homologated on 21 August 1929 and displayed at the 1930 Paris Salon .</code> | <code>1</code> |
  * Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
  ```json
  {
 
  - loss:CoSENTLoss
  base_model: Alibaba-NLP/gte-modernbert-base
  widget:
+ - source_sentence: In 2015 Adolf Hitler appeared in the kickstarter short movie ``
+ Kung Fury `` as Taccone ( A.K.A .
  sentences:
+ - In 2015 , Adolf Hitler appeared in the Kickstarter - short film `` Kung Fury ``
+ as Taccone ( A.K.A .
+ - In 1795 , the only white residents were Dr. John Laidley and two brothers with
+ the surname Ainslie .
+ - The 125th University Match was played in March 2014 at the Rye Golf Club , Oxford
+ , East Sussex won the game 8.5 - 6.5 .
+ - source_sentence: From 1973 to 1974 , Aubrey toured with the Cambridge Theatre Company
+ as Diggory in `` She Stoops to Conquer `` and again as Aguecheek .
  sentences:
+ - Oxide can be reduced to metallic samarium at higher temperatures by heating with
+ a reducing agent such as hydrogen or carbon monoxide .
+ - From 1973 to 1974 Aguecheek toured with the Cambridge Theatre Company as Diggory
+ in `` You Stoops to Conquer `` and again as Aubrey .
+ - The medals were presented by Barry Maister , IOC member , New Zealand and Sarah
+ Webb Gosling , Vice President of World Sailing .
+ - source_sentence: There is no official wall on the border , although there are sections
+ of fence near populated areas and continuous border crossings .
  sentences:
+ - The 2014 -- 15 Boston Bruins season was the 91st season for the National Hockey
+ League franchise that was established on November 1 , 1924 .
+ - He was trained by the Inghams and owned by John Hawkes .
+ - There is no continuous wall on the border , although there are fence sections
+ near populated areas and official border crossings .
+ - source_sentence: Capital . `` The French established similar hill stations in Indochina
+ , such as Dalat built in 1921 .
  sentences:
+ - Lubuk China is a small town in Alor Gajah District , Melaka , Malaysia . It is
+ situated near the border with Negeri Sembilan .
+ - The French established similar hill stations in Indochina , such as Dalat , built
+ in 1921 .
+ - John Potts ( or Pott ) was a doctor and colonial governor of Virginia in the Jamestown
+ settlement at Virginia Colony in the early 17th century .
+ - source_sentence: The band pursued `` signals `` in January 2012 in three weeks ,
+ and drums were recorded in a day and a half .
  sentences:
+ - It was repaired at the beginning of the 20th century and is listed as closed in
+ our records .
+ - The band tracked `` Signals `` in three weeks in January 2012 . Drums were recorded
+ in a day and a half .
+ - Contributors include actor Anton LaVey , Satanist Christopher Lee , serial killer
+ expert Clive Barker , author Karen Greenlee , and necrophile Robert Ressler .
  datasets:
  - redis/langcache-sentencepairs-v1
  pipeline_tag: sentence-similarity
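Each `widget` entry above pairs one `source_sentence` with three candidate `sentences` for the Hub's sentence-similarity widget, which scores every candidate against the source. A minimal, dependency-free sketch of that ranking step, assuming the embeddings have already been computed; the three-dimensional vectors are made-up placeholders, not outputs of `redis/langcache-embed-v3`, and `rank_candidates` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_candidates(source: Sequence[float],
                    candidates: List[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, most similar first."""
    scored = [(i, cosine(source, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical embeddings: candidate 1 plays the role of the paraphrase.
source = [0.9, 0.1, 0.2]
candidates = [[0.1, 0.9, 0.3], [0.85, 0.15, 0.25], [0.2, 0.3, 0.9]]
print(rank_candidates(source, candidates))
```

In every widget example in this commit, exactly one candidate is a near-paraphrase of the source sentence, so a well-trained model should rank it first by a wide margin.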
 
  model = SentenceTransformer("redis/langcache-embed-v3")
  # Run inference
  sentences = [
+ 'The band pursued `` signals `` in January 2012 in three weeks , and drums were recorded in a day and a half .',
+ 'The band tracked `` Signals `` in three weeks in January 2012 . Drums were recorded in a day and a half .',
+ 'Contributors include actor Anton LaVey , Satanist Christopher Lee , serial killer expert Clive Barker , author Karen Greenlee , and necrophile Robert Ressler .',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
 
  # Get the similarity scores for the embeddings
  similarities = model.similarity(embeddings, embeddings)
  print(similarities)
+ # tensor([[0.9961, 0.9570, 0.4941],
+ # [0.9570, 0.9961, 0.5078],
+ # [0.4941, 0.5078, 1.0000]], dtype=torch.bfloat16)
  ```
 
  <!--
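`model.similarity(embeddings, embeddings)` returns the full pairwise score matrix for the three sentences. Assuming the model uses cosine similarity (the Sentence Transformers default when no other similarity function is configured), each entry is the dot product of the two L2-normalized embeddings. The sketch below reproduces that computation in plain Python on made-up 4-dimensional vectors, so its numbers are unrelated to the 0.9570 and 0.4941 scores in the diff; the helper names are hypothetical.

```python
import math
from typing import List

def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_matrix(embeddings: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarities as dot products of L2-normalized rows."""
    unit = [l2_normalize(e) for e in embeddings]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]

# Hypothetical embeddings: rows 0 and 1 stand in for a paraphrase pair,
# row 2 for an unrelated sentence.
embeddings = [
    [0.5, 0.1, 0.3, 0.2],
    [0.45, 0.15, 0.3, 0.25],
    [0.1, 0.6, -0.2, 0.4],
]
for row in cosine_matrix(embeddings):
    print([round(x, 4) for x in row])
```

The matrix is symmetric with ones on the diagonal (up to rounding), which matches the shape of the printed tensors: the paraphrase pair scores close to 1, the unrelated sentence much lower.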
 
  #### LangCache Sentence Pairs (all)
 
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
+ * Size: 62,021 training samples
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
  * Approximate statistics based on the first 1000 samples:
+ | | sentence1 | sentence2 | label |
+ |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:------------------------------------------------|
+ | type | string | string | int |
+ | details | <ul><li>min: 8 tokens</li><li>mean: 27.46 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 27.36 tokens</li><li>max: 52 tokens</li></ul> | <ul><li>0: ~50.30%</li><li>1: ~49.70%</li></ul> |
  * Samples:
+ | sentence1 | sentence2 | label |
+ |:--------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
+ | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>1</code> |
+ | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .</code> | <code>0</code> |
+ | <code>After losing his second election , he resigned as opposition leader and was replaced by Geoff Pearsall .</code> | <code>Max Bingham resigned as opposition leader after losing his second election , and was replaced by Geoff Pearsall .</code> | <code>1</code> |
  * Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
  ```json
  {
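With this commit the training pairs now carry both labels (roughly half 0, half 1), which is what gives `CoSENTLoss` something to rank. CoSENT scores each pair by cosine similarity and, for every two pairs i and j in a batch where pair i has the lower label, adds exp(scale * (cos_i - cos_j)) inside loss = log(1 + sum of those terms), so the loss grows whenever a lower-labelled pair outscores a higher-labelled one. A pure-Python sketch of that formula for a single batch; `scale=20.0` is the library's default, but the parameters this model actually used sit in the JSON block that the diff context cuts off after `{`, so treat the value as an assumption. `cosent_loss` is a hypothetical helper, not the library class.

```python
import math
from typing import Sequence

def cosent_loss(cos_scores: Sequence[float], labels: Sequence[float],
                scale: float = 20.0) -> float:
    """CoSENT loss for one batch of sentence pairs.

    cos_scores[k] is the cosine similarity of pair k and labels[k] its gold label.
    Every (i, j) with labels[i] < labels[j] contributes exp(scale * (cos_i - cos_j)).
    """
    terms = [0.0]  # exp(0) = 1 supplies the "1 +" inside the log
    for cos_i, y_i in zip(cos_scores, labels):
        for cos_j, y_j in zip(cos_scores, labels):
            if y_i < y_j:
                terms.append(scale * (cos_i - cos_j))
    # log-sum-exp, shifted by the largest term for numerical stability
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

# Hypothetical batch: pairs 0 and 2 are paraphrases (label 1), pairs 1 and 3 are not.
labels = [1, 0, 1, 0]
well_ordered = [0.95, 0.40, 0.90, 0.55]  # every paraphrase scores higher
mis_ordered = [0.50, 0.80, 0.90, 0.55]   # a non-paraphrase outscores pair 0
print(cosent_loss(well_ordered, labels))
print(cosent_loss(mis_ordered, labels))
```

A batch whose labels are all identical has no (i, j) pairs to compare and contributes zero loss, which is why the old, all-positive training set (`1: 100.00%`) gave this objective little to learn from.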
 
  #### LangCache Sentence Pairs (all)
 
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
+ * Size: 62,021 evaluation samples
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
  * Approximate statistics based on the first 1000 samples:
+ | | sentence1 | sentence2 | label |
+ |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:------------------------------------------------|
+ | type | string | string | int |
+ | details | <ul><li>min: 8 tokens</li><li>mean: 27.46 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 27.36 tokens</li><li>max: 52 tokens</li></ul> | <ul><li>0: ~50.30%</li><li>1: ~49.70%</li></ul> |
  * Samples:
+ | sentence1 | sentence2 | label |
+ |:--------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
+ | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>1</code> |
+ | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .</code> | <code>0</code> |
+ | <code>After losing his second election , he resigned as opposition leader and was replaced by Geoff Pearsall .</code> | <code>Max Bingham resigned as opposition leader after losing his second election , and was replaced by Geoff Pearsall .</code> | <code>1</code> |
  * Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
  ```json
  {
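The statistics tables in both dataset sections are labelled approximate because they are computed on the first 1000 samples only. A sketch of how that kind of summary (min/mean/max length per column, share of each label) can be derived; a whitespace split stands in for the model's real tokenizer, which is an assumption, so its counts will not match the token figures in the tables, and the rows are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]  # (sentence1, sentence2, label)

def summarize(rows: List[Row], limit: int = 1000) -> Dict[str, object]:
    """Approximate column statistics over the first `limit` rows."""
    head = rows[:limit]
    # Stand-in tokenizer (assumption): whitespace split, not the ModernBERT tokenizer.
    lengths1 = [len(s1.split()) for s1, _, _ in head]
    lengths2 = [len(s2.split()) for _, s2, _ in head]
    counts = Counter(label for _, _, label in head)
    return {
        "sentence1": (min(lengths1), mean(lengths1), max(lengths1)),
        "sentence2": (min(lengths2), mean(lengths2), max(lengths2)),
        "label_share": {lab: 100.0 * n / len(head) for lab, n in sorted(counts.items())},
    }

# Hypothetical rows, not taken from redis/langcache-sentencepairs-v1
rows = [
    ("a b c", "a b c d", 1),
    ("x y", "y x", 0),
    ("p q r s", "p q r", 1),
    ("m n", "n o", 0),
]
print(summarize(rows))
```

Because only a prefix is inspected, the reported shares (for example `0: ~50.30%`) describe the first 1000 rows and may differ slightly from the full split.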
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9a2c384340b361720a3f76501efa901aac7dbd4ce2d2640d36a49d0897917139
  size 298041696
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:95d02211c4cca89113f9f3e93ed91f5176bf50170faa2cb835f7bfea15bb9dd2
  size 298041696
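`model.safetensors` is tracked with Git LFS, so the diff only touches the pointer file: the `oid sha256:` line identifies the new weights, while `size` stays at 298041696 bytes. A standard-library sketch for checking a downloaded file against such a pointer; `parse_lfs_pointer`, `matches_pointer`, and the local path are hypothetical names chosen for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict

def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields

def matches_pointer(path: Path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """True if the file's byte size and SHA-256 digest match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256" or path.stat().st_size != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex

# Pointer from the new side of this commit
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:95d02211c4cca89113f9f3e93ed91f5176bf50170faa2cb835f7bfea15bb9dd2
size 298041696
"""

if __name__ == "__main__":
    local = Path("model.safetensors")  # hypothetical local download path
    if local.exists():
        print("matches new weights:", matches_pointer(local, NEW_POINTER))
```

Checking the size first is cheap and rejects most mismatches before hashing; here it cannot distinguish the old and new weights, since both pointers record the same size, so the SHA-256 comparison does the real work.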