---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:499184
- loss:MultipleNegativesRankingLoss
base_model: jxm/cde-small-v2
widget:
- source_sentence: Heterozygous Advantage Definition
  sentences:
  - A heterozygote advantage (heterozygous advantage) describes the case in which
    the heterozygote genotype has a higher relative fitness than either the homozygote
    dominant or homozygote recessive genotype.
  - Science Main Index. Animals with an internal skeleton made of bone are called
    vertebrates. Vertebrates include fish, amphibians, reptiles, birds, mammals, primates,
    rodents and marsupials. Although vertebrates represent only a very small percentage
    of all animals, their size and mobility often allow them to dominate their environment.
  - 'By Regina Bailey. Definition: Heterozygous refers to having two different alleles
    for a single trait. Related Terms: Allele, Genes, Homozygous. Examples: The gene
    for seed shape in pea plants exists in two forms, one form or allele for round
    seed shape (R) and the other for wrinkled seed shape (r). heterozygous plant would
    contain the following alleles for seed shape: (Rr). Organisms have two alleles
    for each trait. When the alleles of a pair are heterozygous, one is dominant and
    the other is recessive. Using the previous example, round seed shape (R) is dominant
    and wrinkled seed shape (r) is recessive.'
- source_sentence: definition of annul
  sentences:
  - 'When a celebrity wakes up in Las Vegas with a mysterious wedding ring on her
    finger, the first thing she’ll probably want to do is annul the marriage. That
    will declare it invalid and officially cancel the whole deal. Annul, which means
    “to cancel” or “to invalidate,” is usually used in the context of politics or
    marriage. New government officials often want to annul laws and policies of the
    previous post-holder, effectively reversing their work. When you annul a marriage,
    you are officially declaring it invalid, as if it never happened.'
  - 'The proper term for Catholic annulment is declaration of nullity: the Church
    declares that the marriage never was valid in the first place. This becomes clearer
    when we compare Catholic annulment to civil divorce. A divorce is effective as
    of the date of the divorce decree. Before that, the couple was still married.
    Annulment for an invalid marriage Catholic annulment means that a couple was never
    married in the sacramental sense. God did not create that unbreakable bond between
    them because the sacrament of marriage was not actually fulfilled. The term annulment
    is actually a little misleading.'
  - Another word for consistent word list. Below are a number of words whose meaning
    is similar to consistent. 1  accordant. 2  compatible. 3  conformable. 4  congruous.
    5  harmonious. 6  suitable. 7  uniform.
- source_sentence: how much do peds nurse make
  sentences:
  - Vyvanse is detectable in urine up to 3 days after ingesting Vyvanse. Vyvanse is
    detectable in hair samples for months after ingestion. Though Vyvanse itself only
    stays in your system four hours post-ingestion, the active drug d-amphetamine
    stays in your system for 40 hours.
  - A newly practicing pediatric nurse in the US receives a beginning yearly salary
    of around $31,311 but as he/she gains experience, he/she can anticipate a yearly
    income of up to $81,840. The national hourly rate for Pediatric Nurse is from
    between $15.53 to $35.81 with an average overtime pay of $6.93 to $54.59 per hour.
  - 'Rad Tech Salary: $64,450 a year. Average pay for rad techs is $64,450 per annum,
    which is 35% higher than the US median income. A radiographer makes an average
    of $5,371 per month; $1,239 a week and $30.99 an hour. radiology technologist
    can make more than $87,160 a year depending on many factors like work place, education,
    experience, performance, etc. Working at schools ($74,810) or specialty hospitals
    ($72,410) would help you make more money than other industries. Massachusetts
    is one of the best state based on annual income.'
- source_sentence: cost of six sigma certification
  sentences:
  - 'The Roosevelt Corollary was an addition to the Monroe Doctrine which stated that
    no European countries were allowed to intervene with Latin American affairs. The
    only way that … the U.S was allowed to become involved was if the affairs or
    European countries was threatened.'
  - 1 The cost of the certification exams varies per training center, so you still
    need to contact the center nearest you to get the actual price. 2  However, if
    we look at the centers that have published their exam rates, we found that the
    average cost of the exam is between $130 and $170. The costs of these training
    programs could cost anywehre from $1,500 to more than $2,500. 2  For example,
    a training course for AutoCAD being offered by Delta.edu costs $2,595.
  - You can buy this ExpertRating Online Six Sigma Green Belt Certification. leading
    to Certification at a special offer price of only $99.99 which includes the in-depth
    ExpertRating Online Six Sigma Green Belt Courseware and exam fee. The ExpertRating
    Six Sigma Green Belt Certification is by far the best value for money Six Sigma
    Green Belt Certification at $99.99. Worldwide airmail delivery of the hard copy
    Six Sigma Green Belt certificate. The certificate can be used to prove your certified
    status and does not mention the word online.
- source_sentence: when did jeepers creepers come out
  sentences:
  - Jeepers Creepers Wiki. Creeper. Creeper is a fictional character and the main
    antagonist in the 2001 horror film Jeepers Creepers and its 2003 sequel Jeepers
    Creepers II. It is an ancient, mysterious demon who viciously feeds on the flesh
    and bones of many human beings for 23 days every 23rd spring.
  - Moline, IL, sales tax rate is 7.25%, and the Income tax is 8.92%.
  - ' Creep  is a song by the English alternative rock band Radiohead. Radiohead released
    Creep as their debut single in 1992, and it later appeared on their first album,
    Pablo Honey (1993). During its initial release, Creep was not a chart success.'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on jxm/cde-small-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [jxm/cde-small-v2](https://huggingface.co/jxm/cde-small-v2). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [jxm/cde-small-v2](https://huggingface.co/jxm/cde-small-v2) <!-- at revision 287bf0ea6ebfecf2339762d0ef28fb846959a8f2 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->
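As a rough numerical illustration (a sketch, not this model's internal code), the cosine similarity used to compare embeddings is simply the dot product of L2-normalized vectors:

```python
import numpy as np

# Illustrative sketch only: cosine similarity, as computed by
# model.similarity(), is the dot product of L2-normalized vectors.
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    a = a / np.linalg.norm(a)  # normalize to unit length
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ≈ 0.7071
```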

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({}) with Transformer model: ContextualDocumentEmbeddingTransformer 
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("BlackBeenie/cde-small-v2-biencoder-msmarco")
# Run inference
sentences = [
    'when did jeepers creepers come out',
    'Jeepers Creepers Wiki. Creeper. Creeper is a fictional character and the main antagonist in the 2001 horror film Jeepers Creepers and its 2003 sequel Jeepers Creepers II. It is an ancient, mysterious demon who viciously feeds on the flesh and bones of many human beings for 23 days every 23rd spring.',
    ' Creep  is a song by the English alternative rock band Radiohead. Radiohead released Creep as their debut single in 1992, and it later appeared on their first album, Pablo Honey (1993). During its initial release, Creep was not a chart success.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 499,184 training samples
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>sentence_2</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0                                                                       | sentence_1                                                                          | sentence_2                                                                          |
  |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
  | type    | string                                                                           | string                                                                              | string                                                                              |
  | details | <ul><li>min: 4 tokens</li><li>mean: 9.26 tokens</li><li>max: 29 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 81.55 tokens</li><li>max: 203 tokens</li></ul> | <ul><li>min: 16 tokens</li><li>mean: 80.95 tokens</li><li>max: 231 tokens</li></ul> |
* Samples:
  | sentence_0                                                              | sentence_1                                                                                                                                                                                                                                                                                                                                                                                                                                                   | sentence_2                                                                                                                                                                                                                                                                                                                                                                                                                                               |
  |:------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>what year did the sandy hook incident happen</code>               | <code>For Newtown, 2012 Sandy Hook Elementary School shooting is still painful. It's been three years since the terrible day Jimmy Greene’s 6-year-old daughter, Ana Grace Marquez, and 19 other children were murdered in the mass shooting at Sandy Hook Elementary School. But life without Ana, who loved to sing and dance from room to room, continues to be so hard that, in some ways, Dec. 14 is no tougher than any other day for Greene.</code> | <code>Hook is a 1991 Steven Spielberg film starring Dustin Hoffman and Robin Williams. The film's storyline is based on the books written by Sir James Matthew Barrie in 1904 or 1905 and is the sequel to the first book.</code>                                                                                                                                                                                                                        |
  | <code>what kind of degree do you need to be a medical assistant?</code> | <code>If you choose this path, here is what you need to do: 1  Have a high school diploma or GED. The minimum educational requirement for medical assistants is a high school diploma or equivalency degree. 2  Find a doctor who will provide training.</code>                                                                                                                                                                                              | <code>Many colleges offer two-year associate's degrees or one-year certificate programs in different areas of medical office technology. Certificate areas include billing specialist, medical administrative assistant, and medical transcriptionist. Because of the complexity of medical jargon and operational procedures, many employers prefer these professionals to hold related two-year degrees or complete one-year training programs.</code> |
  | <code>what does usb cord do</code>                                      | <code>The Flash Player is required to see this video. The term USB stands for Universal Serial Bus. USB cable assemblies are some of the most popular cable types available, used mostly to connect computers to peripheral devices such as cameras, camcorders, printers, scanners, and more. Devices manufactured to the current USB Revision 3.0 specification are backward compatible with version 1.1.</code>                                           | <code>The USB 2.0 specification for a Full-Speed/High-Speed cable calls for four wires, two for data and two for power, and a braided outer shield. The USB 3.0 specification calls for a total of 10 wires plus a braided outer shield. Two wires are used for power.</code>                                                                                                                                                                            |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
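For intuition, MultipleNegativesRankingLoss with in-batch negatives amounts to cross-entropy over scaled cosine similarities: for each query, its paired passage is the positive and the other passages in the batch act as negatives. A hedged NumPy sketch (not the library's implementation) using the configured `scale: 20.0` and `cos_sim`:

```python
import numpy as np

# Hedged sketch of MultipleNegativesRankingLoss: for anchor i, document i
# is the positive and all other in-batch documents are negatives. The loss
# is cross-entropy over scaled cosine similarities (scale=20.0, cos_sim).
def mnr_loss(anchors: np.ndarray, docs: np.ndarray, scale: float = 20.0) -> float:
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = scale * (a @ d.T)  # [batch, batch] scaled cosine similarities
    # Numerically stable log-softmax over each row.
    m = scores.max(axis=1, keepdims=True)
    log_probs = scores - (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True)))
    # Positives sit on the diagonal; minimize their negative log-probability.
    return float(-np.diag(log_probs).mean())
```

When each anchor exactly matches its paired document the loss is near zero; mismatched pairs drive it up, which is what pushes query and relevant-passage embeddings together during training.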

### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `fp16`: True
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>

### Training Logs
| Epoch  | Step  | Training Loss |
|:------:|:-----:|:-------------:|
| 0.0321 | 500   | 0.9856        |
| 0.0641 | 1000  | 0.4499        |
| 0.0962 | 1500  | 0.3673        |
| 0.1282 | 2000  | 0.339         |
| 0.1603 | 2500  | 0.3118        |
| 0.1923 | 3000  | 0.2929        |
| 0.2244 | 3500  | 0.2886        |
| 0.2564 | 4000  | 0.2771        |
| 0.2885 | 4500  | 0.2762        |
| 0.3205 | 5000  | 0.2716        |
| 0.3526 | 5500  | 0.2585        |
| 0.3846 | 6000  | 0.2631        |
| 0.4167 | 6500  | 0.2458        |
| 0.4487 | 7000  | 0.2496        |
| 0.4808 | 7500  | 0.252         |
| 0.5128 | 8000  | 0.2399        |
| 0.5449 | 8500  | 0.2422        |
| 0.5769 | 9000  | 0.2461        |
| 0.6090 | 9500  | 0.2314        |
| 0.6410 | 10000 | 0.2331        |
| 0.6731 | 10500 | 0.2314        |
| 0.7051 | 11000 | 0.2302        |
| 0.7372 | 11500 | 0.235         |
| 0.7692 | 12000 | 0.2176        |
| 0.8013 | 12500 | 0.2201        |
| 0.8333 | 13000 | 0.2206        |
| 0.8654 | 13500 | 0.222         |
| 0.8974 | 14000 | 0.2136        |
| 0.9295 | 14500 | 0.2108        |
| 0.9615 | 15000 | 0.2102        |
| 0.9936 | 15500 | 0.2098        |
| 1.0256 | 16000 | 0.1209        |
| 1.0577 | 16500 | 0.099         |
| 1.0897 | 17000 | 0.0944        |
| 1.1218 | 17500 | 0.0955        |
| 1.1538 | 18000 | 0.0947        |
| 1.1859 | 18500 | 0.0953        |
| 1.2179 | 19000 | 0.0943        |
| 1.25   | 19500 | 0.0911        |
| 1.2821 | 20000 | 0.0964        |
| 1.3141 | 20500 | 0.0933        |
| 1.3462 | 21000 | 0.0956        |
| 1.3782 | 21500 | 0.0941        |
| 1.4103 | 22000 | 0.0903        |
| 1.4423 | 22500 | 0.0889        |
| 1.4744 | 23000 | 0.0919        |
| 1.5064 | 23500 | 0.0917        |
| 1.5385 | 24000 | 0.0956        |
| 1.5705 | 24500 | 0.0903        |
| 1.6026 | 25000 | 0.0931        |
| 1.6346 | 25500 | 0.0931        |
| 1.6667 | 26000 | 0.089         |
| 1.6987 | 26500 | 0.0892        |
| 1.7308 | 27000 | 0.091         |
| 1.7628 | 27500 | 0.0892        |
| 1.7949 | 28000 | 0.0884        |
| 1.8269 | 28500 | 0.0889        |
| 1.8590 | 29000 | 0.0877        |
| 1.8910 | 29500 | 0.0866        |
| 1.9231 | 30000 | 0.0853        |
| 1.9551 | 30500 | 0.085         |
| 1.9872 | 31000 | 0.0867        |
| 2.0192 | 31500 | 0.055         |
| 2.0513 | 32000 | 0.0338        |
| 2.0833 | 32500 | 0.033         |
| 2.1154 | 33000 | 0.033         |
| 2.1474 | 33500 | 0.0317        |
| 2.1795 | 34000 | 0.0323        |
| 2.2115 | 34500 | 0.0322        |
| 2.2436 | 35000 | 0.0316        |
| 2.2756 | 35500 | 0.0314        |
| 2.3077 | 36000 | 0.0312        |
| 2.3397 | 36500 | 0.0324        |
| 2.3718 | 37000 | 0.0324        |
| 2.4038 | 37500 | 0.0328        |
| 2.4359 | 38000 | 0.0311        |
| 2.4679 | 38500 | 0.0312        |
| 2.5    | 39000 | 0.0312        |
| 2.5321 | 39500 | 0.0311        |
| 2.5641 | 40000 | 0.0315        |
| 2.5962 | 40500 | 0.0308        |
| 2.6282 | 41000 | 0.0308        |
| 2.6603 | 41500 | 0.0306        |
| 2.6923 | 42000 | 0.0313        |
| 2.7244 | 42500 | 0.0322        |
| 2.7564 | 43000 | 0.0315        |
| 2.7885 | 43500 | 0.0311        |
| 2.8205 | 44000 | 0.0321        |
| 2.8526 | 44500 | 0.0318        |
| 2.8846 | 45000 | 0.0305        |
| 2.9167 | 45500 | 0.031         |
| 2.9487 | 46000 | 0.032         |
| 2.9808 | 46500 | 0.0306        |


### Framework Versions
- Python: 3.11.12
- Sentence Transformers: 3.4.1
- Transformers: 4.50.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.5.0
- Tokenizers: 0.21.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->