maennyn committed on
Commit
b482253
·
verified ·
1 Parent(s): 473b49c

Add new CrossEncoder model

Files changed (7)
  1. README.md +474 -0
  2. config.json +35 -0
  3. model.safetensors +3 -0
  4. special_tokens_map.json +37 -0
  5. tokenizer.json +0 -0
  6. tokenizer_config.json +58 -0
  7. vocab.txt +0 -0
README.md ADDED
@@ -0,0 +1,474 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - sentence-transformers
+ - cross-encoder
+ - generated_from_trainer
+ - dataset_size:36728
+ - loss:BinaryCrossEntropyLoss
+ base_model: cross-encoder/ms-marco-MiniLM-L6-v2
+ pipeline_tag: text-ranking
+ library_name: sentence-transformers
+ metrics:
+ - pearson
+ - spearman
+ - map
+ - mrr@10
+ - ndcg@10
+ model-index:
+ - name: ms-marco-MiniLM-L-6-v2 Finetuned on PV211 HomeWork
+   results:
+   - task:
+       type: cross-encoder-correlation
+       name: Cross Encoder Correlation
+     dataset:
+       name: sts dev
+       type: sts_dev
+     metrics:
+     - type: pearson
+       value: 0.8857946136871967
+       name: Pearson
+     - type: spearman
+       value: 0.8182465826410324
+       name: Spearman
+   - task:
+       type: cross-encoder-reranking
+       name: Cross Encoder Reranking
+     dataset:
+       name: NanoMSMARCO R100
+       type: NanoMSMARCO_R100
+     metrics:
+     - type: map
+       value: 0.6048
+       name: Map
+     - type: mrr@10
+       value: 0.5974
+       name: Mrr@10
+     - type: ndcg@10
+       value: 0.6644
+       name: Ndcg@10
+   - task:
+       type: cross-encoder-reranking
+       name: Cross Encoder Reranking
+     dataset:
+       name: NanoNFCorpus R100
+       type: NanoNFCorpus_R100
+     metrics:
+     - type: map
+       value: 0.3633
+       name: Map
+     - type: mrr@10
+       value: 0.5961
+       name: Mrr@10
+     - type: ndcg@10
+       value: 0.4082
+       name: Ndcg@10
+   - task:
+       type: cross-encoder-reranking
+       name: Cross Encoder Reranking
+     dataset:
+       name: NanoNQ R100
+       type: NanoNQ_R100
+     metrics:
+     - type: map
+       value: 0.6871
+       name: Map
+     - type: mrr@10
+       value: 0.7117
+       name: Mrr@10
+     - type: ndcg@10
+       value: 0.7413
+       name: Ndcg@10
+   - task:
+       type: cross-encoder-nano-beir
+       name: Cross Encoder Nano BEIR
+     dataset:
+       name: NanoBEIR R100 mean
+       type: NanoBEIR_R100_mean
+     metrics:
+     - type: map
+       value: 0.5517
+       name: Map
+     - type: mrr@10
+       value: 0.635
+       name: Mrr@10
+     - type: ndcg@10
+       value: 0.6046
+       name: Ndcg@10
+ ---
+
+ # ms-marco-MiniLM-L-6-v2 Finetuned on PV211 HomeWork
+
+ This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [cross-encoder/ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Cross Encoder
+ - **Base model:** [cross-encoder/ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2) <!-- at revision ce0834f22110de6d9222af7a7a03628121708969 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Number of Output Labels:** 1 label
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import CrossEncoder
+
+ # Download from the 🤗 Hub
+ model = CrossEncoder("maennyn/pv211_beir_cqadupstack_crossencoder2")
+ # Get scores for pairs of texts
+ pairs = [
+     ['Increase the X length of a tikzpicture', "In recent years I've developed a habit of formatting SQL `SELECT` queries like so: SELECT fieldNames FROM sources JOIN tableSource ON col1 = col2 JOIN ( SELECT fieldNames FROM otherSources ) AS subQuery ON subQuery.foo = col2 WHERE someField = somePredicate So you see my pattern: each keyword is on its own line and that keyword's fields are indented by 1 tab-stop and the pattern is used recursively for sub- queries. This works well for all of my `SELECT` queries, as it maximizes readability though at the cost of vertical space; but it doesn't work for things like `INSERT` and `UPDATE` which have radically different syntax. INSERT INTO tableName ( col1, col2, col3, col4, col5, col6, col7, col8 ) VALUES ( 'col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8' ), VALUES ( 'col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8' ) UPDATE tableName SET col1 = 'col1', col2 = 'col2', col3 = 'col3', // etc WHERE someField = somePredicate As you can see, they aren't as pretty, and when you're dealing with tables with a lot of columns they quickly become unweildly. Is there a better way to format `INSERT` and `UPDATE`? And what about `CREATE` statements and other operations?"],
+     ['Fillable form: checkbox linked to hide/unhide sections; pushbutton to add/delete rows', "I'd like to create a LaTeX document that when rendered into PDF, has forms that can be filled out using Adobe Reader or other such programs. Then I'd like to be able to extract the data. I deliberately would like to avoid using Acrobat for all the usual reasons (non-free, need different versions for different platforms etc). Can this be done ?"],
+     ['Is there any way to get something like pmatrix with customizable grid lines between cells?', "> **Possible Duplicate:** > Highlight elements in the matrix i have a matrix: \\begin{equation} \\begin{bmatrix} 1 & 5 & 4 & 2 & 1 \\\\ 1 & 5 & 4 & 2 & 1 \\\\ 1 & 5 & 4 & 2 & 1 \\\\ \\end{bmatrix} \\label{e:crop1} \\end{equation} and i would like to draw a box around a few of the values to highlight a selection & label it, how would i go about this? I've looked at nodes but havent got a clue. thanks"],
+     ["Difference between 'all' and 'all the'", 'I am not confident about my judgement as to whether or not "the" is required if a relative clause is used in a sentence. For example, > The data can be collected on all the computers on which the software is > installed. I think it must be "all the computers " and not be "all computers" because "computers" is specified by "on which the software is installed". Please help me confirm that I am right.'],
+     ['Understanding the exclamation mark (!) in bash', "I'm following through a tutorial and it mentions to run this command: sudo chmod 700 !$ I'm not familiar with `!$`. What does it mean?"],
+ ]
+ scores = model.predict(pairs)
+ print(scores.shape)
+ # (5,)
+
+ # Or rank different texts based on similarity to a single text
+ ranks = model.rank(
+     'Increase the X length of a tikzpicture',
+     [
+         "In recent years I've developed a habit of formatting SQL `SELECT` queries like so: SELECT fieldNames FROM sources JOIN tableSource ON col1 = col2 JOIN ( SELECT fieldNames FROM otherSources ) AS subQuery ON subQuery.foo = col2 WHERE someField = somePredicate So you see my pattern: each keyword is on its own line and that keyword's fields are indented by 1 tab-stop and the pattern is used recursively for sub- queries. This works well for all of my `SELECT` queries, as it maximizes readability though at the cost of vertical space; but it doesn't work for things like `INSERT` and `UPDATE` which have radically different syntax. INSERT INTO tableName ( col1, col2, col3, col4, col5, col6, col7, col8 ) VALUES ( 'col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8' ), VALUES ( 'col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8' ) UPDATE tableName SET col1 = 'col1', col2 = 'col2', col3 = 'col3', // etc WHERE someField = somePredicate As you can see, they aren't as pretty, and when you're dealing with tables with a lot of columns they quickly become unweildly. Is there a better way to format `INSERT` and `UPDATE`? And what about `CREATE` statements and other operations?",
+         "I'd like to create a LaTeX document that when rendered into PDF, has forms that can be filled out using Adobe Reader or other such programs. Then I'd like to be able to extract the data. I deliberately would like to avoid using Acrobat for all the usual reasons (non-free, need different versions for different platforms etc). Can this be done ?",
+         "> **Possible Duplicate:** > Highlight elements in the matrix i have a matrix: \\begin{equation} \\begin{bmatrix} 1 & 5 & 4 & 2 & 1 \\\\ 1 & 5 & 4 & 2 & 1 \\\\ 1 & 5 & 4 & 2 & 1 \\\\ \\end{bmatrix} \\label{e:crop1} \\end{equation} and i would like to draw a box around a few of the values to highlight a selection & label it, how would i go about this? I've looked at nodes but havent got a clue. thanks",
+         'I am not confident about my judgement as to whether or not "the" is required if a relative clause is used in a sentence. For example, > The data can be collected on all the computers on which the software is > installed. I think it must be "all the computers " and not be "all computers" because "computers" is specified by "on which the software is installed". Please help me confirm that I am right.',
+         "I'm following through a tutorial and it mentions to run this command: sudo chmod 700 !$ I'm not familiar with `!$`. What does it mean?",
+     ]
+ )
+ # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
+ ```
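The `config.json` added in this commit sets `activation_fn` to `torch.nn.modules.linear.Identity`, which suggests that `model.predict` returns raw logits rather than probabilities. Assuming that reading is correct, the sketch below maps such scores into the (0, 1) range and orders candidates by relevance. The `raw_scores` values are invented for illustration and are not output of this model.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical raw scores for five query-document pairs (not real model output).
raw_scores = [-4.2, 3.1, 2.7, -0.3, 5.0]

# Map each logit to a pseudo-probability of relevance.
probabilities = [sigmoid(s) for s in raw_scores]

# Candidate indices ordered from most to least relevant.
ranking = sorted(range(len(raw_scores)), key=lambda i: raw_scores[i], reverse=True)
print(ranking)
```

Sorting by the raw scores and sorting by the sigmoid outputs give the same order, because the sigmoid is monotonic; the conversion only matters when a thresholdable probability is wanted.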
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Cross Encoder Correlation
+
+ * Dataset: `sts_dev`
+ * Evaluated with [<code>CrossEncoderCorrelationEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderCorrelationEvaluator)
+
+ | Metric       | Value      |
+ |:-------------|:-----------|
+ | pearson      | 0.8858     |
+ | **spearman** | **0.8182** |
+
+ #### Cross Encoder Reranking
+
+ * Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
+ * Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
+   ```json
+   {
+       "at_k": 10,
+       "always_rerank_positives": true
+   }
+   ```
+
+ | Metric      | NanoMSMARCO_R100     | NanoNFCorpus_R100    | NanoNQ_R100          |
+ |:------------|:---------------------|:---------------------|:---------------------|
+ | map         | 0.6048 (+0.1152)     | 0.3633 (+0.1023)     | 0.6871 (+0.2674)     |
+ | mrr@10      | 0.5974 (+0.1199)     | 0.5961 (+0.0962)     | 0.7117 (+0.2850)     |
+ | **ndcg@10** | **0.6644 (+0.1240)** | **0.4082 (+0.0832)** | **0.7413 (+0.2407)** |
+
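For readers unfamiliar with the three reranking metrics in the table above, here is a minimal sketch of their textbook definitions for a single query. It is not the code used by `CrossEncoderRerankingEvaluator`, whose handling of ties, cutoffs and missing positives may differ; the function names and the sample relevance list are illustrative assumptions.

```python
import math
from typing import Sequence


def mrr_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def dcg_at_k(relevance: Sequence[int], k: int) -> float:
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevance[:k], start=1))


def ndcg_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """DCG normalised by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


def average_precision(relevance: Sequence[int]) -> float:
    """Mean of precision values at each rank that holds a relevant document."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical reranked list: 1 = relevant, 0 = not relevant, best-scored first.
ranked_labels = [0, 1, 0, 1, 0]
print(mrr_at_k(ranked_labels), ndcg_at_k(ranked_labels), average_precision(ranked_labels))
```

The `map` row in the table corresponds to average precision averaged over all queries of a dataset; the other two rows are averaged over queries in the same way.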
+ #### Cross Encoder Nano BEIR
+
+ * Dataset: `NanoBEIR_R100_mean`
+ * Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
+   ```json
+   {
+       "dataset_names": [
+           "msmarco",
+           "nfcorpus",
+           "nq"
+       ],
+       "rerank_k": 100,
+       "at_k": 10,
+       "always_rerank_positives": true
+   }
+   ```
+
+ | Metric      | Value                |
+ |:------------|:---------------------|
+ | map         | 0.5517 (+0.1616)     |
+ | mrr@10      | 0.6350 (+0.1670)     |
+ | **ndcg@10** | **0.6046 (+0.1493)** |
+
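The `NanoBEIR_R100_mean` values appear to be the unweighted mean of the three per-dataset scores reported in the reranking table. The sketch below checks that interpretation against the numbers in this card; the tolerance allows for the per-dataset values having been rounded to four decimals before publication.

```python
# Per-dataset scores copied from the reranking table in this card.
per_dataset = {
    "map":     {"NanoMSMARCO_R100": 0.6048, "NanoNFCorpus_R100": 0.3633, "NanoNQ_R100": 0.6871},
    "mrr@10":  {"NanoMSMARCO_R100": 0.5974, "NanoNFCorpus_R100": 0.5961, "NanoNQ_R100": 0.7117},
    "ndcg@10": {"NanoMSMARCO_R100": 0.6644, "NanoNFCorpus_R100": 0.4082, "NanoNQ_R100": 0.7413},
}

# Mean values reported for NanoBEIR_R100_mean in this card.
reported_mean = {"map": 0.5517, "mrr@10": 0.6350, "ndcg@10": 0.6046}

# Unweighted mean over the three datasets for each metric.
computed_mean = {
    metric: sum(scores.values()) / len(scores)
    for metric, scores in per_dataset.items()
}

for metric, value in computed_mean.items():
    print(f"{metric}: computed {value:.4f}, reported {reported_mean[metric]:.4f}")
```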
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 36,728 training samples
+ * Columns: <code>query</code>, <code>document</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | query | document | label |
+   |:--------|:------|:---------|:------|
+   | type    | string | string | int |
+   | details | <ul><li>min: 15 characters</li><li>mean: 49.89 characters</li><li>max: 128 characters</li></ul> | <ul><li>min: 36 characters</li><li>mean: 718.8 characters</li><li>max: 17541 characters</li></ul> | <ul><li>0: ~48.90%</li><li>1: ~51.10%</li></ul> |
+ * Samples:
+   | query | document | label |
+   |:------|:---------|:------|
+   | <code>Increase the X length of a tikzpicture</code> | <code>In recent years I've developed a habit of formatting SQL `SELECT` queries like so: SELECT fieldNames FROM sources JOIN tableSource ON col1 = col2 JOIN ( SELECT fieldNames FROM otherSources ) AS subQuery ON subQuery.foo = col2 WHERE someField = somePredicate So you see my pattern: each keyword is on its own line and that keyword's fields are indented by 1 tab-stop and the pattern is used recursively for sub- queries. This works well for all of my `SELECT` queries, as it maximizes readability though at the cost of vertical space; but it doesn't work for things like `INSERT` and `UPDATE` which have radically different syntax. INSERT INTO tableName ( col1, col2, col3, col4, col5, col6, col7, col8 ) VALUES ( 'col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8' ), VALUES ( 'col1', 'col2', 'col3', 'col4',...</code> | <code>0</code> |
+   | <code>Fillable form: checkbox linked to hide/unhide sections; pushbutton to add/delete rows</code> | <code>I'd like to create a LaTeX document that when rendered into PDF, has forms that can be filled out using Adobe Reader or other such programs. Then I'd like to be able to extract the data. I deliberately would like to avoid using Acrobat for all the usual reasons (non-free, need different versions for different platforms etc). Can this be done ?</code> | <code>1</code> |
+   | <code>Is there any way to get something like pmatrix with customizable grid lines between cells?</code> | <code>> **Possible Duplicate:** > Highlight elements in the matrix i have a matrix: \begin{equation} \begin{bmatrix} 1 & 5 & 4 & 2 & 1 \\ 1 & 5 & 4 & 2 & 1 \\ 1 & 5 & 4 & 2 & 1 \\ \end{bmatrix} \label{e:crop1} \end{equation} and i would like to draw a box around a few of the values to highlight a selection & label it, how would i go about this? I've looked at nodes but havent got a clue. thanks</code> | <code>1</code> |
+ * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
+   ```json
+   {
+       "activation_fn": "torch.nn.modules.linear.Identity",
+       "pos_weight": null
+   }
+   ```
+
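With an identity activation and no `pos_weight`, the loss above presumably reduces to plain binary cross-entropy computed directly on the model's logit. The sketch below is a standard-library illustration of that formula in its numerically stable form, assuming this reading of the parameters; it is not the library's implementation, and the function name is hypothetical.

```python
import math


def bce_with_logits(logit: float, label: float) -> float:
    """Binary cross-entropy on a raw logit, in the numerically stable form
    max(x, 0) - x * y + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# A confident, correct prediction costs little; a confident, wrong one costs a lot.
print(bce_with_logits(4.0, 1.0))   # relevant pair scored high
print(bce_with_logits(4.0, 0.0))   # irrelevant pair scored high
```

The stable form avoids overflow in `exp` for large positive or negative logits, which a direct `-log(sigmoid(x))` computation would not.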
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `learning_rate`: 2e-05
+ - `warmup_ratio`: 0.1
+ - `save_only_model`: True
+ - `fp16`: True
+ - `load_best_model_at_end`: True
+
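The hyperparameters above, combined with the dataset size and the three epochs listed in the full hyperparameter dump, determine the number of optimizer steps and the shape of the learning-rate curve. The sketch below works through that arithmetic, assuming a single device, no gradient accumulation, and a linear schedule that warms up and then decays to zero; the helper `learning_rate_at` is hypothetical and written only for illustration.

```python
import math

# Values taken from this model card.
num_samples = 36_728
batch_size = 16
num_epochs = 3
base_lr = 2e-5
warmup_ratio = 0.1

# Assumption: one device, gradient_accumulation_steps = 1, last partial batch kept.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def learning_rate_at(step: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)
print(learning_rate_at(warmup_steps), learning_rate_at(total_steps))
```

The step counts agree with the training log in this card, where epoch 1.0 is reached at step 2296 and epoch 3.0 at step 6888.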
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 3
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: True
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `tp_size`: 0
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch   | Step     | Training Loss | sts_dev_spearman | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10  | NanoBEIR_R100_mean_ndcg@10 |
+ |:-------:|:--------:|:-------------:|:----------------:|:------------------------:|:-------------------------:|:--------------------:|:--------------------------:|
+ | -1      | -1       | -             | 0.7222           | 0.6686 (+0.1282)         | 0.3930 (+0.0680)          | 0.7599 (+0.2592)     | 0.6072 (+0.1518)           |
+ | 0.4355  | 1000     | 0.4163        | -                | -                        | -                         | -                    | -                          |
+ | 0.8711  | 2000     | 0.1632        | -                | -                        | -                         | -                    | -                          |
+ | **1.0** | **2296** | **-**         | **0.8182**       | **0.6644 (+0.1240)**     | **0.4082 (+0.0832)**      | **0.7413 (+0.2407)** | **0.6046 (+0.1493)**       |
+ | 1.3066  | 3000     | 0.1227        | -                | -                        | -                         | -                    | -                          |
+ | 1.7422  | 4000     | 0.1157        | -                | -                        | -                         | -                    | -                          |
+ | 2.0     | 4592     | -             | 0.8201           | 0.6266 (+0.0862)         | 0.4096 (+0.0846)          | 0.7032 (+0.2026)     | 0.5798 (+0.1244)           |
+ | 2.1777  | 5000     | 0.0964        | -                | -                        | -                         | -                    | -                          |
+ | 2.6132  | 6000     | 0.081         | -                | -                        | -                         | -                    | -                          |
+ | 3.0     | 6888     | -             | 0.8203           | 0.6241 (+0.0837)         | 0.4068 (+0.0817)          | 0.6931 (+0.1924)     | 0.5747 (+0.1193)           |
+ | -1      | -1       | -             | 0.8182           | 0.6644 (+0.1240)         | 0.4082 (+0.0832)          | 0.7413 (+0.2407)     | 0.6046 (+0.1493)           |
+
+ * The bold row denotes the saved checkpoint.
+
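The card does not state which metric `load_best_model_at_end` tracked. The sketch below shows that selecting by `NanoBEIR_R100_mean_ndcg@10` reproduces the bold row (step 2296), whereas selecting by `sts_dev_spearman` would pick the final epoch. It only tests consistency with the table and is not evidence of the actual trainer configuration.

```python
# End-of-epoch evaluation rows copied from the training log table.
epoch_evals = [
    {"epoch": 1.0, "step": 2296, "sts_dev_spearman": 0.8182, "nanobeir_mean_ndcg@10": 0.6046},
    {"epoch": 2.0, "step": 4592, "sts_dev_spearman": 0.8201, "nanobeir_mean_ndcg@10": 0.5798},
    {"epoch": 3.0, "step": 6888, "sts_dev_spearman": 0.8203, "nanobeir_mean_ndcg@10": 0.5747},
]


def best_checkpoint(rows, metric):
    """Return the row with the highest value of the given metric."""
    return max(rows, key=lambda row: row[metric])


by_ndcg = best_checkpoint(epoch_evals, "nanobeir_mean_ndcg@10")
by_spearman = best_checkpoint(epoch_evals, "sts_dev_spearman")
print(by_ndcg["step"], by_spearman["step"])
```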
+ ### Framework Versions
+ - Python: 3.11.11
+ - Sentence Transformers: 4.1.0
+ - Transformers: 4.51.3
+ - PyTorch: 2.8.0.dev20250319+cu128
+ - Accelerate: 1.6.0
+ - Datasets: 3.6.0
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,35 @@
+ {
+   "architectures": [
+     "BertForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 6,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "sentence_transformers": {
+     "activation_fn": "torch.nn.modules.linear.Identity",
+     "version": "4.1.0"
+   },
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
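As a sanity check on this configuration, the sketch below estimates the parameter count of a BERT sequence classifier from the values above and compares the resulting float32 footprint with the 90,866,412-byte size recorded in the `model.safetensors` pointer in this commit. It assumes the standard BERT layout (embeddings, encoder layers, pooler, one linear classification head) and takes `num_labels` as 1 from the single `id2label` entry; the small remainder is presumably the safetensors header.

```python
# Values copied from config.json; num_labels is inferred from the single id2label entry.
config = {
    "vocab_size": 30522,
    "hidden_size": 384,
    "intermediate_size": 1536,
    "num_hidden_layers": 6,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "num_labels": 1,
}


def count_bert_classifier_params(cfg: dict) -> int:
    """Parameter count under the standard BERT-for-classification layout."""
    h, i = cfg["hidden_size"], cfg["intermediate_size"]
    layer_norm = 2 * h  # weight + bias
    embeddings = (cfg["vocab_size"] + cfg["max_position_embeddings"]
                  + cfg["type_vocab_size"]) * h + layer_norm
    attention = 4 * (h * h + h) + layer_norm            # Q, K, V, output projections
    feed_forward = (h * i + i) + (i * h + h) + layer_norm
    encoder = cfg["num_hidden_layers"] * (attention + feed_forward)
    pooler = h * h + h
    classifier = h * cfg["num_labels"] + cfg["num_labels"]
    return embeddings + encoder + pooler + classifier


total_params = count_bert_classifier_params(config)
float32_bytes = total_params * 4
file_size_bytes = 90_866_412  # from the model.safetensors LFS pointer in this commit

print(total_params, float32_bytes, file_size_bytes - float32_bytes)
```

Under these assumptions the model has roughly 22.7 million parameters, and more than half of them sit in the token embedding matrix.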
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d5512e84b07899ce2a75bf2176d66f2859bbcac7e30ebff8f1afe697dc8187f6
+ size 90866412
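The three lines above are a Git LFS pointer rather than the weights themselves: each line is a key followed by a space and a value. The sketch below parses such a pointer into its fields; the function name `parse_lfs_pointer` is hypothetical.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d5512e84b07899ce2a75bf2176d66f2859bbcac7e30ebff8f1afe697dc8187f6
size 90866412
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algorithm"], pointer["size_bytes"])
```

After downloading the real file, comparing its SHA-256 digest and byte length against these fields is a simple integrity check.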
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
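The special-token IDs and `model_max_length` above determine how a query-document pair is packed into one BERT input. The sketch below illustrates the conventional `[CLS] query [SEP] document [SEP]` layout with a "trim the longer side first" truncation rule. That truncation strategy is an assumption about how the library calls the tokenizer, and both the `pack_pair` helper and the token IDs in the example are made up for illustration.

```python
# Special-token IDs and length limit taken from tokenizer_config.json.
CLS_ID, SEP_ID = 101, 102
MAX_LEN = 512


def pack_pair(query_ids, doc_ids, max_len=MAX_LEN):
    """Build input_ids and token_type_ids for one (query, document) pair."""
    query_ids, doc_ids = list(query_ids), list(doc_ids)
    budget = max_len - 3  # room left after [CLS] and two [SEP] tokens
    # Assumed strategy: drop tokens from the end of whichever side is longer.
    while len(query_ids) + len(doc_ids) > budget:
        if len(doc_ids) >= len(query_ids):
            doc_ids.pop()
        else:
            query_ids.pop()
    input_ids = [CLS_ID] + query_ids + [SEP_ID] + doc_ids + [SEP_ID]
    # Segment 0 covers [CLS] + query + [SEP]; segment 1 covers document + [SEP].
    token_type_ids = [0] * (len(query_ids) + 2) + [1] * (len(doc_ids) + 1)
    return input_ids, token_type_ids


# Hypothetical token IDs: a 2-token query and a 600-token document.
input_ids, token_type_ids = pack_pair([2000, 2001], [3000] * 600)
print(len(input_ids), input_ids[:5], sum(token_type_ids))
```

This matters for the training data in this commit, since documents of up to 17,541 characters will usually exceed the 512-token limit and be cut off.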
vocab.txt ADDED
The diff for this file is too large to render. See raw diff