cyberbabooshka committed on
Commit 05630d3 · verified · 1 Parent(s): 08bf490

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "word_embedding_dimension": 1024,
3
+ "pooling_mode_cls_token": true,
4
+ "pooling_mode_mean_tokens": false,
5
+ "pooling_mode_max_tokens": false,
6
+ "pooling_mode_mean_sqrt_len_tokens": false,
7
+ "pooling_mode_weightedmean_tokens": false,
8
+ "pooling_mode_lasttoken": false,
9
+ "include_prompt": true
10
+ }
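With `pooling_mode_cls_token` set to `true` and every other pooling mode `false`, the sentence embedding is simply the output vector of the first ([CLS]) token, not an average over tokens. The pure-Python sketch below shows the difference on toy 4-dimensional token embeddings (the real model uses 1024). `cls_pool` and `mean_pool` are hypothetical helpers for illustration, not the sentence-transformers `Pooling` implementation.

```python
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """Return the embedding of the first token ([CLS]) as the sentence vector."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    return list(token_embeddings[0])


def mean_pool(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the embeddings of non-padding tokens (shown only for contrast)."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    dim = len(token_embeddings[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy transformer output: 3 tokens ([CLS], "hi", [PAD]) with 4-dim embeddings.
tokens = [
    [1.0, 0.0, 2.0, -1.0],
    [3.0, 1.0, 0.0, 1.0],
    [9.0, 9.0, 9.0, 9.0],  # padding position, ignored by mean_pool
]
mask = [1, 1, 0]

print(cls_pool(tokens))         # [1.0, 0.0, 2.0, -1.0]
print(mean_pool(tokens, mask))  # [2.0, 0.5, 1.0, 0.0]
```

Because only the first position is read, `include_prompt` has no practical effect under CLS pooling; it matters for modes that aggregate over token positions.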
README.md ADDED
@@ -0,0 +1,1029 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - sentence-similarity
5
+ - feature-extraction
6
+ - generated_from_trainer
7
+ - dataset_size:1760
8
+ - loss:MultipleNegativesRankingLoss
9
+ base_model: mixedbread-ai/mxbai-embed-large-v1
10
+ widget:
11
+ - source_sentence: What is the relationship between the x- and y-coordinates in a
12
+ linear relationship, and how can this relationship be represented visually on
13
+ a graph?
14
+ sentences:
15
+ - '"A linear relationship is a relationship between variables such that when plotted
16
+ on a coordinate plane, the points lie on a line." Additionally, "You can think
17
+ of a line, then, as a collection of an infinite number of individual points that
18
+ share the same mathematical relationship."'
19
+ - '"A ''model'' is a situation-specific description of a phenomenon based on a theory,
20
+ that allows us to make a specific prediction." and "In physics, it is particularly
21
+ important to distinguish between these two terms. A model provides an immediate
22
+ understanding of something based on a theory."'
23
+ - '"Use capital letters to denote sets, $A,B, C, X, Y$ etc. [...] if you stick with
24
+ these conventions people reading your work (including the person marking your
25
+ exams) will know — ''Oh $A$ is that set they are talking about'' and ''$a$ is
26
+ an element of that set.''"'
27
+ - source_sentence: What factors influence whether thin-film interference results in
28
+ constructive or destructive interference?
29
+ sentences:
30
+ - '"For nonrelativistic velocities, an observer moving along at the same velocity
31
+ as an Ohmic conductor measures the usual Ohm''s law in his reference frame, $\textbf{J}_{f}''
32
+ = \sigma \textbf{E}''$... the current density in all inertial frames is the same
33
+ so that (3) in (4) gives us the generalized Ohm''s law as $\textbf{J}_{f}'' =
34
+ \textbf{J}_{f} = \sigma (\textbf{E} + \textbf{v} \times \textbf{B})$ where v is
35
+ the velocity of the conductor."'
36
+ - '"Thin-film interference thus depends on film thickness, the wavelength of light,
37
+ and the refractive indices."'
38
+ - '"A summary of the properties of concave mirrors is shown below: • converging
39
+ • real image • inverted • image in front of mirror. A summary of the properties
40
+ of convex mirrors is shown below: • diverging • virtual image • upright • image
41
+ behind mirror."'
42
+ - source_sentence: How do non-conservative forces affect the total energy change in
43
+ a system undergoing an irreversible process?
44
+ sentences:
45
+ - '"Energy is conserved but some mechanical energy has been transferred into nonrecoverable
46
+ energy $W_{\mathrm{nc}}$. We shall refer to processes in which there is non-zero
47
+ nonrecoverable energy as irreversible processes."'
48
+ - '"Hamilton’s equations give $2s$ first-order differential equations for $p_{k},q_{k}$
49
+ for each of the $s=n-m$ degrees of freedom. Lagrange’s equations give $s$ second-order
50
+ differential equations for the $s$ independent generalized coordinates $q_{k},\dot{q}_{k}$."'
51
+ - '"Determine what happens as $\Delta x$ approaches 0."'
52
+ - source_sentence: What are the conditions under which a mutant virus is likely to
53
+ replace a wildtype virus in a population, according to the SIR model of disease
54
+ dynamics?
55
+ sentences:
56
+ - '"In the limit of high Reynolds number, viscosity disappears from the problem
57
+ and the drag force should not depend on viscosity. This reasoning contains several
58
+ subtle untruths, yet its conclusion is mostly correct. ... To make \( F \) independent
59
+ of viscosity, \( F \) must be independent of Reynolds number!"'
60
+ - '"A more mathematically rigorous name would be the renormalization monoid."'
61
+ - '"$I^{\prime}$ increases exponentially if $\frac{\beta^{\prime}(d+c+\gamma)}{\beta}-\left(d+c^{\prime}+\gamma^{\prime}\right)>0$
62
+ or after some elementary algebra, $\frac{\beta^{\prime}}{d+c^{\prime}+\gamma^{\prime}}>\frac{\beta}{d+c+\gamma}$."
63
+ Additionally, "our result (4.6.8) suggests that endemic viruses (or other microorganisms)
64
+ will tend to evolve (i) to be more easily transmitted between people $\left(\beta^{\prime}>\beta\right)
65
+ ;$ (ii) to make people sick longer $\left(\gamma^{\prime}<\gamma\right)$, and;
66
+ (iii) to be less deadly $c^{\prime}<c$."'
67
+ - source_sentence: What is the relationship between the smallest perturbation of a
68
+ matrix and its rank, as established in theorems regarding matrix perturbations?
69
+ sentences:
70
+ - '"Suppose $A \in C^{m \times n}$ has full column rank (= n). Then $\min _{\Delta
71
+ \in \mathbb{C}^{m \times n}}\left\{\|\Delta\|_{2} \mid A+\Delta \text { has rank
72
+ }<n\right\}=\sigma_{n}(A)$."'
73
+ - '"Complementary angles have measures that add up to 90 degrees."'
74
+ - '"If a beam of light enters and then exits the elevator, the observer on Earth
75
+ and the one accelerating in empty space must observe the same thing, since they
76
+ cannot distinguish between being on Earth or accelerating in space. The observer
77
+ in space, who is accelerating, will observe that the beam of light bends as it
78
+ crosses the elevator... that means that if the path of a beam of light is curved
79
+ near Earth, it must be because space itself is curved in the presence of a gravitational
80
+ field!"'
81
+ pipeline_tag: sentence-similarity
82
+ library_name: sentence-transformers
83
+ metrics:
84
+ - cosine_accuracy@1
85
+ - cosine_accuracy@3
86
+ - cosine_accuracy@5
87
+ - cosine_accuracy@10
88
+ - cosine_precision@1
89
+ - cosine_precision@3
90
+ - cosine_precision@5
91
+ - cosine_precision@10
92
+ - cosine_recall@1
93
+ - cosine_recall@3
94
+ - cosine_recall@5
95
+ - cosine_recall@10
96
+ - cosine_ndcg@10
97
+ - cosine_mrr@10
98
+ - cosine_map@100
99
+ model-index:
100
+ - name: SentenceTransformer based on mixedbread-ai/mxbai-embed-large-v1
101
+ results:
102
+ - task:
103
+ type: information-retrieval
104
+ name: Information Retrieval
105
+ dataset:
106
+ name: eval
107
+ type: eval
108
+ metrics:
109
+ - type: cosine_accuracy@1
110
+ value: 0.6095238095238096
111
+ name: Cosine Accuracy@1
112
+ - type: cosine_accuracy@3
113
+ value: 0.7357142857142858
114
+ name: Cosine Accuracy@3
115
+ - type: cosine_accuracy@5
116
+ value: 0.7880952380952381
117
+ name: Cosine Accuracy@5
118
+ - type: cosine_accuracy@10
119
+ value: 0.8357142857142857
120
+ name: Cosine Accuracy@10
121
+ - type: cosine_precision@1
122
+ value: 0.6095238095238096
123
+ name: Cosine Precision@1
124
+ - type: cosine_precision@3
125
+ value: 0.24523809523809523
126
+ name: Cosine Precision@3
127
+ - type: cosine_precision@5
128
+ value: 0.1576190476190476
129
+ name: Cosine Precision@5
130
+ - type: cosine_precision@10
131
+ value: 0.08357142857142856
132
+ name: Cosine Precision@10
133
+ - type: cosine_recall@1
134
+ value: 0.6095238095238096
135
+ name: Cosine Recall@1
136
+ - type: cosine_recall@3
137
+ value: 0.7357142857142858
138
+ name: Cosine Recall@3
139
+ - type: cosine_recall@5
140
+ value: 0.7880952380952381
141
+ name: Cosine Recall@5
142
+ - type: cosine_recall@10
143
+ value: 0.8357142857142857
144
+ name: Cosine Recall@10
145
+ - type: cosine_ndcg@10
146
+ value: 0.7208819738090569
147
+ name: Cosine Ndcg@10
148
+ - type: cosine_mrr@10
149
+ value: 0.6843074452003023
150
+ name: Cosine Mrr@10
151
+ - type: cosine_map@100
152
+ value: 0.6897541058718275
153
+ name: Cosine Map@100
154
+ ---
155
+
156
+ # SentenceTransformer based on mixedbread-ai/mxbai-embed-large-v1
157
+
158
+ This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [mixedbread-ai/mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1). It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
159
+
160
+ ## Model Details
161
+
162
+ ### Model Description
163
+ - **Model Type:** Sentence Transformer
164
+ - **Base model:** [mixedbread-ai/mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) <!-- at revision db9d1fe0f31addb4978201b2bf3e577f3f8900d2 -->
165
+ - **Maximum Sequence Length:** 512 tokens
166
+ - **Output Dimensionality:** 1024 dimensions
167
+ - **Similarity Function:** Cosine Similarity
168
+ <!-- - **Training Dataset:** Unknown -->
169
+ <!-- - **Language:** Unknown -->
170
+ <!-- - **License:** Unknown -->
171
+
172
+ ### Model Sources
173
+
174
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
175
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
176
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
177
+
178
+ ### Full Model Architecture
179
+
180
+ ```
181
+ SentenceTransformer(
182
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
183
+ (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
184
+ )
185
+ ```
186
+
187
+ ## Usage
188
+
189
+ ### Direct Usage (Sentence Transformers)
190
+
191
+ First install the Sentence Transformers library:
192
+
193
+ ```bash
194
+ pip install -U sentence-transformers
195
+ ```
196
+
197
+ Then you can load this model and run inference.
198
+ ```python
199
+ from sentence_transformers import SentenceTransformer
200
+
201
+ # Download from the 🤗 Hub
202
+ model = SentenceTransformer("cyberbabooshka/mtebai")
203
+ # Run inference
204
+ sentences = [
205
+ 'What is the relationship between the smallest perturbation of a matrix and its rank, as established in theorems regarding matrix perturbations?',
206
+ '"Suppose $A \\in C^{m \\times n}$ has full column rank (= n). Then $\\min _{\\Delta \\in \\mathbb{C}^{m \\times n}}\\left\\{\\|\\Delta\\|_{2} \\mid A+\\Delta \\text { has rank }<n\\right\\}=\\sigma_{n}(A)$."',
207
+ '"If a beam of light enters and then exits the elevator, the observer on Earth and the one accelerating in empty space must observe the same thing, since they cannot distinguish between being on Earth or accelerating in space. The observer in space, who is accelerating, will observe that the beam of light bends as it crosses the elevator... that means that if the path of a beam of light is curved near Earth, it must be because space itself is curved in the presence of a gravitational field!"',
208
+ ]
209
+ embeddings = model.encode(sentences)
210
+ print(embeddings.shape)
211
+ # (3, 1024)
212
+
213
+ # Get the similarity scores for the embeddings
214
+ similarities = model.similarity(embeddings, embeddings)
215
+ print(similarities.shape)
216
+ # torch.Size([3, 3])
217
+ ```
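Since the card lists cosine similarity as the similarity function, `model.similarity(embeddings, embeddings)` returns the pairwise cosine similarities between the rows of `embeddings`, with 1.0 on the diagonal. The dependency-free sketch below reproduces that computation on toy 3-dimensional vectors; it illustrates the math only and is not the library's tensor implementation. `cosine` and `similarity_matrix` are hypothetical helper names.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings: List[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities, same shape as model.similarity(e, e)."""
    return [[cosine(a, b) for b in embeddings] for a in embeddings]


# Three toy "embeddings" standing in for the 1024-dim model output.
emb = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]
for row in similarity_matrix(emb):
    print([round(x, 4) for x in row])
# [1.0, 0.7071, 0.0]
# [0.7071, 1.0, 0.0]
# [0.0, 0.0, 1.0]
```

Cosine similarity ignores vector length, so the third vector's norm of 2 does not change its scores.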
218
+
219
+ <!--
220
+ ### Direct Usage (Transformers)
221
+
222
+ <details><summary>Click to see the direct usage in Transformers</summary>
223
+
224
+ </details>
225
+ -->
226
+
227
+ <!--
228
+ ### Downstream Usage (Sentence Transformers)
229
+
230
+ You can finetune this model on your own dataset.
231
+
232
+ <details><summary>Click to expand</summary>
233
+
234
+ </details>
235
+ -->
236
+
237
+ <!--
238
+ ### Out-of-Scope Use
239
+
240
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
241
+ -->
242
+
243
+ ## Evaluation
244
+
245
+ ### Metrics
246
+
247
+ #### Information Retrieval
248
+
249
+ * Dataset: `eval`
250
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
251
+
252
+ | Metric | Value |
253
+ |:--------------------|:-----------|
254
+ | cosine_accuracy@1 | 0.6095 |
255
+ | cosine_accuracy@3 | 0.7357 |
256
+ | cosine_accuracy@5 | 0.7881 |
257
+ | cosine_accuracy@10 | 0.8357 |
258
+ | cosine_precision@1 | 0.6095 |
259
+ | cosine_precision@3 | 0.2452 |
260
+ | cosine_precision@5 | 0.1576 |
261
+ | cosine_precision@10 | 0.0836 |
262
+ | cosine_recall@1 | 0.6095 |
263
+ | cosine_recall@3 | 0.7357 |
264
+ | cosine_recall@5 | 0.7881 |
265
+ | cosine_recall@10 | 0.8357 |
266
+ | **cosine_ndcg@10** | **0.7209** |
267
+ | cosine_mrr@10 | 0.6843 |
268
+ | cosine_map@100 | 0.6898 |
269
+
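The table's numbers are consistent with each of the 420 evaluation queries having exactly one relevant passage, the `positive` of its anchor. In that case recall@k equals accuracy@k and precision@k equals accuracy@k divided by k: 0.7357 / 3 ≈ 0.2452, 0.7881 / 5 ≈ 0.1576, 0.8357 / 10 ≈ 0.0836. The sketch below computes these metrics from the rank of each query's relevant passage under that one-relevant-passage assumption. `ir_metrics` is a hypothetical helper, not the code of `InformationRetrievalEvaluator`, and it uses k=3 on toy data for brevity where the card reports MRR and nDCG at 10.

```python
import math
from typing import List, Optional


def ir_metrics(ranks: List[Optional[int]], k: int) -> dict:
    """Retrieval metrics when every query has exactly ONE relevant passage.

    ranks[i] is the 1-based position of query i's relevant passage in the
    ranked results, or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        f"accuracy@{k}": accuracy,
        # k results are returned, at most one of which can be relevant.
        f"precision@{k}": accuracy / k,
        # Only one relevant passage exists, so finding it means full recall.
        f"recall@{k}": accuracy,
        f"mrr@{k}": sum(1 / r for r, h in zip(ranks, hits) if h) / n,
        # The ideal DCG is 1 (relevant passage at rank 1), so nDCG equals DCG.
        f"ndcg@{k}": sum(1 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n,
    }


# Four toy queries whose relevant passage was found at rank 1, 2, 4 and 15.
toy_ranks = [1, 2, 4, 15]
for name, value in ir_metrics(toy_ranks, k=3).items():
    print(f"{name}: {value:.4f}")
```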
270
+ <!--
271
+ ## Bias, Risks and Limitations
272
+
273
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
274
+ -->
275
+
276
+ <!--
277
+ ### Recommendations
278
+
279
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
280
+ -->
281
+
282
+ ## Training Details
283
+
284
+ ### Training Dataset
285
+
286
+ #### Unnamed Dataset
287
+
288
+ * Size: 1,760 training samples
289
+ * Columns: <code>anchor</code> and <code>positive</code>
290
+ * Approximate statistics based on the first 1000 samples:
291
+ | | anchor | positive |
292
+ |:--------|:----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
293
+ | type | string | string |
294
+ | details | <ul><li>min: 9 tokens</li><li>mean: 24.87 tokens</li><li>max: 70 tokens</li></ul> | <ul><li>min: 11 tokens</li><li>mean: 68.37 tokens</li><li>max: 500 tokens</li></ul> |
295
+ * Samples:
296
+ | anchor | positive |
297
+ |:---------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
298
+ | <code>How is a proper coloring of a graph defined in the context of vertices and edges?</code> | <code>"A coloring is called proper if for each edge joining two distinct vertices, the two vertices it joins have different colors."</code> |
299
+ | <code>What is the relationship between the first excited state of the box model and the p orbitals in a hydrogen atom?</code> | <code>"The p orbitals are similar to the first excited state of the box, i.e. $(n_{x},n_{y},n_{z})=(2,1,1)$ is similar to a $p_{x}$ orbital, $(n_{x},n_{y},n_{z})=(1,2,1)$ is similar to a $p_{y}$ orbital and $(n_{x},n_{y},n_{z})=(1,1,2)$ is similar to a $p_{z}$ orbital."</code> |
300
+ | <code>How can the behavior of the derivative \( f'(x) \) indicate the presence of a local maximum or minimum at a critical point \( x=a \)?</code> | <code>"If there is a local maximum when \( x=a \), the function must be lower near \( x=a \) than it is right at \( x=a \). If the derivative exists near \( x=a \), this means \( f'(x)>0 \) when \( x \) is near \( a \) and \( x < a \), because the function must 'slope up' just to the left of \( a \). Similarly, \( f'(x) < 0 \) when \( x \) is near \( a \) and \( x>a \), because \( f \) slopes down from the local maximum as we move to the right. Using the same reasoning, if there is a local minimum at \( x=a \), the derivative of \( f \) must be negative just to the left of \( a \) and positive just to the right."</code> |
301
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
302
+ ```json
303
+ {
304
+ "scale": 20.0,
305
+ "similarity_fct": "cos_sim"
306
+ }
307
+ ```
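`MultipleNegativesRankingLoss` treats each (anchor, positive) pair as a classification problem over the batch. The anchor's similarities to every positive in the batch, multiplied by `scale`, go into a softmax cross-entropy whose target is the anchor's own positive, so the other positives act as in-batch negatives (15 per anchor at batch size 16). The pure-Python sketch below uses `scale=20.0` and cosine similarity as configured above. `mnr_loss` is a hypothetical helper that omits library details such as optional hard-negative columns.

```python
import math
from typing import List

Vector = List[float]


def cos_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(anchors: List[Vector], positives: List[Vector], scale: float = 20.0) -> float:
    """In-batch softmax loss: anchor i should score highest against positive i.

    Every other positive in the batch acts as a negative for anchor i.
    """
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cos_sim(a, p) for p in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[i]  # cross-entropy with target index i
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each positive is close to its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]  # positives matched to the wrong anchors
print(round(mnr_loss(anchors, aligned), 4))
print(round(mnr_loss(anchors, swapped), 4))
```

The large `scale` sharpens the softmax: cosine scores lie in [-1, 1], and without scaling the loss could not separate the correct positive from the negatives strongly.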
308
+
309
+ ### Evaluation Dataset
310
+
311
+ #### Unnamed Dataset
312
+
313
+ * Size: 420 evaluation samples
314
+ * Columns: <code>anchor</code> and <code>positive</code>
315
+ * Approximate statistics based on the first 420 samples:
316
+ | | anchor | positive |
317
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
318
+ | type | string | string |
319
+ | details | <ul><li>min: 12 tokens</li><li>mean: 24.97 tokens</li><li>max: 66 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 68.52 tokens</li><li>max: 452 tokens</li></ul> |
320
+ * Samples:
321
+ | anchor | positive |
322
+ |:------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
323
+ | <code>What are the two central classes mentioned in the FileSystem framework and what do they represent?</code> | <code>"The class `FileReference` is the most important entry point to the framework." and "FileSystem is a powerful and elegant library to manipulate files."</code> |
324
+ | <code>What is the significance of Turing's work in the context of PDE-based models for self-organization of complex systems?</code> | <code>"Turing’s monumental work on the chemical basis of morphogenesis played an important role in igniting researchers’ attention to the PDE-based continuous field models as a mathematical framework to study self-organization of complex systems."</code> |
325
+ | <code>What are the two options for reducing accelerations as discussed in the passage?</code> | <code>"From the above definitions we see that there are really two options for reducing accelerations. We can reduce the amount that velocity changes, or we can increase the time over which the velocity changes (or both)."</code> |
326
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
327
+ ```json
328
+ {
329
+ "scale": 20.0,
330
+ "similarity_fct": "cos_sim"
331
+ }
332
+ ```
333
+
334
+ ### Training Hyperparameters
335
+ #### Non-Default Hyperparameters
336
+
337
+ - `eval_strategy`: epoch
338
+ - `per_device_train_batch_size`: 16
339
+ - `per_device_eval_batch_size`: 16
340
+ - `learning_rate`: 2e-05
341
+ - `weight_decay`: 0.05
342
+ - `num_train_epochs`: 10
343
+ - `warmup_ratio`: 0.1
344
+ - `fp16`: True
345
+ - `eval_on_start`: True
346
+
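These settings determine the schedule visible in the training logs. 1,760 samples at batch size 16 give 110 steps per epoch (epoch 1.0 is logged at step 110), so 10 epochs give 1,100 steps. With `warmup_ratio` 0.1 the learning rate rises linearly to 2e-05 over the first 110 steps, then decays linearly to zero. The sketch assumes the `linear` scheduler behaves like transformers' `get_linear_schedule_with_warmup` and that warmup steps are derived as ceil(warmup_ratio × total steps); `linear_warmup_lr` is a hypothetical helper.

```python
import math


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


train_samples, batch_size, epochs, warmup_ratio = 1760, 16, 10, 0.1
steps_per_epoch = math.ceil(train_samples / batch_size)  # 110, matches the logs
total_steps = steps_per_epoch * epochs                    # 1100
warmup_steps = math.ceil(total_steps * warmup_ratio)      # 110

for step in (0, 55, 110, 605, 1100):
    print(step, linear_warmup_lr(step, 2e-5, warmup_steps, total_steps))
```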
347
+ #### All Hyperparameters
348
+ <details><summary>Click to expand</summary>
349
+
350
+ - `overwrite_output_dir`: False
351
+ - `do_predict`: False
352
+ - `eval_strategy`: epoch
353
+ - `prediction_loss_only`: True
354
+ - `per_device_train_batch_size`: 16
355
+ - `per_device_eval_batch_size`: 16
356
+ - `per_gpu_train_batch_size`: None
357
+ - `per_gpu_eval_batch_size`: None
358
+ - `gradient_accumulation_steps`: 1
359
+ - `eval_accumulation_steps`: None
360
+ - `torch_empty_cache_steps`: None
361
+ - `learning_rate`: 2e-05
362
+ - `weight_decay`: 0.05
363
+ - `adam_beta1`: 0.9
364
+ - `adam_beta2`: 0.999
365
+ - `adam_epsilon`: 1e-08
366
+ - `max_grad_norm`: 1.0
367
+ - `num_train_epochs`: 10
368
+ - `max_steps`: -1
369
+ - `lr_scheduler_type`: linear
370
+ - `lr_scheduler_kwargs`: {}
371
+ - `warmup_ratio`: 0.1
372
+ - `warmup_steps`: 0
373
+ - `log_level`: passive
374
+ - `log_level_replica`: warning
375
+ - `log_on_each_node`: True
376
+ - `logging_nan_inf_filter`: True
377
+ - `save_safetensors`: True
378
+ - `save_on_each_node`: False
379
+ - `save_only_model`: False
380
+ - `restore_callback_states_from_checkpoint`: False
381
+ - `no_cuda`: False
382
+ - `use_cpu`: False
383
+ - `use_mps_device`: False
384
+ - `seed`: 42
385
+ - `data_seed`: None
386
+ - `jit_mode_eval`: False
387
+ - `use_ipex`: False
388
+ - `bf16`: False
389
+ - `fp16`: True
390
+ - `fp16_opt_level`: O1
391
+ - `half_precision_backend`: auto
392
+ - `bf16_full_eval`: False
393
+ - `fp16_full_eval`: False
394
+ - `tf32`: None
395
+ - `local_rank`: 0
396
+ - `ddp_backend`: None
397
+ - `tpu_num_cores`: None
398
+ - `tpu_metrics_debug`: False
399
+ - `debug`: []
400
+ - `dataloader_drop_last`: False
401
+ - `dataloader_num_workers`: 0
402
+ - `dataloader_prefetch_factor`: None
403
+ - `past_index`: -1
404
+ - `disable_tqdm`: False
405
+ - `remove_unused_columns`: True
406
+ - `label_names`: None
407
+ - `load_best_model_at_end`: False
408
+ - `ignore_data_skip`: False
409
+ - `fsdp`: []
410
+ - `fsdp_min_num_params`: 0
411
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
412
+ - `fsdp_transformer_layer_cls_to_wrap`: None
413
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
414
+ - `deepspeed`: None
415
+ - `label_smoothing_factor`: 0.0
416
+ - `optim`: adamw_torch
417
+ - `optim_args`: None
418
+ - `adafactor`: False
419
+ - `group_by_length`: False
420
+ - `length_column_name`: length
421
+ - `ddp_find_unused_parameters`: None
422
+ - `ddp_bucket_cap_mb`: None
423
+ - `ddp_broadcast_buffers`: False
424
+ - `dataloader_pin_memory`: True
425
+ - `dataloader_persistent_workers`: False
426
+ - `skip_memory_metrics`: True
427
+ - `use_legacy_prediction_loop`: False
428
+ - `push_to_hub`: False
429
+ - `resume_from_checkpoint`: None
430
+ - `hub_model_id`: None
431
+ - `hub_strategy`: every_save
432
+ - `hub_private_repo`: None
433
+ - `hub_always_push`: False
434
+ - `gradient_checkpointing`: False
435
+ - `gradient_checkpointing_kwargs`: None
436
+ - `include_inputs_for_metrics`: False
437
+ - `include_for_metrics`: []
438
+ - `eval_do_concat_batches`: True
439
+ - `fp16_backend`: auto
440
+ - `push_to_hub_model_id`: None
441
+ - `push_to_hub_organization`: None
442
+ - `mp_parameters`:
443
+ - `auto_find_batch_size`: False
444
+ - `full_determinism`: False
445
+ - `torchdynamo`: None
446
+ - `ray_scope`: last
447
+ - `ddp_timeout`: 1800
448
+ - `torch_compile`: False
449
+ - `torch_compile_backend`: None
450
+ - `torch_compile_mode`: None
451
+ - `include_tokens_per_second`: False
452
+ - `include_num_input_tokens_seen`: False
453
+ - `neftune_noise_alpha`: None
454
+ - `optim_target_modules`: None
455
+ - `batch_eval_metrics`: False
456
+ - `eval_on_start`: True
457
+ - `use_liger_kernel`: False
458
+ - `eval_use_gather_object`: False
459
+ - `average_tokens_across_devices`: False
460
+ - `prompts`: None
461
+ - `batch_sampler`: batch_sampler
462
+ - `multi_dataset_batch_sampler`: proportional
463
+
464
+ </details>
465
+
466
+ ### Training Logs
467
+ <details><summary>Click to expand</summary>
468
+
469
+ | Epoch | Step | Training Loss | Validation Loss | eval_cosine_ndcg@10 |
470
+ |:------:|:----:|:-------------:|:---------------:|:-------------------:|
471
+ | 0 | 0 | - | 0.0946 | 0.6733 |
472
+ | 0.0091 | 1 | 0.1033 | - | - |
473
+ | 0.0182 | 2 | 0.0771 | - | - |
474
+ | 0.0273 | 3 | 0.0611 | - | - |
475
+ | 0.0364 | 4 | 0.1437 | - | - |
476
+ | 0.0455 | 5 | 0.1298 | - | - |
477
+ | 0.0545 | 6 | 0.2036 | - | - |
478
+ | 0.0636 | 7 | 0.0443 | - | - |
479
+ | 0.0727 | 8 | 0.1252 | - | - |
480
+ | 0.0818 | 9 | 0.1543 | - | - |
481
+ | 0.0909 | 10 | 0.0783 | - | - |
482
+ | 0.1 | 11 | 0.0986 | - | - |
483
+ | 0.1091 | 12 | 0.0788 | - | - |
484
+ | 0.1182 | 13 | 0.128 | - | - |
485
+ | 0.1273 | 14 | 0.1214 | - | - |
486
+ | 0.1364 | 15 | 0.0514 | - | - |
487
+ | 0.1455 | 16 | 0.0867 | - | - |
488
+ | 0.1545 | 17 | 0.0348 | - | - |
489
+ | 0.1636 | 18 | 0.0464 | - | - |
490
+ | 0.1727 | 19 | 0.0458 | - | - |
491
+ | 0.1818 | 20 | 0.1203 | - | - |
492
+ | 0.1909 | 21 | 0.11 | - | - |
493
+ | 0.2 | 22 | 0.0953 | - | - |
494
+ | 0.2091 | 23 | 0.0253 | - | - |
495
+ | 0.2182 | 24 | 0.0346 | - | - |
496
+ | 0.2273 | 25 | 0.0736 | - | - |
497
+ | 0.2364 | 26 | 0.218 | - | - |
498
+ | 0.2455 | 27 | 0.022 | - | - |
499
+ | 0.2545 | 28 | 0.1169 | - | - |
500
+ | 0.2636 | 29 | 0.0089 | - | - |
501
+ | 0.2727 | 30 | 0.0151 | - | - |
502
+ | 0.2818 | 31 | 0.2936 | - | - |
503
+ | 0.2909 | 32 | 0.0334 | - | - |
504
+ | 0.3 | 33 | 0.1829 | - | - |
505
+ | 0.3091 | 34 | 0.0225 | - | - |
506
+ | 0.3182 | 35 | 0.0729 | - | - |
507
+ | 0.3273 | 36 | 0.022 | - | - |
508
+ | 0.3364 | 37 | 0.0068 | - | - |
509
+ | 0.3455 | 38 | 0.0237 | - | - |
510
+ | 0.3545 | 39 | 0.0235 | - | - |
511
+ | 0.3636 | 40 | 0.014 | - | - |
512
+ | 0.3727 | 41 | 0.0754 | - | - |
513
+ | 0.3818 | 42 | 0.0271 | - | - |
514
+ | 0.3909 | 43 | 0.0154 | - | - |
515
+ | 0.4 | 44 | 0.0128 | - | - |
516
+ | 0.4091 | 45 | 0.0196 | - | - |
517
+ | 0.4182 | 46 | 0.1689 | - | - |
518
+ | 0.4273 | 47 | 0.0149 | - | - |
519
+ | 0.4364 | 48 | 0.1441 | - | - |
520
+ | 0.4455 | 49 | 0.0532 | - | - |
521
+ | 0.4545 | 50 | 0.0204 | - | - |
522
+ | 0.4636 | 51 | 0.0111 | - | - |
523
+ | 0.4727 | 52 | 0.0612 | - | - |
524
+ | 0.4818 | 53 | 0.0813 | - | - |
525
+ | 0.4909 | 54 | 0.0044 | - | - |
526
+ | 0.5 | 55 | 0.0029 | - | - |
527
+ | 0.5091 | 56 | 0.011 | - | - |
528
+ | 0.5182 | 57 | 0.0098 | - | - |
529
+ | 0.5273 | 58 | 0.0339 | - | - |
530
+ | 0.5364 | 59 | 0.0284 | - | - |
531
+ | 0.5455 | 60 | 0.0235 | - | - |
532
+ | 0.5545 | 61 | 0.0117 | - | - |
533
+ | 0.5636 | 62 | 0.0118 | - | - |
534
+ | 0.5727 | 63 | 0.0047 | - | - |
535
+ | 0.5818 | 64 | 0.0176 | - | - |
536
+ | 0.5909 | 65 | 0.1605 | - | - |
537
+ | 0.6 | 66 | 0.3625 | - | - |
538
+ | 0.6091 | 67 | 0.06 | - | - |
539
+ | 0.6182 | 68 | 0.0283 | - | - |
540
+ | 0.6273 | 69 | 0.038 | - | - |
541
+ | 0.6364 | 70 | 0.0114 | - | - |
542
+ | 0.6455 | 71 | 0.0258 | - | - |
543
+ | 0.6545 | 72 | 0.1058 | - | - |
544
+ | 0.6636 | 73 | 0.0921 | - | - |
545
+ | 0.6727 | 74 | 0.0215 | - | - |
546
+ | 0.6818 | 75 | 0.0613 | - | - |
547
+ | 0.6909 | 76 | 0.0138 | - | - |
548
+ | 0.7 | 77 | 0.1214 | - | - |
549
+ | 0.7091 | 78 | 0.0868 | - | - |
550
+ | 0.7182 | 79 | 0.0251 | - | - |
551
+ | 0.7273 | 80 | 0.0243 | - | - |
552
+ | 0.7364 | 81 | 0.0159 | - | - |
553
+ | 0.7455 | 82 | 0.0416 | - | - |
554
+ | 0.7545 | 83 | 0.0272 | - | - |
555
+ | 0.7636 | 84 | 0.0487 | - | - |
556
+ | 0.7727 | 85 | 0.1019 | - | - |
557
+ | 0.7818 | 86 | 0.0378 | - | - |
558
+ | 0.7909 | 87 | 0.0228 | - | - |
559
+ | 0.8 | 88 | 0.009 | - | - |
560
+ | 0.8091 | 89 | 0.024 | - | - |
561
+ | 0.8182 | 90 | 0.0266 | - | - |
562
+ | 0.8273 | 91 | 0.0927 | - | - |
563
+ | 0.8364 | 92 | 0.0065 | - | - |
564
+ | 0.8455 | 93 | 0.0061 | - | - |
565
+ | 0.8545 | 94 | 0.0633 | - | - |
566
+ | 0.8636 | 95 | 0.0044 | - | - |
567
+ | 0.8727 | 96 | 0.0082 | - | - |
568
+ | 0.8818 | 97 | 0.0108 | - | - |
569
+ | 0.8909 | 98 | 0.009 | - | - |
570
+ | 0.9 | 99 | 0.0493 | - | - |
571
+ | 0.9091 | 100 | 0.1834 | - | - |
572
+ | 0.9182 | 101 | 0.0372 | - | - |
573
+ | 0.9273 | 102 | 0.046 | - | - |
574
+ | 0.9364 | 103 | 0.0056 | - | - |
+ | 0.9455 | 104 | 0.0038 | - | - |
+ | 0.9545 | 105 | 0.0183 | - | - |
+ | 0.9636 | 106 | 0.027 | - | - |
+ | 0.9727 | 107 | 0.0747 | - | - |
+ | 0.9818 | 108 | 0.0038 | - | - |
+ | 0.9909 | 109 | 0.0165 | - | - |
+ | 1.0 | 110 | 0.0188 | 0.0271 | 0.7058 |
+ | 1.0091 | 111 | 0.0169 | - | - |
+ | 1.0182 | 112 | 0.0101 | - | - |
+ | 1.0273 | 113 | 0.0044 | - | - |
+ | 1.0364 | 114 | 0.0061 | - | - |
+ | 1.0455 | 115 | 0.0059 | - | - |
+ | 1.0545 | 116 | 0.0089 | - | - |
+ | 1.0636 | 117 | 0.0849 | - | - |
+ | 1.0727 | 118 | 0.0099 | - | - |
+ | 1.0818 | 119 | 0.0129 | - | - |
+ | 1.0909 | 120 | 0.0202 | - | - |
+ | 1.1 | 121 | 0.0032 | - | - |
+ | 1.1091 | 122 | 0.0027 | - | - |
+ | 1.1182 | 123 | 0.0061 | - | - |
+ | 1.1273 | 124 | 0.004 | - | - |
+ | 1.1364 | 125 | 0.0028 | - | - |
+ | 1.1455 | 126 | 0.0463 | - | - |
+ | 1.1545 | 127 | 0.0024 | - | - |
+ | 1.1636 | 128 | 0.0044 | - | - |
+ | 1.1727 | 129 | 0.1313 | - | - |
+ | 1.1818 | 130 | 0.0022 | - | - |
+ | 1.1909 | 131 | 0.0026 | - | - |
+ | 1.2 | 132 | 0.0696 | - | - |
+ | 1.2091 | 133 | 0.0323 | - | - |
+ | 1.2182 | 134 | 0.0027 | - | - |
+ | 1.2273 | 135 | 0.1714 | - | - |
+ | 1.2364 | 136 | 0.0365 | - | - |
+ | 1.2455 | 137 | 0.0116 | - | - |
+ | 1.2545 | 138 | 0.0036 | - | - |
+ | 1.2636 | 139 | 0.0296 | - | - |
+ | 1.2727 | 140 | 0.0037 | - | - |
+ | 1.2818 | 141 | 0.0036 | - | - |
+ | 1.2909 | 142 | 0.0025 | - | - |
+ | 1.3 | 143 | 0.0043 | - | - |
+ | 1.3091 | 144 | 0.0021 | - | - |
+ | 1.3182 | 145 | 0.0032 | - | - |
+ | 1.3273 | 146 | 0.0263 | - | - |
+ | 1.3364 | 147 | 0.0014 | - | - |
+ | 1.3455 | 148 | 0.0993 | - | - |
+ | 1.3545 | 149 | 0.0045 | - | - |
+ | 1.3636 | 150 | 0.006 | - | - |
+ | 1.3727 | 151 | 0.0045 | - | - |
+ | 1.3818 | 152 | 0.0022 | - | - |
+ | 1.3909 | 153 | 0.0048 | - | - |
+ | 1.4 | 154 | 0.0133 | - | - |
+ | 1.4091 | 155 | 0.0018 | - | - |
+ | 1.4182 | 156 | 0.0012 | - | - |
+ | 1.4273 | 157 | 0.001 | - | - |
+ | 1.4364 | 158 | 0.0051 | - | - |
+ | 1.4455 | 159 | 0.0636 | - | - |
+ | 1.4545 | 160 | 0.0911 | - | - |
+ | 1.4636 | 161 | 0.0034 | - | - |
+ | 1.4727 | 162 | 0.021 | - | - |
+ | 1.4818 | 163 | 0.0034 | - | - |
+ | 1.4909 | 164 | 0.0022 | - | - |
+ | 1.5 | 165 | 0.0109 | - | - |
+ | 1.5091 | 166 | 0.0009 | - | - |
+ | 1.5182 | 167 | 0.0124 | - | - |
+ | 1.5273 | 168 | 0.0097 | - | - |
+ | 1.5364 | 169 | 0.0136 | - | - |
+ | 1.5455 | 170 | 0.0063 | - | - |
+ | 1.5545 | 171 | 0.0105 | - | - |
+ | 1.5636 | 172 | 0.0114 | - | - |
+ | 1.5727 | 173 | 0.0061 | - | - |
+ | 1.5818 | 174 | 0.002 | - | - |
+ | 1.5909 | 175 | 0.0037 | - | - |
+ | 1.6 | 176 | 0.0279 | - | - |
+ | 1.6091 | 177 | 0.0191 | - | - |
+ | 1.6182 | 178 | 0.0025 | - | - |
+ | 1.6273 | 179 | 0.0009 | - | - |
+ | 1.6364 | 180 | 0.0019 | - | - |
+ | 1.6455 | 181 | 0.001 | - | - |
+ | 1.6545 | 182 | 0.0023 | - | - |
+ | 1.6636 | 183 | 0.0005 | - | - |
+ | 1.6727 | 184 | 0.0025 | - | - |
+ | 1.6818 | 185 | 0.0048 | - | - |
+ | 1.6909 | 186 | 0.0035 | - | - |
+ | 1.7 | 187 | 0.0328 | - | - |
+ | 1.7091 | 188 | 0.0139 | - | - |
+ | 1.7182 | 189 | 0.0097 | - | - |
+ | 1.7273 | 190 | 0.0051 | - | - |
+ | 1.7364 | 191 | 0.0153 | - | - |
+ | 1.7455 | 192 | 0.0127 | - | - |
+ | 1.7545 | 193 | 0.0828 | - | - |
+ | 1.7636 | 194 | 0.0214 | - | - |
+ | 1.7727 | 195 | 0.0038 | - | - |
+ | 1.7818 | 196 | 0.008 | - | - |
+ | 1.7909 | 197 | 0.0218 | - | - |
+ | 1.8 | 198 | 0.017 | - | - |
+ | 1.8091 | 199 | 0.0016 | - | - |
+ | 1.8182 | 200 | 0.0017 | - | - |
+ | 1.8273 | 201 | 0.004 | - | - |
+ | 1.8364 | 202 | 0.0134 | - | - |
+ | 1.8455 | 203 | 0.0103 | - | - |
+ | 1.8545 | 204 | 0.0018 | - | - |
+ | 1.8636 | 205 | 0.0069 | - | - |
+ | 1.8727 | 206 | 0.0617 | - | - |
+ | 1.8818 | 207 | 0.0024 | - | - |
+ | 1.8909 | 208 | 0.0451 | - | - |
+ | 1.9 | 209 | 0.0109 | - | - |
+ | 1.9091 | 210 | 0.004 | - | - |
+ | 1.9182 | 211 | 0.0035 | - | - |
+ | 1.9273 | 212 | 0.0041 | - | - |
+ | 1.9364 | 213 | 0.015 | - | - |
+ | 1.9455 | 214 | 0.004 | - | - |
+ | 1.9545 | 215 | 0.0043 | - | - |
+ | 1.9636 | 216 | 0.0036 | - | - |
+ | 1.9727 | 217 | 0.0049 | - | - |
+ | 1.9818 | 218 | 0.0153 | - | - |
+ | 1.9909 | 219 | 0.0037 | - | - |
+ | 2.0 | 220 | 0.0039 | 0.0232 | 0.7161 |
+ | 2.0091 | 221 | 0.0318 | - | - |
+ | 2.0182 | 222 | 0.003 | - | - |
+ | 2.0273 | 223 | 0.0145 | - | - |
+ | 2.0364 | 224 | 0.0031 | - | - |
+ | 2.0455 | 225 | 0.0025 | - | - |
+ | 2.0545 | 226 | 0.0041 | - | - |
+ | 2.0636 | 227 | 0.0011 | - | - |
+ | 2.0727 | 228 | 0.002 | - | - |
+ | 2.0818 | 229 | 0.0597 | - | - |
+ | 2.0909 | 230 | 0.0011 | - | - |
+ | 2.1 | 231 | 0.0008 | - | - |
+ | 2.1091 | 232 | 0.0013 | - | - |
+ | 2.1182 | 233 | 0.0056 | - | - |
+ | 2.1273 | 234 | 0.004 | - | - |
+ | 2.1364 | 235 | 0.0009 | - | - |
+ | 2.1455 | 236 | 0.0008 | - | - |
+ | 2.1545 | 237 | 0.0029 | - | - |
+ | 2.1636 | 238 | 0.0081 | - | - |
+ | 2.1727 | 239 | 0.0019 | - | - |
+ | 2.1818 | 240 | 0.0021 | - | - |
+ | 2.1909 | 241 | 0.0034 | - | - |
+ | 2.2 | 242 | 0.0004 | - | - |
+ | 2.2091 | 243 | 0.002 | - | - |
+ | 2.2182 | 244 | 0.0011 | - | - |
+ | 2.2273 | 245 | 0.0487 | - | - |
+ | 2.2364 | 246 | 0.0014 | - | - |
+ | 2.2455 | 247 | 0.0024 | - | - |
+ | 2.2545 | 248 | 0.004 | - | - |
+ | 2.2636 | 249 | 0.0028 | - | - |
+ | 2.2727 | 250 | 0.0016 | - | - |
+ | 2.2818 | 251 | 0.0053 | - | - |
+ | 2.2909 | 252 | 0.0039 | - | - |
+ | 2.3 | 253 | 0.0015 | - | - |
+ | 2.3091 | 254 | 0.0023 | - | - |
+ | 2.3182 | 255 | 0.0022 | - | - |
+ | 2.3273 | 256 | 0.001 | - | - |
+ | 2.3364 | 257 | 0.0016 | - | - |
+ | 2.3455 | 258 | 0.0039 | - | - |
+ | 2.3545 | 259 | 0.0041 | - | - |
+ | 2.3636 | 260 | 0.0013 | - | - |
+ | 2.3727 | 261 | 0.0253 | - | - |
+ | 2.3818 | 262 | 0.0242 | - | - |
+ | 2.3909 | 263 | 0.0021 | - | - |
+ | 2.4 | 264 | 0.001 | - | - |
+ | 2.4091 | 265 | 0.0013 | - | - |
+ | 2.4182 | 266 | 0.0038 | - | - |
+ | 2.4273 | 267 | 0.0082 | - | - |
+ | 2.4364 | 268 | 0.0071 | - | - |
+ | 2.4455 | 269 | 0.0027 | - | - |
+ | 2.4545 | 270 | 0.0005 | - | - |
+ | 2.4636 | 271 | 0.0009 | - | - |
+ | 2.4727 | 272 | 0.0014 | - | - |
+ | 2.4818 | 273 | 0.0007 | - | - |
+ | 2.4909 | 274 | 0.0088 | - | - |
+ | 2.5 | 275 | 0.0147 | - | - |
+ | 2.5091 | 276 | 0.0013 | - | - |
+ | 2.5182 | 277 | 0.0012 | - | - |
+ | 2.5273 | 278 | 0.0007 | - | - |
+ | 2.5364 | 279 | 0.0049 | - | - |
+ | 2.5455 | 280 | 0.0143 | - | - |
+ | 2.5545 | 281 | 0.0014 | - | - |
+ | 2.5636 | 282 | 0.0226 | - | - |
+ | 2.5727 | 283 | 0.0025 | - | - |
+ | 2.5818 | 284 | 0.0007 | - | - |
+ | 2.5909 | 285 | 0.0026 | - | - |
+ | 2.6 | 286 | 0.0011 | - | - |
+ | 2.6091 | 287 | 0.0773 | - | - |
+ | 2.6182 | 288 | 0.0094 | - | - |
+ | 2.6273 | 289 | 0.0012 | - | - |
+ | 2.6364 | 290 | 0.0024 | - | - |
+ | 2.6455 | 291 | 0.0012 | - | - |
+ | 2.6545 | 292 | 0.0082 | - | - |
+ | 2.6636 | 293 | 0.1715 | - | - |
+ | 2.6727 | 294 | 0.0006 | - | - |
+ | 2.6818 | 295 | 0.0022 | - | - |
+ | 2.6909 | 296 | 0.0014 | - | - |
+ | 2.7 | 297 | 0.0026 | - | - |
+ | 2.7091 | 298 | 0.0014 | - | - |
+ | 2.7182 | 299 | 0.001 | - | - |
+ | 2.7273 | 300 | 0.0013 | - | - |
+ | 2.7364 | 301 | 0.0196 | - | - |
+ | 2.7455 | 302 | 0.0023 | - | - |
+ | 2.7545 | 303 | 0.0013 | - | - |
+ | 2.7636 | 304 | 0.0021 | - | - |
+ | 2.7727 | 305 | 0.0048 | - | - |
+ | 2.7818 | 306 | 0.0014 | - | - |
+ | 2.7909 | 307 | 0.0011 | - | - |
+ | 2.8 | 308 | 0.0005 | - | - |
+ | 2.8091 | 309 | 0.003 | - | - |
+ | 2.8182 | 310 | 0.0009 | - | - |
+ | 2.8273 | 311 | 0.0123 | - | - |
+ | 2.8364 | 312 | 0.0118 | - | - |
+ | 2.8455 | 313 | 0.0015 | - | - |
+ | 2.8545 | 314 | 0.0088 | - | - |
+ | 2.8636 | 315 | 0.0067 | - | - |
+ | 2.8727 | 316 | 0.0016 | - | - |
+ | 2.8818 | 317 | 0.0212 | - | - |
+ | 2.8909 | 318 | 0.0015 | - | - |
+ | 2.9 | 319 | 0.0006 | - | - |
+ | 2.9091 | 320 | 0.0013 | - | - |
+ | 2.9182 | 321 | 0.001 | - | - |
+ | 2.9273 | 322 | 0.0168 | - | - |
+ | 2.9364 | 323 | 0.0051 | - | - |
+ | 2.9455 | 324 | 0.0013 | - | - |
+ | 2.9545 | 325 | 0.0014 | - | - |
+ | 2.9636 | 326 | 0.0007 | - | - |
+ | 2.9727 | 327 | 0.0059 | - | - |
+ | 2.9818 | 328 | 0.0006 | - | - |
+ | 2.9909 | 329 | 0.0008 | - | - |
+ | 3.0 | 330 | 0.002 | 0.0255 | 0.7063 |
+ | 3.0091 | 331 | 0.0007 | - | - |
+ | 3.0182 | 332 | 0.0014 | - | - |
+ | 3.0273 | 333 | 0.0007 | - | - |
+ | 3.0364 | 334 | 0.0031 | - | - |
+ | 3.0455 | 335 | 0.0007 | - | - |
+ | 3.0545 | 336 | 0.0011 | - | - |
+ | 3.0636 | 337 | 0.0759 | - | - |
+ | 3.0727 | 338 | 0.0027 | - | - |
+ | 3.0818 | 339 | 0.0004 | - | - |
+ | 3.0909 | 340 | 0.001 | - | - |
+ | 3.1 | 341 | 0.0004 | - | - |
+ | 3.1091 | 342 | 0.0083 | - | - |
+ | 3.1182 | 343 | 0.0008 | - | - |
+ | 3.1273 | 344 | 0.0086 | - | - |
+ | 3.1364 | 345 | 0.0019 | - | - |
+ | 3.1455 | 346 | 0.0087 | - | - |
+ | 3.1545 | 347 | 0.0019 | - | - |
+ | 3.1636 | 348 | 0.0017 | - | - |
+ | 3.1727 | 349 | 0.0054 | - | - |
+ | 3.1818 | 350 | 0.0338 | - | - |
+ | 3.1909 | 351 | 0.006 | - | - |
+ | 3.2 | 352 | 0.0014 | - | - |
+ | 3.2091 | 353 | 0.0023 | - | - |
+ | 3.2182 | 354 | 0.011 | - | - |
+ | 3.2273 | 355 | 0.0004 | - | - |
+ | 3.2364 | 356 | 0.0099 | - | - |
+ | 3.2455 | 357 | 0.0017 | - | - |
+ | 3.2545 | 358 | 0.001 | - | - |
+ | 3.2636 | 359 | 0.0024 | - | - |
+ | 3.2727 | 360 | 0.0043 | - | - |
+ | 3.2818 | 361 | 0.0029 | - | - |
+ | 3.2909 | 362 | 0.0023 | - | - |
+ | 3.3 | 363 | 0.0007 | - | - |
+ | 3.3091 | 364 | 0.0035 | - | - |
+ | 3.3182 | 365 | 0.0014 | - | - |
+ | 3.3273 | 366 | 0.0052 | - | - |
+ | 3.3364 | 367 | 0.0526 | - | - |
+ | 3.3455 | 368 | 0.0017 | - | - |
+ | 3.3545 | 369 | 0.0082 | - | - |
+ | 3.3636 | 370 | 0.0014 | - | - |
+ | 3.3727 | 371 | 0.0018 | - | - |
+ | 3.3818 | 372 | 0.0013 | - | - |
+ | 3.3909 | 373 | 0.0029 | - | - |
+ | 3.4 | 374 | 0.0082 | - | - |
+ | 3.4091 | 375 | 0.0005 | - | - |
+ | 3.4182 | 376 | 0.0007 | - | - |
+ | 3.4273 | 377 | 0.0014 | - | - |
+ | 3.4364 | 378 | 0.006 | - | - |
+ | 3.4455 | 379 | 0.0023 | - | - |
+ | 3.4545 | 380 | 0.0017 | - | - |
+ | 3.4636 | 381 | 0.0011 | - | - |
+ | 3.4727 | 382 | 0.0004 | - | - |
+ | 3.4818 | 383 | 0.0033 | - | - |
+ | 3.4909 | 384 | 0.0017 | - | - |
+ | 3.5 | 385 | 0.0016 | - | - |
+ | 3.5091 | 386 | 0.001 | - | - |
+ | 3.5182 | 387 | 0.0055 | - | - |
+ | 3.5273 | 388 | 0.0058 | - | - |
+ | 3.5364 | 389 | 0.0015 | - | - |
+ | 3.5455 | 390 | 0.0024 | - | - |
+ | 3.5545 | 391 | 0.0022 | - | - |
+ | 3.5636 | 392 | 0.0007 | - | - |
+ | 3.5727 | 393 | 0.001 | - | - |
+ | 3.5818 | 394 | 0.0007 | - | - |
+ | 3.5909 | 395 | 0.0046 | - | - |
+ | 3.6 | 396 | 0.0005 | - | - |
+ | 3.6091 | 397 | 0.0011 | - | - |
+ | 3.6182 | 398 | 0.001 | - | - |
+ | 3.6273 | 399 | 0.0012 | - | - |
+ | 3.6364 | 400 | 0.0006 | - | - |
+ | 3.6455 | 401 | 0.0038 | - | - |
+ | 3.6545 | 402 | 0.0008 | - | - |
+ | 3.6636 | 403 | 0.0009 | - | - |
+ | 3.6727 | 404 | 0.0014 | - | - |
+ | 3.6818 | 405 | 0.0029 | - | - |
+ | 3.6909 | 406 | 0.0052 | - | - |
+ | 3.7 | 407 | 0.0011 | - | - |
+ | 3.7091 | 408 | 0.0112 | - | - |
+ | 3.7182 | 409 | 0.0011 | - | - |
+ | 3.7273 | 410 | 0.0045 | - | - |
+ | 3.7364 | 411 | 0.0086 | - | - |
+ | 3.7455 | 412 | 0.0029 | - | - |
+ | 3.7545 | 413 | 0.0004 | - | - |
+ | 3.7636 | 414 | 0.0075 | - | - |
+ | 3.7727 | 415 | 0.0007 | - | - |
+ | 3.7818 | 416 | 0.0006 | - | - |
+ | 3.7909 | 417 | 0.0007 | - | - |
+ | 3.8 | 418 | 0.0018 | - | - |
+ | 3.8091 | 419 | 0.0009 | - | - |
+ | 3.8182 | 420 | 0.0258 | - | - |
+ | 3.8273 | 421 | 0.0009 | - | - |
+ | 3.8364 | 422 | 0.0011 | - | - |
+ | 3.8455 | 423 | 0.0004 | - | - |
+ | 3.8545 | 424 | 0.0044 | - | - |
+ | 3.8636 | 425 | 0.0019 | - | - |
+ | 3.8727 | 426 | 0.0015 | - | - |
+ | 3.8818 | 427 | 0.1093 | - | - |
+ | 3.8909 | 428 | 0.0018 | - | - |
+ | 3.9 | 429 | 0.0018 | - | - |
+ | 3.9091 | 430 | 0.0142 | - | - |
+ | 3.9182 | 431 | 0.0048 | - | - |
+ | 3.9273 | 432 | 0.0012 | - | - |
+ | 3.9364 | 433 | 0.0113 | - | - |
+ | 3.9455 | 434 | 0.002 | - | - |
+ | 3.9545 | 435 | 0.0003 | - | - |
+ | 3.9636 | 436 | 0.001 | - | - |
+ | 3.9727 | 437 | 0.0004 | - | - |
+ | 3.9818 | 438 | 0.0012 | - | - |
+ | 3.9909 | 439 | 0.0097 | - | - |
+ | 4.0 | 440 | 0.1549 | 0.0200 | 0.7209 |
+ | 4.0091 | 441 | 0.0031 | - | - |
+ | 4.0182 | 442 | 0.0007 | - | - |
+ | 4.0273 | 443 | 0.0096 | - | - |
+ | 4.0364 | 444 | 0.0007 | - | - |
+ | 4.0455 | 445 | 0.0052 | - | - |
+ | 4.0545 | 446 | 0.0019 | - | - |
+ | 4.0636 | 447 | 0.0009 | - | - |
+ | 4.0727 | 448 | 0.0006 | - | - |
+ | 4.0818 | 449 | 0.0022 | - | - |
+ | 4.0909 | 450 | 0.0392 | - | - |
+ | 4.1 | 451 | 0.023 | - | - |
+ | 4.1091 | 452 | 0.0009 | - | - |
+ | 4.1182 | 453 | 0.0012 | - | - |
+ | 4.1273 | 454 | 0.0009 | - | - |
+ | 4.1364 | 455 | 0.0044 | - | - |
+ | 4.1455 | 456 | 0.0004 | - | - |
+ | 4.1545 | 457 | 0.0009 | - | - |
+ | 4.1636 | 458 | 0.0093 | - | - |
+ | 4.1727 | 459 | 0.0011 | - | - |
+ | 4.1818 | 460 | 0.0014 | - | - |
+ | 4.1909 | 461 | 0.0008 | - | - |
+ | 4.2 | 462 | 0.0011 | - | - |
+ | 4.2091 | 463 | 0.0231 | - | - |
+ | 4.2182 | 464 | 0.0018 | - | - |
+ | 4.2273 | 465 | 0.001 | - | - |
+ | 4.2364 | 466 | 0.0048 | - | - |
+ | 4.2455 | 467 | 0.0012 | - | - |
+ | 4.2545 | 468 | 0.0012 | - | - |
+ | 4.2636 | 469 | 0.0025 | - | - |
+ | 4.2727 | 470 | 0.0065 | - | - |
+ | 4.2818 | 471 | 0.0008 | - | - |
+ | 4.2909 | 472 | 0.0793 | - | - |
+ | 4.3 | 473 | 0.0015 | - | - |
+ | 4.3091 | 474 | 0.0013 | - | - |
+ | 4.3182 | 475 | 0.0044 | - | - |
+ | 4.3273 | 476 | 0.0324 | - | - |
+ | 4.3364 | 477 | 0.0023 | - | - |
+ | 4.3455 | 478 | 0.0019 | - | - |
+ | 4.3545 | 479 | 0.001 | - | - |
+ | 4.3636 | 480 | 0.0067 | - | - |
+ | 4.3727 | 481 | 0.0015 | - | - |
+ | 4.3818 | 482 | 0.0005 | - | - |
+ | 4.3909 | 483 | 0.0012 | - | - |
+ | 4.4 | 484 | 0.012 | - | - |
+ | 4.4091 | 485 | 0.0013 | - | - |
+ | 4.4182 | 486 | 0.0008 | - | - |
+ | 4.4273 | 487 | 0.0027 | - | - |
+ | 4.4364 | 488 | 0.0016 | - | - |
+ | 4.4455 | 489 | 0.0006 | - | - |
+ | 4.4545 | 490 | 0.0007 | - | - |
+ | 4.4636 | 491 | 0.0008 | - | - |
+ | 4.4727 | 492 | 0.0015 | - | - |
+ | 4.4818 | 493 | 0.001 | - | - |
+ | 4.4909 | 494 | 0.0013 | - | - |
+ | 4.5 | 495 | 0.0021 | - | - |
+ | 4.5091 | 496 | 0.001 | - | - |
+ | 4.5182 | 497 | 0.0008 | - | - |
+ | 4.5273 | 498 | 0.0009 | - | - |
+ | 4.5364 | 499 | 0.0016 | - | - |
+ | 4.5455 | 500 | 0.0007 | - | - |
+
+ </details>
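Only one row in every 110 carries evaluation results, which makes the log above hard to scan. The following is a minimal parsing sketch over the rows from this excerpt that have evaluation columns, plus the final logged step. It assumes the fourth column is validation loss and the fifth is the card's evaluation metric. The table header sits outside this excerpt, so both column meanings are assumptions, and any evaluation rows logged before this excerpt are not included.

```python
# Evaluation rows copied from the training log above, plus the final logged step.
# Column meanings (epoch, step, training loss, validation loss, metric) are assumed.
LOG_ROWS = """
| 1.0 | 110 | 0.0188 | 0.0271 | 0.7058 |
| 2.0 | 220 | 0.0039 | 0.0232 | 0.7161 |
| 3.0 | 330 | 0.002 | 0.0255 | 0.7063 |
| 4.0 | 440 | 0.1549 | 0.0200 | 0.7209 |
| 4.5455 | 500 | 0.0007 | - | - |
"""

STEPS_PER_EPOCH = 110  # inferred from the log: step 110 is reported as epoch 1.0


def parse_rows(text):
    """Parse markdown table rows into tuples; '-' cells become None."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        epoch, step = float(cells[0]), int(cells[1])
        rest = [None if c == "-" else float(c) for c in cells[2:]]
        rows.append((epoch, step, *rest))
    return rows


def best_eval(rows):
    """Return the evaluated row with the highest value in the metric column."""
    evaluated = [r for r in rows if r[4] is not None]
    return max(evaluated, key=lambda r: r[4])


rows = parse_rows(LOG_ROWS)
# The epoch column is step / STEPS_PER_EPOCH rounded to 4 decimals.
assert all(round(step / STEPS_PER_EPOCH, 4) == epoch for epoch, step, *_ in rows)
best = best_eval(rows)
print(f"best metric {best[4]} at epoch {best[0]} (step {best[1]}), val loss {best[3]}")
```

On these rows the highest metric (0.7209) and the lowest validation loss (0.0200) both fall at epoch 4.0. The last row (step 500, epoch 4.5455) ends partway through a fifth epoch with no evaluation attached.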
+
+ ### Framework Versions
+ - Python: 3.12.9
+ - Sentence Transformers: 4.1.0
+ - Transformers: 4.52.3
+ - PyTorch: 2.6.0+cu124
+ - Accelerate: 1.7.0
+ - Datasets: 3.6.0
+ - Tokenizers: 0.21.1
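To compare a local environment against the library versions listed above, here is a small helper built on the standard-library `importlib.metadata`. The PyPI distribution names are our own mapping (for example, `torch` for PyTorch), and the Python interpreter version is not checked. The comparison is an exact string match, so a CPU-only `2.6.0` build of torch would be reported as a mismatch.

```python
import importlib.metadata

# Versions from "Framework Versions"; keys are PyPI distribution names (our mapping).
EXPECTED = {
    "sentence-transformers": "4.1.0",
    "transformers": "4.52.3",
    "torch": "2.6.0+cu124",
    "accelerate": "1.7.0",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def version_mismatches(expected, lookup=importlib.metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = lookup(name)
        except importlib.metadata.PackageNotFoundError:
            have = None  # package is not installed at all
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in version_mismatches(EXPECTED).items():
        print(f"{name}: card lists {want}, installed {have or 'not installed'}")
```

The `lookup` parameter exists so the check can be exercised without touching the real environment.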
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+ year={2017},
+ eprint={1705.00652},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
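MultipleNegativesRankingLoss, cited above, treats each (anchor, positive) pair in a batch as a classification problem. It scores the anchor against every positive in the batch with scaled similarity, then applies cross-entropy with the matching positive as the target, so the other in-batch positives act as negatives. Below is a dependency-free sketch that assumes the sentence-transformers defaults of cosine similarity and a scale of 20. It also leaves out optional hard-negative columns. The settings actually used for this run are not shown in this excerpt.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mnrl_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where anchor i must pick positive i out of the batch.

    Every positive j != i serves as an in-batch negative for anchor i.
    """
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cosine(a, p) for p in positives]
        # Numerically stable log-sum-exp; loss_i = logsumexp(logits) - logits[i].
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(anchors)


aligned = mnrl_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = mnrl_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

When each anchor matches its own positive, the loss is close to zero. When the pairs are swapped, each row pays roughly the full scale of 20.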
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+ "architectures": [
+ "BertModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 1024,
+ "initializer_range": 0.02,
+ "intermediate_size": 4096,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 16,
+ "num_hidden_layers": 24,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.52.3",
+ "type_vocab_size": 2,
+ "use_cache": false,
+ "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "__version__": {
+ "sentence_transformers": "4.1.0",
+ "transformers": "4.52.3",
+ "pytorch": "2.6.0+cu124"
+ },
+ "prompts": {
+ "query": "Represent this sentence for searching relevant passages: ",
+ "passage": ""
+ },
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
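Because `default_prompt_name` is null, no prompt is added unless the caller asks for one. Queries are meant to carry the `"query"` prefix, while passages use the empty `"passage"` prompt. In sentence-transformers this is selected by passing `prompt_name` to `SentenceTransformer.encode` (e.g. `model.encode(queries, prompt_name="query")`), which prepends the prompt string to each input before tokenization. The sketch below only mimics that string handling on the config above, to make the query/passage asymmetry visible. It does not load the model, and `apply_prompt` is a hypothetical helper, not a library function.

```python
import json

# Keys of config_sentence_transformers.json from this commit that matter for prompts.
CONFIG = json.loads("""
{
  "prompts": {
    "query": "Represent this sentence for searching relevant passages: ",
    "passage": ""
  },
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
""")


def apply_prompt(texts, config, prompt_name=None):
    """Hypothetical helper: prepend the named prompt to each text.

    With prompt_name=None, fall back to default_prompt_name, which is null
    here, so the texts are returned unchanged.
    """
    name = prompt_name if prompt_name is not None else config["default_prompt_name"]
    prefix = config["prompts"][name] if name is not None else ""
    return [prefix + t for t in texts]


print(apply_prompt(["what is cls pooling"], CONFIG, "query"))
```

Forgetting `prompt_name="query"` at search time silently encodes queries the way passages are encoded, which is usually not how such a prompt was intended to be used.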
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:95a595ab0daaa2e7ad41f4b96345deba3407c1fc9b8342369d87ea995bb0cb4c
+ size 1340612432
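As a consistency check between `config.json` and this weight file, the standard BERT parameter-count arithmetic applied to the hyperparameters above gives 335,141,888 parameters. At 4 bytes per `float32` value, that accounts for all but 44,880 bytes of the 1,340,612,432-byte `model.safetensors`. The remainder is plausibly the safetensors JSON header. The sketch assumes the pooler dense layer is stored (`BertModel` includes it by default) and that no other tensors are saved; both are assumptions, not read from the file.

```python
# Hyperparameters copied from config.json in this commit.
CFG = {
    "vocab_size": 30522,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "hidden_size": 1024,
    "intermediate_size": 4096,
    "num_hidden_layers": 24,
}


def bert_param_count(c, with_pooler=True):
    """Count weights + biases of a standard BERT encoder."""
    h, i = c["hidden_size"], c["intermediate_size"]
    layer_norm = 2 * h  # gain and bias vectors
    vocab_rows = c["vocab_size"] + c["max_position_embeddings"] + c["type_vocab_size"]
    embeddings = vocab_rows * h + layer_norm
    attention = 4 * (h * h + h) + layer_norm  # Q, K, V and output projections
    feed_forward = (h * i + i) + (i * h + h) + layer_norm
    encoder = c["num_hidden_layers"] * (attention + feed_forward)
    pooler = (h * h + h) if with_pooler else 0  # tanh dense layer on [CLS]
    return embeddings + encoder + pooler


params = bert_param_count(CFG)
weights_bytes = params * 4  # "torch_dtype": "float32"
lfs_size = 1340612432  # "size" from the model.safetensors LFS pointer
print(params, weights_bytes, lfs_size - weights_bytes)
```

Dropping the pooler would leave about 4.2 MB unexplained, far more than a safetensors header needs. This is why the sketch assumes the pooler weights are saved.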
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ }
+ ]
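`modules.json` chains two stages. The BERT encoder (module `"0"`, stored at the repository root) produces one 1024-dimensional vector per token. The Pooling module in `1_Pooling` then reduces those vectors to a single sentence vector. `1_Pooling/config.json` enables only `pooling_mode_cls_token`, so the sentence embedding is the encoder's final hidden state at the `[CLS]` position, not BERT's separate pooler output. `config_sentence_transformers.json` sets cosine as the similarity function. The pure-Python sketch below traces that data flow on toy 2-dimensional vectors. It stands in for the real tensors and is not the library's implementation.

```python
import math

Vector = list[float]


def cls_pool(token_embeddings: list[list[Vector]]) -> list[Vector]:
    """CLS pooling: keep the first token's vector ([CLS]) of each sequence."""
    return [seq[0] for seq in token_embeddings]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def encode_and_score(token_embeddings_a, token_embeddings_b):
    """Module 0 output -> module 1 CLS pooling -> cosine similarity matrix."""
    a, b = cls_pool(token_embeddings_a), cls_pool(token_embeddings_b)
    return [[cosine_similarity(x, y) for y in b] for x in a]


# Toy "encoder outputs": position 0 of each sequence plays the [CLS] role.
query = [[[1.0, 0.0], [5.0, 5.0]]]
docs = [[[2.0, 0.0], [0.0, 9.0]], [[0.0, 3.0], [1.0, 1.0]]]
print(encode_and_score(query, docs))
```

Only position 0 influences the score. The large non-[CLS] token vectors in the toy inputs have no effect.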
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
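`max_seq_length` caps every input at 512 tokens, the same as the encoder's `max_position_embeddings`, and `tokenizer_config.json` truncates from the right. For a single text, BERT's `[CLS]` and `[SEP]` count toward that limit, so at most 510 content tokens survive. The sketch below frames one text that way, assuming single-segment inputs. The ids 101 and 102 are taken from `tokenizer_config.json`, and `frame_and_truncate` is a hypothetical helper.

```python
CLS_ID, SEP_ID = 101, 102  # ids of [CLS] and [SEP] in tokenizer_config.json
MAX_SEQ_LENGTH = 512       # from sentence_bert_config.json


def frame_and_truncate(token_ids, max_len=MAX_SEQ_LENGTH):
    """Wrap one tokenized text as [CLS] ... [SEP], cutting content from the right
    so the total length, special tokens included, never exceeds max_len."""
    budget = max_len - 2  # room left after [CLS] and [SEP]
    return [CLS_ID] + list(token_ids[:budget]) + [SEP_ID]


print(len(frame_and_truncate(list(range(1000, 1700)))))
```

The two `do_lower_case` flags act at different layers. `false` here means sentence-transformers does not lowercase the text itself. The `BertTokenizer` settings in `tokenizer_config.json` have `do_lower_case: true`, so inputs are still expected to be lowercased during tokenization, assuming `tokenizer.json` (not rendered in this diff) carries the same setting.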
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "max_length": 512,
+ "model_max_length": 512,
+ "never_split": null,
+ "pad_to_multiple_of": null,
+ "pad_token": "[PAD]",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "sep_token": "[SEP]",
+ "stride": 0,
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "[UNK]"
+ }
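`tokenizer_config.json` pads on the right with `[PAD]` (id 0) and `pad_token_type_id` 0, so a padded batch carries an attention mask that zeros out the trailing positions. Right padding also keeps `[CLS]` at position 0 in every sequence, which is the only position CLS pooling reads. The sketch below shows that padding step under two assumptions: batches are padded to their longest sequence, and `pad_batch` is a hypothetical helper. The token ids in the usage line are arbitrary illustrative values.

```python
PAD_ID = 0             # "[PAD]" in added_tokens_decoder
PAD_TOKEN_TYPE_ID = 0  # "pad_token_type_id"


def pad_batch(batch):
    """Right-pad token-id lists to the longest one ("padding_side": "right").

    Returns input_ids, attention_mask (1 = real token, 0 = padding) and
    token_type_ids (all 0 for single-segment inputs).
    """
    width = max(len(ids) for ids in batch)
    input_ids, attention_mask, token_type_ids = [], [], []
    for ids in batch:
        pad = width - len(ids)
        input_ids.append(list(ids) + [PAD_ID] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)
        token_type_ids.append([0] * len(ids) + [PAD_TOKEN_TYPE_ID] * pad)
    return input_ids, attention_mask, token_type_ids


# Two framed texts of different lengths (content ids are arbitrary).
ids, mask, types = pad_batch([[101, 2000, 102], [101, 102]])
print(ids, mask)
```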
vocab.txt ADDED