metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:14356
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-large-en-v1.5
widget:
- source_sentence: >-
Pear trees are usually productive for 50 to 75 years though some still
produce fruit after 100 years .
sentences:
- In the late 1950s , he studied cinema in France .
- >-
Pear trees are usually productive for 50 to 75 years though some still
produce fruit after 100 years .
- >-
A recording medium is a physical material that holds data expressed in
any of the existing recording formats .
- source_sentence: On poor , dry soils there are tropical heathlands .
sentences:
- On poor , dry soils there are tropical heathlands .
- There are plans to build a new library at my school .
- >-
These are forest birds that tend to feed on insects at or near the
ground .
- source_sentence: >-
According to Statistics Canada , the county has a total area of 2004.44
km2 .
sentences:
- >-
In 2018 , there are eleven senators holding ministerial positions and
the head of state , the First mayor .
- >-
According to Statistics Canada , the county has a total area of 2004.44
km2 .
- >-
There are some common ways used to stretch piercings , of different
origins and useful for different people .
- source_sentence: Oll , who was married , fell into severe depressions after he divorced .
sentences:
- >-
Tide pools are a home for hardy organisms such as sea stars , mussels
and clams .
- >-
Endgames can be studied according to the types of pieces that remain on
board .
- Oll , who was married , fell into severe depressions after he divorced .
- source_sentence: She often shared her boots with her sister .
sentences:
- She often shared her boots with her sister .
- >-
Many of the Greek city -states also had a god or goddess associated with
that city .
- Until 1 April 2010 the Departments were as follows .
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
model-index:
- name: SentenceTransformer based on BAAI/bge-large-en-v1.5
results:
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: dev
type: dev
metrics:
- type: pearson_cosine
value: 0.15520756564467134
name: Pearson Cosine
- type: spearman_cosine
value: 0.13351347160793242
name: Spearman Cosine
SentenceTransformer based on BAAI/bge-large-en-v1.5
This is a sentence-transformers model finetuned from BAAI/bge-large-en-v1.5. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-large-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Mr-FineTuner/Eval_01_Final_1")
# Run inference
sentences = [
'She often shared her boots with her sister .',
'She often shared her boots with her sister .',
'Until 1 April 2010 the Departments were as follows .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
Evaluation
Metrics
Semantic Similarity
- Dataset:
dev - Evaluated with
EmbeddingSimilarityEvaluator
| Metric | Value |
|---|---|
| pearson_cosine | 0.1552 |
| spearman_cosine | 0.1335 |
Training Details
Training Dataset
Unnamed Dataset
- Size: 14,356 training samples
- Columns:
sentence_0,sentence_1, andlabel - Approximate statistics based on the first 1000 samples:
sentence_0 sentence_1 label type string string float details - min: 7 tokens
- mean: 18.64 tokens
- max: 42 tokens
- min: 7 tokens
- mean: 18.64 tokens
- max: 42 tokens
- min: 1.0
- mean: 3.32
- max: 6.0
- Samples:
sentence_0 sentence_1 label Construction of the temple complex started in approximately 1264 BC and lasted for about 20 years , until 1244 BC .Construction of the temple complex started in approximately 1264 BC and lasted for about 20 years , until 1244 BC .3.0He knew which bag to buy for his older sister 's birthday .He knew which bag to buy for his older sister 's birthday .3.0The precise origin of absinthe is unclear .The precise origin of absinthe is unclear .4.0 - Loss:
MultipleNegativesRankingLosswith these parameters:{ "scale": 20.0, "similarity_fct": "cos_sim" }
Training Hyperparameters
Non-Default Hyperparameters
eval_strategy: stepsper_device_train_batch_size: 16per_device_eval_batch_size: 16multi_dataset_batch_sampler: round_robin
All Hyperparameters
Click to expand
overwrite_output_dir: Falsedo_predict: Falseeval_strategy: stepsprediction_loss_only: Trueper_device_train_batch_size: 16per_device_eval_batch_size: 16per_gpu_train_batch_size: Noneper_gpu_eval_batch_size: Nonegradient_accumulation_steps: 1eval_accumulation_steps: Nonetorch_empty_cache_steps: Nonelearning_rate: 5e-05weight_decay: 0.0adam_beta1: 0.9adam_beta2: 0.999adam_epsilon: 1e-08max_grad_norm: 1num_train_epochs: 3max_steps: -1lr_scheduler_type: linearlr_scheduler_kwargs: {}warmup_ratio: 0.0warmup_steps: 0log_level: passivelog_level_replica: warninglog_on_each_node: Truelogging_nan_inf_filter: Truesave_safetensors: Truesave_on_each_node: Falsesave_only_model: Falserestore_callback_states_from_checkpoint: Falseno_cuda: Falseuse_cpu: Falseuse_mps_device: Falseseed: 42data_seed: Nonejit_mode_eval: Falseuse_ipex: Falsebf16: Falsefp16: Falsefp16_opt_level: O1half_precision_backend: autobf16_full_eval: Falsefp16_full_eval: Falsetf32: Nonelocal_rank: 0ddp_backend: Nonetpu_num_cores: Nonetpu_metrics_debug: Falsedebug: []dataloader_drop_last: Falsedataloader_num_workers: 0dataloader_prefetch_factor: Nonepast_index: -1disable_tqdm: Falseremove_unused_columns: Truelabel_names: Noneload_best_model_at_end: Falseignore_data_skip: Falsefsdp: []fsdp_min_num_params: 0fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}fsdp_transformer_layer_cls_to_wrap: Noneaccelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}deepspeed: Nonelabel_smoothing_factor: 0.0optim: adamw_torchoptim_args: Noneadafactor: Falsegroup_by_length: Falselength_column_name: lengthddp_find_unused_parameters: Noneddp_bucket_cap_mb: Noneddp_broadcast_buffers: Falsedataloader_pin_memory: Truedataloader_persistent_workers: Falseskip_memory_metrics: Trueuse_legacy_prediction_loop: Falsepush_to_hub: Falseresume_from_checkpoint: Nonehub_model_id: Nonehub_strategy: every_savehub_private_repo: Nonehub_always_push: Falsegradient_checkpointing: Falsegradient_checkpointing_kwargs: Noneinclude_inputs_for_metrics: Falseinclude_for_metrics: []eval_do_concat_batches: Truefp16_backend: autopush_to_hub_model_id: Nonepush_to_hub_organization: Nonemp_parameters:auto_find_batch_size: Falsefull_determinism: Falsetorchdynamo: Noneray_scope: lastddp_timeout: 1800torch_compile: Falsetorch_compile_backend: Nonetorch_compile_mode: Nonedispatch_batches: Nonesplit_batches: Noneinclude_tokens_per_second: Falseinclude_num_input_tokens_seen: Falseneftune_noise_alpha: Noneoptim_target_modules: Nonebatch_eval_metrics: Falseeval_on_start: Falseuse_liger_kernel: Falseeval_use_gather_object: Falseaverage_tokens_across_devices: Falseprompts: Nonebatch_sampler: batch_samplermulti_dataset_batch_sampler: round_robin
Training Logs
| Epoch | Step | Training Loss | dev cosine similarity loss | dev_spearman_cosine |
|---|---|---|---|---|
| 0.5568 | 500 | 0.002 | 0.3391 | 0.1080 |
| 1.0 | 898 | - | 0.3654 | 0.1284 |
| 1.1136 | 1000 | 0.0011 | 0.3726 | 0.1298 |
| 1.6704 | 1500 | 0.0006 | 0.3889 | 0.1438 |
| 2.0 | 1796 | - | 0.3689 | 0.1359 |
| 2.2272 | 2000 | 0.0011 | 0.3737 | 0.1484 |
| 2.7840 | 2500 | 0.0009 | 0.3898 | 0.1349 |
| 3.0 | 2694 | - | 0.3908 | 0.1335 |
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.1
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}