answerdotai/ModernBERT-base trained on schema JSONL

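This is a Cross Encoder model finetuned from answerdotai/ModernBERT-base using the sentence-transformers library. It computes a score for each pair of texts, which can be used for text reranking and semantic search. In the training samples, each pair is a natural-language question (with a hint) and a description of a single database column, so the score indicates how relevant that column is to the question.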

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: answerdotai/ModernBERT-base
  • Maximum Sequence Length: 8192 tokens
  • Number of Output Labels: 1 label
  • Language: en
  • License: apache-2.0

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("thanhdath/reranker-ModernBERT-base-schema-bce")

# A question (with its hint) and the candidate schema columns to score against it
query = 'Name movie titles released in year 1945. Sort the listing by the descending order of movie popularity.\nHint: released in the year 1945 refers to movie_release_year = 1945;'
columns = [
    'Column: movies.movie_popularity ; Column meaning: Number of Mubi users who love this movie ; Column type: INTEGER ; Column has values: "105" ; Column has null values: False',
    'Column: lists_users.user_has_payment_method ; Column meaning: user_has_payment_method ; Column type: TEXT ; Column has values: "1" ; Column has null values: False',
    'Column: lists.list_description ; Column meaning: List description made by the user ; Column type: TEXT ; Column has values: "<p>[sorted by the year released]</p>", "<p>Films sorted by release year.</p>" ; Column has null values: False',
    'Column: lists.list_second_image_url ; Column meaning: list_second_image_url ; Column type: TEXT ; Column has values:  ; Column has null values: False',
    'Column: movies.movie_release_year ; Column meaning: Release year of the movie ; Column type: INTEGER ; Column has values: "1945" ; Column has null values: False',
]

# Get scores for (query, column) pairs
pairs = [(query, column) for column in columns]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank the columns by relevance to the query
ranks = model.rank(query, columns)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
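Scores are most meaningful when your inputs follow the same layout as the training data. The card does not document that layout, but the samples suggest the query is the question, a newline, then `Hint: ...`, and each candidate is a ` ; `-separated column description. The sketch below reproduces the sample strings exactly and picks the top-k columns from `model.rank` output. The helper names (`format_query`, `format_column`, `top_k_columns`) are hypothetical and not part of sentence-transformers, and the choice of `k` is an assumption since the card gives no cutoff.

```python
# Hypothetical helpers: the input layout is inferred from this card's samples,
# not from a documented specification.
from typing import Iterable, List, Sequence


def format_query(question: str, hint: str) -> str:
    """Question, a newline, then 'Hint: ' followed by the hint text."""
    return f"{question}\nHint: {hint}"


def format_column(
    table: str,
    column: str,
    meaning: str,
    col_type: str,
    values: Iterable[str],
    has_nulls: bool,
) -> str:
    """Column description in the ' ; '-separated layout seen in the samples."""
    # Example values are double-quoted and joined with ', '; no values gives an empty field.
    quoted = ", ".join(f'"{v}"' for v in values)
    return (
        f"Column: {table}.{column} ; "
        f"Column meaning: {meaning} ; "
        f"Column type: {col_type} ; "
        f"Column has values: {quoted} ; "
        f"Column has null values: {has_nulls}"
    )


def top_k_columns(columns: Sequence[str], ranks: List[dict], k: int) -> List[str]:
    """Return the k highest-scoring column texts from model.rank(...) output."""
    # Each rank entry carries 'corpus_id' (index into `columns`) and 'score'.
    ordered = sorted(ranks, key=lambda r: r["score"], reverse=True)
    return [columns[r["corpus_id"]] for r in ordered[:k]]
```

For example, `model.rank(format_query(question, hint), columns)` followed by `top_k_columns(columns, ranks, k=5)` keeps the five highest-scoring column descriptions.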

Training Details

Training Dataset

Unnamed Dataset

  • Size: 180,144 training samples
  • Columns: sentence_A, sentence_B, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_A (string): min 95, mean 264.54, max 551 characters
    • sentence_B (string): min 127, mean 177.59, max 313 characters
    • label (int): 0 in ~75.00% of samples, 1 in ~25.00%
  • Samples:
    1. sentence_A: Name movie titles released in year 1945. Sort the listing by the descending order of movie popularity.
       Hint: released in the year 1945 refers to movie_release_year = 1945;
       sentence_B: Column: movies.movie_popularity ; Column meaning: Number of Mubi users who love this movie ; Column type: INTEGER ; Column has values: "105" ; Column has null values: False
       label: 1
    2. sentence_A: same question and hint as sample 1
       sentence_B: Column: lists_users.user_has_payment_method ; Column meaning: user_has_payment_method ; Column type: TEXT ; Column has values: "1" ; Column has null values: False
       label: 0
    3. sentence_A: same question and hint as sample 1
       sentence_B: Column: lists.list_description ; Column meaning: List description made by the user ; Column type: TEXT ; Column has values: "<p>[sorted by the year released]</p>", "<p>Films sorted by release year.</p>" ; Column has null values: False
       label: 0
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "pos_weight": 2.9947667121887207
    }
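With an Identity activation, the loss is applied to the raw output logit, in the style of PyTorch's `BCEWithLogitsLoss`, and `pos_weight` scales the loss on positive (label 1) pairs. The card does not say how 2.9947667121887207 was chosen, but it is close to the negative-to-positive ratio implied by the ~75%/25% label split (0.75 / 0.25 = 3). The sketch below assumes a JSONL training file with one JSON object per line carrying a `label` field; the file layout and the helper names `pos_weight_from_lines` and `weighted_bce_with_logits` are hypothetical.

```python
import json
import math
from typing import Iterable


def pos_weight_from_lines(lines: Iterable[str]) -> float:
    """Negative-to-positive ratio of the 'label' field over JSONL lines (assumed layout)."""
    pos = neg = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if int(json.loads(line)["label"]) == 1:
            pos += 1
        else:
            neg += 1
    return neg / pos


def _softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def weighted_bce_with_logits(logit: float, target: float, pos_weight: float) -> float:
    """Per-example loss: -(w * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x)))."""
    # log(sigmoid(x)) = -softplus(-x) and log(1 - sigmoid(x)) = -softplus(x)
    return pos_weight * target * _softplus(-logit) + (1.0 - target) * _softplus(logit)
```

Usage would look like `with open(path, encoding="utf-8") as f: w = pos_weight_from_lines(f)`. Raising the weight on positives compensates for there being roughly three negatives per positive, so a missed relevant column costs about as much in total as the many irrelevant ones.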
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • seed: 12
  • fp16: True
  • dataloader_num_workers: 4
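These hyperparameters determine the length of the run. Assuming a single device and no gradient accumulation (the card lists gradient_accumulation_steps: 1), 180,144 samples at batch size 128 give 1,408 steps per epoch, which is consistent with the Epoch column in the training logs (0.9943 at step 1400, 3.125 at step 4400). The sketch below works through that arithmetic and a linear warmup-then-decay learning rate that is meant to mirror Transformers' `get_linear_schedule_with_warmup`; the helper names `schedule` and `linear_lr` are hypothetical.

```python
import math


def schedule(num_samples: int, batch_size: int, epochs: int, warmup_ratio: float):
    """Steps per epoch, total optimizer steps and warmup steps (single device assumed)."""
    steps_per_epoch = math.ceil(num_samples / batch_size)  # last partial batch kept (drop_last False)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps


def linear_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


steps_per_epoch, total_steps, warmup_steps = schedule(180_144, 128, 10, 0.1)
print(steps_per_epoch, total_steps, warmup_steps)  # 1408 14080 1408
print(round(1400 / steps_per_epoch, 4))  # 0.9943
```

Under these assumptions the learning rate peaks at 1e-05 at step 1,408 and reaches 0 at step 14,080, just after the last logged step (14000).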

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0007 1 1.1598
0.0710 100 1.0653
0.1420 200 1.023
0.2131 300 0.9502
0.2841 400 0.8251
0.3551 500 0.6595
0.4261 600 0.5691
0.4972 700 0.5061
0.5682 800 0.4711
0.6392 900 0.4209
0.7102 1000 0.3882
0.7812 1100 0.3778
0.8523 1200 0.3743
0.9233 1300 0.3248
0.9943 1400 0.3283
1.0653 1500 0.2973
1.1364 1600 0.2653
1.2074 1700 0.263
1.2784 1800 0.2548
1.3494 1900 0.2329
1.4205 2000 0.2345
1.4915 2100 0.2303
1.5625 2200 0.205
1.6335 2300 0.2077
1.7045 2400 0.1836
1.7756 2500 0.186
1.8466 2600 0.1877
1.9176 2700 0.1757
1.9886 2800 0.1742
2.0597 2900 0.1278
2.1307 3000 0.104
2.2017 3100 0.1135
2.2727 3200 0.1087
2.3438 3300 0.0998
2.4148 3400 0.103
2.4858 3500 0.1029
2.5568 3600 0.096
2.6278 3700 0.1021
2.6989 3800 0.0836
2.7699 3900 0.08
2.8409 4000 0.0858
2.9119 4100 0.0816
2.9830 4200 0.0724
3.0540 4300 0.0451
3.125 4400 0.0415
3.1960 4500 0.0396
3.2670 4600 0.0397
3.3381 4700 0.0406
3.4091 4800 0.0468
3.4801 4900 0.0395
3.5511 5000 0.0399
3.6222 5100 0.0498
3.6932 5200 0.0453
3.7642 5300 0.0376
3.8352 5400 0.0472
3.9062 5500 0.038
3.9773 5600 0.0323
4.0483 5700 0.0214
4.1193 5800 0.0173
4.1903 5900 0.0229
4.2614 6000 0.0218
4.3324 6100 0.0216
4.4034 6200 0.0135
4.4744 6300 0.015
4.5455 6400 0.0204
4.6165 6500 0.0201
4.6875 6600 0.0145
4.7585 6700 0.0146
4.8295 6800 0.0191
4.9006 6900 0.0204
4.9716 7000 0.0129
5.0426 7100 0.0158
5.1136 7200 0.0045
5.1847 7300 0.0033
5.2557 7400 0.0041
5.3267 7500 0.0082
5.3977 7600 0.0129
5.4688 7700 0.0055
5.5398 7800 0.0047
5.6108 7900 0.0076
5.6818 8000 0.0085
5.7528 8100 0.0129
5.8239 8200 0.0089
5.8949 8300 0.0074
5.9659 8400 0.0075
6.0369 8500 0.0061
6.1080 8600 0.0025
6.1790 8700 0.003
6.25 8800 0.0055
6.3210 8900 0.0048
6.3920 9000 0.0036
6.4631 9100 0.0052
6.5341 9200 0.0014
6.6051 9300 0.0045
6.6761 9400 0.0022
6.7472 9500 0.0043
6.8182 9600 0.0036
6.8892 9700 0.0062
6.9602 9800 0.0059
7.0312 9900 0.0018
7.1023 10000 0.0029
7.1733 10100 0.002
7.2443 10200 0.004
7.3153 10300 0.002
7.3864 10400 0.0016
7.4574 10500 0.0031
7.5284 10600 0.0032
7.5994 10700 0.0025
7.6705 10800 0.0016
7.7415 10900 0.0014
7.8125 11000 0.0011
7.8835 11100 0.0005
7.9545 11200 0.0001
8.0256 11300 0.0001
8.0966 11400 0.0003
8.1676 11500 0.0
8.2386 11600 0.0021
8.3097 11700 0.0001
8.3807 11800 0.0002
8.4517 11900 0.0002
8.5227 12000 0.0027
8.5938 12100 0.0
8.6648 12200 0.0024
8.7358 12300 0.0001
8.8068 12400 0.0004
8.8778 12500 0.0001
8.9489 12600 0.0008
9.0199 12700 0.0001
9.0909 12800 0.0
9.1619 12900 0.0
9.2330 13000 0.0
9.3040 13100 0.0
9.375 13200 0.0
9.4460 13300 0.0
9.5170 13400 0.0
9.5881 13500 0.0
9.6591 13600 0.0
9.7301 13700 0.0
9.8011 13800 0.0
9.8722 13900 0.0
9.9432 14000 0.0021

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 5.1.0
  • Transformers: 4.55.2
  • PyTorch: 2.2.2+cu121
  • Accelerate: 1.10.0
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Model size: 150M params (Safetensors, tensor type F32)