SentenceTransformer based on sentence-transformers/LaBSE

This is a sentence-transformers model finetuned from sentence-transformers/LaBSE. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/LaBSE
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  (3): Normalize()
)
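Because the final Normalize() module maps every embedding onto the unit sphere, cosine similarity coincides with a plain dot product for this model. A minimal check, assuming the model is published under the Hub id LilNomto/labse_oi_bo:

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("LilNomto/labse_oi_bo")

embeddings = model.encode(["rdo rje drag po dga’ ba che .", "yeke bayasxulang-tu doqšin očir ."])
# Normalize() makes every vector unit-length ...
print(np.linalg.norm(embeddings, axis=1))  # ~[1. 1.]
# ... so the raw dot product matches the cosine scores from model.similarity()
print(embeddings @ embeddings.T)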

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("LilNomto/labse_oi_bo")
# Run inference
sentences = [
    'dge slong tshul khrims nyos [=nyon mongs] pa’i dus| ',
    'geleng šaqšabad buraxa caq müü  sanan sedkeldü sanaxin caq: ',
    'tögünčilen  boluqsad bodhi mahāsadv-noγoudtu oγōto  xadangγadxaxuyin dēdü-bēr  kedüi činēn oγōto  xadangγadxaqsan inu: ilaγun tögüsüqsen maši  γayixamšiq sayibēr oduqsan maši γayixamšiq: ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
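The same embeddings also support cross-lingual semantic search. A small sketch, reusing the Hub id shown above, with a Tibetan query and two Oirat candidates taken from the training samples listed under Training Details:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("LilNomto/labse_oi_bo")

# Tibetan query and Oirat candidate lines (from the training samples below)
query = "stong pa nyid dga’ mchog gi blo ."
corpus = [
    "yeke bayasxulang-tu doqšin očir .",
    "xōsun činar tālaxui tačīngγui oyoutu .",
]

query_embedding = model.encode(query)
corpus_embeddings = model.encode(corpus)

# Return the top_k corpus entries most similar to the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=1)
print(hits)  # e.g. [[{'corpus_id': 1, 'score': ...}]]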

Training Details

Training Dataset

Unnamed Dataset

  • Size: 966 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 966 samples:
              sentence_0          sentence_1          label
    type      string              string              float
    details   min: 7 tokens       min: 7 tokens       min: 1.0
              mean: 30.95 tokens  mean: 31.0 tokens   mean: 1.0
              max: 193 tokens     max: 201 tokens     max: 1.0
  • Samples:
    sentence_0: sangyas [=sangs rgyas] bstan la dad pa’i mi’i| spos dang me tog chod par bgyi| de nas byos [=byams] ba mgon pos no [=nam] mkha’ dbyings nas gzit [=gzigs] ti| mi rnosyi [=rnams kyi] mig nas khrag gi mchil byung ba mthong nas| bco [=bcom] ldan ’dasyi [=’das kyi] drung du byon nas zhus pa|
    sentence_1: bürxüni šiǰindü süzüqten kümün küǰi kiged ceceq-yer takin ülüdkü teged itegel mider (~maider) [=mayidari]-yer oγotoroγon činer-ece ailedeǰi kümün-nuγüd nidan [=nidün]-ece cüsüni nilübüs [=nilbusun] γaraqsn üzed ilγün [=ilaγun] tögüsün ülüqsn derege-dü öged-dü [=ögede] bolod alitxaba [=ayildxaba] .
    label: 1.0

    sentence_0: rdo rje drag po dga’ ba che .
    sentence_1: yeke bayasxulang-tu doqšin očir .
    label: 1.0

    sentence_0: stong pa nyid dga’ mchog gi blo .
    sentence_1: xōsun činar tālaxui tačīngγui oyoutu .
    label: 1.0
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
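
These parameters map directly onto sentence_transformers.losses.MultipleNegativesRankingLoss. A minimal instantiation sketch:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.util import cos_sim

model = SentenceTransformer("sentence-transformers/LaBSE")
# scale=20.0 and cos_sim correspond to the parameters listed above;
# gather_across_devices: false only matters for multi-GPU training.
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)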
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 12
  • per_device_eval_batch_size: 12
  • num_train_epochs: 40
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
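
A sketch of how these non-default values plug into the Sentence Transformers trainer API. The output directory and the single placeholder pair are assumptions, since the 966-pair Tibetan/Oirat dataset is not published with this card:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/LaBSE")

# Placeholder pair mirroring the sentence_0 / sentence_1 columns described above
train_dataset = Dataset.from_dict({
    "sentence_0": ["rdo rje drag po dga’ ba che ."],
    "sentence_1": ["yeke bayasxulang-tu doqšin očir ."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="labse_oi_bo",  # assumed; not stated in the card
    eval_strategy="steps",
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    num_train_epochs=40,
    fp16=True,  # mixed precision; requires a CUDA device
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # stand-in; eval_strategy="steps" needs an eval set
    loss=MultipleNegativesRankingLoss(model, scale=20.0),
)
trainer.train()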

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 12
  • per_device_eval_batch_size: 12
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 40
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Checkpoints were logged every 3 steps, but no loss or metric values were recorded alongside the epoch and step counters. The entries appear to cover two training runs: one at 8 steps per epoch, from epoch 0.375 (step 3) through epoch 39.0 (step 312), and one at 21 steps per epoch, from epoch 0.1429 (step 3) through epoch 16.1429 (step 339).

Framework Versions

  • Python: 3.10.0
  • Sentence Transformers: 5.1.0
  • Transformers: 4.46.3
  • PyTorch: 2.0.1+cu118
  • Accelerate: 1.1.1
  • Datasets: 4.0.0
  • Tokenizers: 0.20.3
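
To approximate this environment, the listed versions can be pinned at install time; the cu118 wheel index for PyTorch is an assumption based on the 2.0.1+cu118 build string:

pip install sentence-transformers==5.1.0 transformers==4.46.3 accelerate==1.1.1 datasets==4.0.0 tokenizers==0.20.3
pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118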

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}