---
library_name: sentence-transformers
metrics:
  - negative_mse
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:25095
  - loss:MSELoss
widget:
  - source_sentence: mariknak pay ketdi a naabrasaak iti kulonganda
    sentences:
      - >-
        Nakuha nako ang usa ka kuptanan sa istorya ug nagsugod kini sa pagbati
        ug porma nga akong gusto
      - >-
        Ang kasarangang pag-ulan sa London, nga adunay kataas nga 10°C ug ang
        ubos nga 6°C. #LondonWeather #RainyDay
      - Controversial religious text causes uproar among community members
  - source_sentence: >
      JUAN COLE: Ang Pagduso sa Islamic State sa Baghdad 'Usa ka Pagsulay
      Aron Mabawi ang Gikuha sa Bush Administration' 
    sentences:
      - >-
        Ang Touchdown nga Selebrasyon ni Antonio Brown Sexy Gihapon Alang sa NFL
        Bisan ang duha ka pagduso makapasilo kanimo.
      - >-
        Natuklasan ng mga siyentipiko ang mga bagong species ng nilalang sa
        malalim na dagat
      - i feel so glad doing this
  - source_sentence: New Curriculum Standards to Be Implemented in All Schools Next Year
    sentences:
      - |
        Climate Change This Week: Mega Methane, Tidal Power, and More 
      - >-
        @lilomatic Only in Zimbabwe where u find Opposition party for another
        Opposition party.
      - >
        Ang mamumuno nga si Mike namulong sa Ferguson: 'Ang Hustisya Dili
        Kanunay Gisilbi' 
  - source_sentence: i am so blessed and feel blessed to be able to share my creations with you
    sentences:
      - |
        Ania ang Buhaton Sa World Cup Host Cities Gawas sa Pagtan-aw sa Soccer 
      - |
        Hillary Clinton's 'Super Volunteers' Are Back And Ready For 2016 
      - >-
        Awan pay ti koriente para kadagiti paset ti Joburg kalpasan ti uram ti
        kable iti uneg ti daga https://t.co/szuZa380Lr
  - source_sentence: |
      3 Napateg nga Addang (iti Aniaman nga Edad) tapno Agsagana iti Matay 
    sentences:
      - >-
        EPIC! RAND PAUL Laughs at CNN’s Climate Hysteria…Schools Jake Tapper on
        Climate Truth [Video]
      - im feeling horrible
      - 'Image: WC Provincial Disaster Management Centre https://t.co/EcNgpBhjcV'
model-index:
  - name: SentenceTransformer
    results:
      - task:
          type: knowledge-distillation
          name: Knowledge Distillation
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: negative_mse
            value: -0.2521140966564417
            name: Negative MSE
---

SentenceTransformer

This is a sentence-transformers model trained with an MSELoss knowledge-distillation objective. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
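
Because the pooling module is plain attention-masked mean pooling over the encoder's final hidden states, the same embeddings can be reproduced with 🤗 Transformers directly. The following is a minimal sketch, using the same "sentence_transformers_model_id" placeholder as the Usage section below:

import torch
from transformers import AutoModel, AutoTokenizer

# XLM-RoBERTa encoder followed by attention-masked mean pooling,
# mirroring the Transformer + Pooling stack printed above.
tokenizer = AutoTokenizer.from_pretrained("sentence_transformers_model_id")
encoder = AutoModel.from_pretrained("sentence_transformers_model_id")

def mean_pool(last_hidden_state, attention_mask):
    mask = attention_mask.unsqueeze(-1).float()     # [batch, seq, 1]
    summed = (last_hidden_state * mask).sum(dim=1)  # sum over non-padding tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)        # per-sentence token counts
    return summed / counts                          # [batch, 768]

batch = tokenizer(["An example sentence"], padding=True, truncation=True,
                  max_length=128, return_tensors="pt")
with torch.no_grad():
    output = encoder(**batch)
embeddings = mean_pool(output.last_hidden_state, batch["attention_mask"])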

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    '3 Napateg nga Addang (iti Aniaman nga Edad) tapno Agsagana iti Matay \n',
    'EPIC! RAND PAUL Laughs at CNN’s Climate Hysteria…Schools Jake Tapper on Climate Truth [Video]',
    'Image: WC Provincial Disaster Management Centre https://t.co/EcNgpBhjcV',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
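
For the retrieval-style uses mentioned above (semantic search, paraphrase mining), the library's util.semantic_search helper ranks corpus embeddings against a query. A sketch with a made-up corpus and query:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence_transformers_model_id")

# Hypothetical corpus and query, for illustration only.
corpus = [
    "Heavy rain expected in London this weekend",
    "New species discovered in the deep sea",
    "Opposition parties clash ahead of the election",
]
query = "London weather forecast"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Returns, per query, the top_k corpus ids ranked by cosine similarity.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], hit["score"])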

Evaluation

Metrics

Knowledge Distillation

| Metric       | Value   |
|:-------------|:--------|
| negative_mse | -0.2521 |
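
negative_mse is the mean squared error between the student's and the teacher's embeddings, negated so that higher is better. A minimal sketch of the computation, assuming the 100x scaling that sentence-transformers' MSEEvaluator applies:

import numpy as np

def negative_mse(student_embeddings: np.ndarray, teacher_embeddings: np.ndarray) -> float:
    # Mean squared error over all dimensions, scaled by 100 and negated,
    # so a perfect student scores 0 and worse students go more negative.
    mse = float(((student_embeddings - teacher_embeddings) ** 2).mean())
    return -mse * 100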

Training Details

Training Dataset

Unnamed Dataset

  • Size: 25,095 training samples
  • Columns: sentence_0 and label
  • Approximate statistics based on the first 1000 samples:
    |         | sentence_0                                        | label              |
    |:--------|:--------------------------------------------------|:-------------------|
    | type    | string                                            | list               |
    | details | min: 4 tokens, mean: 23.49 tokens, max: 50 tokens | size: 768 elements |
  • Samples:
    | sentence_0                                                                    | label                                                                                                                |
    |:------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------|
    | A suicide bomber targeting a crowded market resulting in numerous fatalities  | [-0.05337272211909294, -0.296869158744812, -0.005234384443610907, -0.017071111127734184, 0.01954558491706848, ...]   |
    | Jeb Bush To Meet With Charleston Pastors                                      | [-0.025684779509902, 0.2293000966310501, -0.005389949772506952, 0.09448838979005814, 0.017471183091402054, ...]      |
    | New scientific research suggests link between air pollution and lung disease | [-0.12967786192893982, 0.19541345536708832, -0.0044404976069927216, -0.06291326135396957, -0.03776596114039421, ...] |
  • Loss: MSELoss
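
Given this column layout (one text column plus a 768-element teacher embedding as the label), the objective plugs together as in the sketch below; the model id and the single row are placeholders, not the actual training data:

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MSELoss

# Placeholder student model id; the real checkpoint id is not stated here.
student = SentenceTransformer("sentence_transformers_model_id")

# One text column plus a "label" column holding the teacher's 768-d embedding.
# The row is shaped like the samples above but uses a dummy vector.
train_dataset = Dataset.from_dict({
    "sentence_0": ["A suicide bomber targeting a crowded market resulting in numerous fatalities"],
    "label": [[0.0] * 768],  # dummy teacher embedding for illustration
})

# MSELoss regresses the student embedding onto the teacher embedding.
loss = MSELoss(model=student)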

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 20
  • multi_dataset_batch_sampler: round_robin
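
These settings map directly onto SentenceTransformerTrainingArguments; a sketch with the values above, where output_dir is a placeholder:

from sentence_transformers.training_args import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path, not from the original card
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=20,
    multi_dataset_batch_sampler="round_robin",
)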

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 20
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

| Epoch   | Step | Training Loss | negative_mse |
|:--------|:-----|:--------------|:-------------|
| 0.5089  | 200  | -             | -0.3720      |
| 1.0     | 393  | -             | -0.3428      |
| 1.0178  | 400  | -             | -0.3437      |
| 1.2723  | 500  | 0.0024        | -            |
| 1.5267  | 600  | -             | -0.3262      |
| 2.0     | 786  | -             | -0.3153      |
| 2.0356  | 800  | -             | -0.3156      |
| 2.5445  | 1000 | 0.0018        | -0.3070      |
| 3.0     | 1179 | -             | -0.3004      |
| 3.0534  | 1200 | -             | -0.3005      |
| 3.5623  | 1400 | -             | -0.2959      |
| 3.8168  | 1500 | 0.0015        | -            |
| 4.0     | 1572 | -             | -0.2907      |
| 4.0712  | 1600 | -             | -0.2924      |
| 4.5802  | 1800 | -             | -0.2863      |
| 5.0     | 1965 | -             | -0.2831      |
| 5.0891  | 2000 | 0.0013        | -0.2841      |
| 5.5980  | 2200 | -             | -0.2792      |
| 6.0     | 2358 | -             | -0.2765      |
| 6.1069  | 2400 | -             | -0.2774      |
| 6.3613  | 2500 | 0.0012        | -            |
| 6.6158  | 2600 | -             | -0.2734      |
| 7.0     | 2751 | -             | -0.2716      |
| 7.1247  | 2800 | -             | -0.2722      |
| 7.6336  | 3000 | 0.0011        | -0.2700      |
| 8.0     | 3144 | -             | -0.2684      |
| 8.1425  | 3200 | -             | -0.2683      |
| 8.6514  | 3400 | -             | -0.2665      |
| 8.9059  | 3500 | 0.001         | -            |
| 9.0     | 3537 | -             | -0.2645      |
| 9.1603  | 3600 | -             | -0.2649      |
| 9.6692  | 3800 | -             | -0.2639      |
| 10.0    | 3930 | -             | -0.2625      |
| 10.1781 | 4000 | 0.0009        | -0.2619      |
| 10.6870 | 4200 | -             | -0.2615      |
| 11.0    | 4323 | -             | -0.2594      |
| 11.1959 | 4400 | -             | -0.2598      |
| 11.4504 | 4500 | 0.0009        | -            |
| 11.7048 | 4600 | -             | -0.2587      |
| 12.0    | 4716 | -             | -0.2582      |
| 12.2137 | 4800 | -             | -0.2586      |
| 12.7226 | 5000 | 0.0008        | -0.2573      |
| 13.0    | 5109 | -             | -0.2568      |
| 13.2316 | 5200 | -             | -0.2567      |
| 13.7405 | 5400 | -             | -0.2564      |
| 13.9949 | 5500 | 0.0008        | -            |
| 14.0    | 5502 | -             | -0.2558      |
| 14.2494 | 5600 | -             | -0.2560      |
| 14.7583 | 5800 | -             | -0.2551      |
| 15.0    | 5895 | -             | -0.2548      |
| 15.2672 | 6000 | 0.0008        | -0.2552      |
| 15.7761 | 6200 | -             | -0.2540      |
| 16.0    | 6288 | -             | -0.2534      |
| 16.2850 | 6400 | -             | -0.2538      |
| 16.5394 | 6500 | 0.0008        | -            |
| 16.7939 | 6600 | -             | -0.2529      |
| 17.0    | 6681 | -             | -0.2532      |
| 17.3028 | 6800 | -             | -0.2530      |
| 17.8117 | 7000 | 0.0008        | -0.2528      |
| 18.0    | 7074 | -             | -0.2525      |
| 18.3206 | 7200 | -             | -0.2527      |
| 18.8295 | 7400 | -             | -0.2521      |

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.0
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MSELoss

@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}