---
library_name: sentence-transformers
metrics:
  - negative_mse
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:25095
  - loss:MSELoss
widget:
  - source_sentence: mariknak pay ketdi a naabrasaak iti kulonganda
    sentences:
      - >-
        Nakuha nako ang usa ka kuptanan sa istorya ug nagsugod kini sa pagbati
        ug porma nga akong gusto
      - >-
        Ang kasarangang pag-ulan sa London, nga adunay kataas nga 10°C ug ang
        ubos nga 6°C. #LondonWeather #RainyDay
      - Controversial religious text causes uproar among community members
  - source_sentence: >
      JUAN COLE: Ang Pagduso sa Islamic State sa Baghdad 'Usa ka Pagsulay
      Aron Mabawi ang Gikuha sa Bush Administration' 
    sentences:
      - >-
        Ang Touchdown nga Selebrasyon ni Antonio Brown Sexy Gihapon Alang sa NFL
        Bisan ang duha ka pagduso makapasilo kanimo.
      - >-
        Natuklasan ng mga siyentipiko ang mga bagong species ng nilalang sa
        malalim na dagat
      - i feel so glad doing this
  - source_sentence: New Curriculum Standards to Be Implemented in All Schools Next Year
    sentences:
      - |
        Climate Change This Week: Mega Methane, Tidal Power, and More 
      - >-
        @lilomatic Only in Zimbabwe where u find Opposition party for another
        Opposition party.
      - >
        Ang mamumuno nga si Mike namulong sa Ferguson: 'Ang Hustisya Dili
        Kanunay Gisilbi' 
  - source_sentence: i am so blessed and feel blessed to be able to share my creations with you
    sentences:
      - |
        Ania ang Buhaton Sa World Cup Host Cities Gawas sa Pagtan-aw sa Soccer 
      - |
        Hillary Clinton's 'Super Volunteers' Are Back And Ready For 2016 
      - >-
        Awan pay ti koriente para kadagiti paset ti Joburg kalpasan ti uram ti
        kable iti uneg ti daga https://t.co/szuZa380Lr
  - source_sentence: |
      3 Napateg nga Addang (iti Aniaman nga Edad) tapno Agsagana iti Matay 
    sentences:
      - >-
        EPIC! RAND PAUL Laughs at CNN’s Climate Hysteria…Schools Jake Tapper on
        Climate Truth [Video]
      - im feeling horrible
      - 'Image: WC Provincial Disaster Management Centre https://t.co/EcNgpBhjcV'
model-index:
  - name: SentenceTransformer
    results:
      - task:
          type: knowledge-distillation
          name: Knowledge Distillation
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: negative_mse
            value: -0.2521140966564417
            name: Negative MSE
---

SentenceTransformer

This is a sentence-transformers model trained with an MSELoss knowledge-distillation objective. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
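
Because the pooling module is plain attention-masked mean pooling over the encoder's final hidden states, the same embeddings can be reproduced with 🤗 Transformers directly. The following is a minimal sketch, using the same "sentence_transformers_model_id" placeholder as the Usage section below:

import torch
from transformers import AutoModel, AutoTokenizer

# XLM-RoBERTa encoder followed by attention-masked mean pooling,
# mirroring the Transformer + Pooling stack printed above.
tokenizer = AutoTokenizer.from_pretrained("sentence_transformers_model_id")
encoder = AutoModel.from_pretrained("sentence_transformers_model_id")

def mean_pool(last_hidden_state, attention_mask):
    mask = attention_mask.unsqueeze(-1).float()     # [batch, seq, 1]
    summed = (last_hidden_state * mask).sum(dim=1)  # sum over non-padding tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)        # per-sentence token counts
    return summed / counts                          # [batch, 768]

batch = tokenizer(["An example sentence"], padding=True, truncation=True,
                  max_length=128, return_tensors="pt")
with torch.no_grad():
    output = encoder(**batch)
embeddings = mean_pool(output.last_hidden_state, batch["attention_mask"])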

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    '3 Napateg nga Addang (iti Aniaman nga Edad) tapno Agsagana iti Matay \n',
    'EPIC! RAND PAUL Laughs at CNN’s Climate Hysteria…Schools Jake Tapper on Climate Truth [Video]',
    'Image: WC Provincial Disaster Management Centre https://t.co/EcNgpBhjcV',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
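
For the retrieval-style uses mentioned above (semantic search, paraphrase mining), the library's util.semantic_search helper ranks corpus embeddings against a query. A sketch with a made-up corpus and query:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence_transformers_model_id")

# Hypothetical corpus and query, for illustration only.
corpus = [
    "Heavy rain expected in London this weekend",
    "New species discovered in the deep sea",
    "Opposition parties clash ahead of the election",
]
query = "London weather forecast"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Returns, per query, the top_k corpus ids ranked by cosine similarity.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], hit["score"])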

Evaluation

Metrics

Knowledge Distillation

| Metric       | Value   |
|:-------------|:--------|
| negative_mse | -0.2521 |
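
negative_mse is the mean squared error between the student's and the teacher's embeddings, negated so that higher is better. A minimal sketch of the computation, assuming the 100x scaling that sentence-transformers' MSEEvaluator applies:

import numpy as np

def negative_mse(student_embeddings: np.ndarray, teacher_embeddings: np.ndarray) -> float:
    # Mean squared error over all dimensions, scaled by 100 and negated,
    # so a perfect student scores 0 and worse students go more negative.
    mse = float(((student_embeddings - teacher_embeddings) ** 2).mean())
    return -mse * 100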

Training Details

Training Dataset

Unnamed Dataset

  • Size: 25,095 training samples
  • Columns: sentence_0 and label
  • Approximate statistics based on the first 1000 samples:
    |         | sentence_0                                        | label              |
    |:--------|:--------------------------------------------------|:-------------------|
    | type    | string                                            | list               |
    | details | min: 4 tokens, mean: 23.49 tokens, max: 50 tokens | size: 768 elements |
  • Samples:
    | sentence_0                                                                    | label                                                                                                                |
    |:------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------|
    | A suicide bomber targeting a crowded market resulting in numerous fatalities  | [-0.05337272211909294, -0.296869158744812, -0.005234384443610907, -0.017071111127734184, 0.01954558491706848, ...]   |
    | Jeb Bush To Meet With Charleston Pastors                                      | [-0.025684779509902, 0.2293000966310501, -0.005389949772506952, 0.09448838979005814, 0.017471183091402054, ...]      |
    | New scientific research suggests link between air pollution and lung disease | [-0.12967786192893982, 0.19541345536708832, -0.0044404976069927216, -0.06291326135396957, -0.03776596114039421, ...] |
  • Loss: MSELoss
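
Given this column layout (one text column plus a 768-element teacher embedding as the label), the objective plugs together as in the sketch below; the model id and the single row are placeholders, not the actual training data:

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MSELoss

# Placeholder student model id; the real checkpoint id is not stated here.
student = SentenceTransformer("sentence_transformers_model_id")

# One text column plus a "label" column holding the teacher's 768-d embedding.
# The row is shaped like the samples above but uses a dummy vector.
train_dataset = Dataset.from_dict({
    "sentence_0": ["A suicide bomber targeting a crowded market resulting in numerous fatalities"],
    "label": [[0.0] * 768],  # dummy teacher embedding for illustration
})

# MSELoss regresses the student embedding onto the teacher embedding.
loss = MSELoss(model=student)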

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 20
  • multi_dataset_batch_sampler: round_robin
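
These settings map directly onto SentenceTransformerTrainingArguments; a sketch with the values above, where output_dir is a placeholder:

from sentence_transformers.training_args import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path, not from the original card
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=20,
    multi_dataset_batch_sampler="round_robin",
)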

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 20
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

| Epoch   | Step | Training Loss | negative_mse |
|:--------|:-----|:--------------|:-------------|
| 0.5089  | 200  | -             | -0.3720      |
| 1.0     | 393  | -             | -0.3428      |
| 1.0178  | 400  | -             | -0.3437      |
| 1.2723  | 500  | 0.0024        | -            |
| 1.5267  | 600  | -             | -0.3262      |
| 2.0     | 786  | -             | -0.3153      |
| 2.0356  | 800  | -             | -0.3156      |
| 2.5445  | 1000 | 0.0018        | -0.3070      |
| 3.0     | 1179 | -             | -0.3004      |
| 3.0534  | 1200 | -             | -0.3005      |
| 3.5623  | 1400 | -             | -0.2959      |
| 3.8168  | 1500 | 0.0015        | -            |
| 4.0     | 1572 | -             | -0.2907      |
| 4.0712  | 1600 | -             | -0.2924      |
| 4.5802  | 1800 | -             | -0.2863      |
| 5.0     | 1965 | -             | -0.2831      |
| 5.0891  | 2000 | 0.0013        | -0.2841      |
| 5.5980  | 2200 | -             | -0.2792      |
| 6.0     | 2358 | -             | -0.2765      |
| 6.1069  | 2400 | -             | -0.2774      |
| 6.3613  | 2500 | 0.0012        | -            |
| 6.6158  | 2600 | -             | -0.2734      |
| 7.0     | 2751 | -             | -0.2716      |
| 7.1247  | 2800 | -             | -0.2722      |
| 7.6336  | 3000 | 0.0011        | -0.2700      |
| 8.0     | 3144 | -             | -0.2684      |
| 8.1425  | 3200 | -             | -0.2683      |
| 8.6514  | 3400 | -             | -0.2665      |
| 8.9059  | 3500 | 0.001         | -            |
| 9.0     | 3537 | -             | -0.2645      |
| 9.1603  | 3600 | -             | -0.2649      |
| 9.6692  | 3800 | -             | -0.2639      |
| 10.0    | 3930 | -             | -0.2625      |
| 10.1781 | 4000 | 0.0009        | -0.2619      |
| 10.6870 | 4200 | -             | -0.2615      |
| 11.0    | 4323 | -             | -0.2594      |
| 11.1959 | 4400 | -             | -0.2598      |
| 11.4504 | 4500 | 0.0009        | -            |
| 11.7048 | 4600 | -             | -0.2587      |
| 12.0    | 4716 | -             | -0.2582      |
| 12.2137 | 4800 | -             | -0.2586      |
| 12.7226 | 5000 | 0.0008        | -0.2573      |
| 13.0    | 5109 | -             | -0.2568      |
| 13.2316 | 5200 | -             | -0.2567      |
| 13.7405 | 5400 | -             | -0.2564      |
| 13.9949 | 5500 | 0.0008        | -            |
| 14.0    | 5502 | -             | -0.2558      |
| 14.2494 | 5600 | -             | -0.2560      |
| 14.7583 | 5800 | -             | -0.2551      |
| 15.0    | 5895 | -             | -0.2548      |
| 15.2672 | 6000 | 0.0008        | -0.2552      |
| 15.7761 | 6200 | -             | -0.2540      |
| 16.0    | 6288 | -             | -0.2534      |
| 16.2850 | 6400 | -             | -0.2538      |
| 16.5394 | 6500 | 0.0008        | -            |
| 16.7939 | 6600 | -             | -0.2529      |
| 17.0    | 6681 | -             | -0.2532      |
| 17.3028 | 6800 | -             | -0.2530      |
| 17.8117 | 7000 | 0.0008        | -0.2528      |
| 18.0    | 7074 | -             | -0.2525      |
| 18.3206 | 7200 | -             | -0.2527      |
| 18.8295 | 7400 | -             | -0.2521      |

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.0
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MSELoss

@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}