	---
	library_name: sentence-transformers
	metrics:
	- negative_mse
	pipeline_tag: sentence-similarity
	tags:
	- sentence-transformers
	- sentence-similarity
	- feature-extraction
	- generated_from_trainer
	- dataset_size:25095
	- loss:MSELoss
	widget:
	- source_sentence: mariknak pay ketdi a naabrasaak iti kulonganda
	  sentences:
	  - Nakuha nako ang usa ka kuptanan sa istorya ug nagsugod kini sa pagbati ug porma
	    nga akong gusto
	  - 'Ang kasarangang pag-ulan sa London, nga adunay kataas nga 10°C ug ang ubos nga
	    6°C. #LondonWeather #RainyDay'
	  - Controversial religious text causes uproar among community members
	- source_sentence: "JUAN COLE: Ang Pagduso sa Islamic State sa Baghdad 'Usa ka\
	    \ Pagsulay Aron Mabawi ang Gikuha sa Bush Administration' \n"
	  sentences:
	  - Ang Touchdown nga Selebrasyon ni Antonio Brown Sexy Gihapon Alang sa NFL Bisan
	    ang duha ka pagduso makapasilo kanimo.
	  - Natuklasan ng mga siyentipiko ang mga bagong species ng nilalang sa malalim na
	    dagat
	  - i feel so glad doing this
	- source_sentence: New Curriculum Standards to Be Implemented in All Schools Next
	    Year
	  sentences:
	  - "Climate Change This Week: Mega Methane, Tidal Power, and More \n"
	  - '@lilomatic Only in Zimbabwe where u find Opposition party for another Opposition
	    party.'
	  - "Ang mamumuno nga si Mike namulong sa Ferguson: 'Ang Hustisya Dili Kanunay\
	    \ Gisilbi' \n"
	- source_sentence: i am so blessed and feel blessed to be able to share my creations
	    with you
	  sentences:
	  - "Ania ang Buhaton Sa World Cup Host Cities Gawas sa Pagtan-aw sa Soccer \n"
	  - "Hillary Clinton's 'Super Volunteers' Are Back And Ready For 2016 \n"
	  - Awan pay ti koriente para kadagiti paset ti Joburg kalpasan ti uram ti kable iti
	    uneg ti daga https://t.co/szuZa380Lr
	- source_sentence: "3 Napateg nga Addang (iti Aniaman nga Edad) tapno Agsagana iti\
	    \ Matay \n"
	  sentences:
	  - EPIC! RAND PAUL Laughs at CNN’s Climate Hysteria…Schools Jake Tapper on Climate
	    Truth [Video]
	  - im feeling horrible
	  - 'Image: WC Provincial Disaster Management Centre https://t.co/EcNgpBhjcV'
	model-index:
	- name: SentenceTransformer
	  results:
	  - task:
	      type: knowledge-distillation
	      name: Knowledge Distillation
	    dataset:
	      name: Unknown
	      type: unknown
	    metrics:
	    - type: negative_mse
	      value: -0.2521140966564417
	      name: Negative Mse
	---

	# SentenceTransformer

	This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

	## Model Details

	### Model Description
	- Model Type: Sentence Transformer
	<!-- - Base model: [Unknown](https://huggingface.co/unknown) -->
	- Maximum Sequence Length: 128 tokens
	- Output Dimensionality: 768 dimensions
	- Similarity Function: Cosine Similarity
	<!-- - Training Dataset: Unknown -->
	<!-- - Language: Unknown -->
	<!-- - License: Unknown -->

	### Model Sources

	- Documentation: [Sentence Transformers Documentation](https://sbert.net)
	- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
	- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

	### Full Model Architecture

	```
	SentenceTransformer(
	  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
	  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
	)
	```
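The Pooling module above has only `pooling_mode_mean_tokens` enabled, so each sentence vector is the average of its token vectors. The following is a minimal NumPy sketch of that idea, not the library's implementation. It assumes padded positions are excluded through an attention mask; the function name `mean_pool` and the toy inputs are hypothetical.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings per sentence, ignoring padded positions.

    token_embeddings: array of shape (batch, seq_len, dim)
    attention_mask:   array of shape (batch, seq_len), 1 = real token, 0 = padding
    """
    # Broadcast the mask over the embedding dimension.
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for an all-padding row.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


# Toy batch: two "sentences", three token slots, 2-dimensional embeddings.
tokens = np.array([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # third slot is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])

pooled = mean_pool(tokens, mask)
print(pooled)
```

In the real model the last dimension is 768 and the sequence length is capped at 128 tokens, as listed in the Model Description.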

	## Usage

	### Direct Usage (Sentence Transformers)

	First install the Sentence Transformers library:

	```bash
	pip install -U sentence-transformers
	```

	Then you can load this model and run inference.
	```python
	from sentence_transformers import SentenceTransformer

	# Download from the 🤗 Hub
	model = SentenceTransformer("sentence_transformers_model_id")
	# Run inference
	sentences = [
	    '3 Napateg nga Addang (iti Aniaman nga Edad) tapno Agsagana iti Matay \n',
	    'EPIC! RAND PAUL Laughs at CNN’s Climate Hysteria…Schools Jake Tapper on Climate Truth [Video]',
	    'Image: WC Provincial Disaster Management Centre https://t.co/EcNgpBhjcV',
	]
	embeddings = model.encode(sentences)
	print(embeddings.shape)
	# [3, 768]

	# Get the similarity scores for the embeddings
	similarities = model.similarity(embeddings, embeddings)
	print(similarities.shape)
	# [3, 3]
	```
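The card lists cosine similarity as this model's similarity function, so the matrix returned by `model.similarity` can be approximated by hand. The sketch below assumes the embeddings are a 2-D NumPy array, as `model.encode` returns by default. Random vectors stand in for real embeddings so that it runs without downloading a model, and `cosine_similarity_matrix` is a hypothetical helper name.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Stand-in for `embeddings = model.encode(sentences)`: 3 vectors of size 768.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(3, 768))

sims = cosine_similarity_matrix(fake_embeddings, fake_embeddings)
print(sims.shape)
```

Each entry lies in [-1, 1], and the diagonal is 1 because every vector is compared with itself.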

	<!--
	### Direct Usage (Transformers)

	<details><summary>Click to see the direct usage in Transformers</summary>

	</details>
	-->

	<!--
	### Downstream Usage (Sentence Transformers)

	You can finetune this model on your own dataset.

	<details><summary>Click to expand</summary>

	</details>
	-->

	<!--
	### Out-of-Scope Use

	List how the model may foreseeably be misused and address what users ought not to do with the model.
	-->

	## Evaluation

	### Metrics

	#### Knowledge Distillation

	* Evaluated with [<code>MSEEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.MSEEvaluator)

	| Metric | Value |
	|:-----------------|:------------|
	| negative_mse | -0.2521 |
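`negative_mse` is the negated mean squared error between teacher and student embeddings, so values closer to 0 are better. The sketch below shows the computation on toy vectors. The factor of 100 reflects my understanding of how `MSEEvaluator` scales the error; treat it as an assumption and check the linked evaluator documentation. The function name and the toy arrays are hypothetical.

```python
import numpy as np


def negative_mse(teacher: np.ndarray, student: np.ndarray, scale: float = 100.0) -> float:
    """Negated mean squared error over all embedding elements.

    `scale` is an assumption about the evaluator's reporting convention.
    """
    mse = float(np.mean((teacher - student) ** 2))
    return -scale * mse


# Toy example: two 2-dimensional "embeddings" per model.
teacher = np.array([[0.0, 1.0], [1.0, 0.0]])
student = np.array([[0.0, 0.9], [0.9, 0.1]])

score = negative_mse(teacher, student)
print(round(score, 4))
```

A student that reproduces the teacher exactly would score 0.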

	<!--
	## Bias, Risks and Limitations

	What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.
	-->

	<!--
	### Recommendations

	What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.
	-->

	## Training Details

	### Training Dataset

	#### Unnamed Dataset


	* Size: 25,095 training samples
	* Columns: <code>sentence_0</code> and <code>label</code>
	* Approximate statistics based on the first 1000 samples:
	| | sentence_0 | label |
	|:--------|:----------------------------------------------------------------------------------|:-------------------------------------|
	| type | string | list |
	| details | <ul><li>min: 4 tokens</li><li>mean: 23.49 tokens</li><li>max: 50 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
	* Samples:
	| sentence_0 | label |
	|:------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------|
	| <code>A suicide bomber targeting a crowded market resulting in numerous fatalities</code> | <code>[-0.05337272211909294, -0.296869158744812, -0.005234384443610907, -0.017071111127734184, 0.01954558491706848, ...]</code> |
	| <code>Jeb Bush To Meet With Charleston Pastors <br></code> | <code>[-0.025684779509902, 0.2293000966310501, -0.005389949772506952, 0.09448838979005814, 0.017471183091402054, ...]</code> |
	| <code>New scientific research suggests link between air pollution and lung disease</code> | <code>[-0.12967786192893982, 0.19541345536708832, -0.0044404976069927216, -0.06291326135396957, -0.03776596114039421, ...]</code> |
	* Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)
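In this setup each `label` is a 768-element target vector, and the loss pushes the student's embedding of `sentence_0` toward it. The toy sketch below illustrates that objective with a linear "student" trained by plain gradient descent in NumPy. It is not the training code used for this model: the feature matrix, the dimensions, the learning rate, and the step count are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 8 "sentences" as 4-dim features,
# and 3-dim target vectors playing the role of the `label` column.
features = rng.normal(size=(8, 4))
teacher_embeddings = rng.normal(size=(8, 3))

# A linear "student" that maps features into the teacher's embedding space.
weights = np.zeros((4, 3))


def mse_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error over all elements."""
    return float(np.mean((pred - target) ** 2))


loss_before = mse_loss(features @ weights, teacher_embeddings)

# Gradient descent on the MSE objective.
for _ in range(200):
    pred = features @ weights
    # Gradient of mean((XW - T)^2) with respect to W.
    grad = 2.0 * features.T @ (pred - teacher_embeddings) / pred.size
    weights -= 0.1 * grad

loss_after = mse_loss(features @ weights, teacher_embeddings)
print(loss_before, loss_after)
```

The loss drops as the student's outputs move toward the targets, which is the same trend the Training Loss column shows for the real model.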

	### Training Hyperparameters
	#### Non-Default Hyperparameters

	- `eval_strategy`: steps
	- `per_device_train_batch_size`: 64
	- `per_device_eval_batch_size`: 64
	- `num_train_epochs`: 20
	- `multi_dataset_batch_sampler`: round_robin

	#### All Hyperparameters
	<details><summary>Click to expand</summary>

	- `overwrite_output_dir`: False
	- `do_predict`: False
	- `eval_strategy`: steps
	- `prediction_loss_only`: True
	- `per_device_train_batch_size`: 64
	- `per_device_eval_batch_size`: 64
	- `per_gpu_train_batch_size`: None
	- `per_gpu_eval_batch_size`: None
	- `gradient_accumulation_steps`: 1
	- `eval_accumulation_steps`: None
	- `torch_empty_cache_steps`: None
	- `learning_rate`: 5e-05
	- `weight_decay`: 0.0
	- `adam_beta1`: 0.9
	- `adam_beta2`: 0.999
	- `adam_epsilon`: 1e-08
	- `max_grad_norm`: 1
	- `num_train_epochs`: 20
	- `max_steps`: -1
	- `lr_scheduler_type`: linear
	- `lr_scheduler_kwargs`: {}
	- `warmup_ratio`: 0.0
	- `warmup_steps`: 0
	- `log_level`: passive
	- `log_level_replica`: warning
	- `log_on_each_node`: True
	- `logging_nan_inf_filter`: True
	- `save_safetensors`: True
	- `save_on_each_node`: False
	- `save_only_model`: False
	- `restore_callback_states_from_checkpoint`: False
	- `no_cuda`: False
	- `use_cpu`: False
	- `use_mps_device`: False
	- `seed`: 42
	- `data_seed`: None
	- `jit_mode_eval`: False
	- `use_ipex`: False
	- `bf16`: False
	- `fp16`: False
	- `fp16_opt_level`: O1
	- `half_precision_backend`: auto
	- `bf16_full_eval`: False
	- `fp16_full_eval`: False
	- `tf32`: None
	- `local_rank`: 0
	- `ddp_backend`: None
	- `tpu_num_cores`: None
	- `tpu_metrics_debug`: False
	- `debug`: []
	- `dataloader_drop_last`: False
	- `dataloader_num_workers`: 0
	- `dataloader_prefetch_factor`: None
	- `past_index`: -1
	- `disable_tqdm`: False
	- `remove_unused_columns`: True
	- `label_names`: None
	- `load_best_model_at_end`: False
	- `ignore_data_skip`: False
	- `fsdp`: []
	- `fsdp_min_num_params`: 0
	- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
	- `fsdp_transformer_layer_cls_to_wrap`: None
	- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
	- `deepspeed`: None
	- `label_smoothing_factor`: 0.0
	- `optim`: adamw_torch
	- `optim_args`: None
	- `adafactor`: False
	- `group_by_length`: False
	- `length_column_name`: length
	- `ddp_find_unused_parameters`: None
	- `ddp_bucket_cap_mb`: None
	- `ddp_broadcast_buffers`: False
	- `dataloader_pin_memory`: True
	- `dataloader_persistent_workers`: False
	- `skip_memory_metrics`: True
	- `use_legacy_prediction_loop`: False
	- `push_to_hub`: False
	- `resume_from_checkpoint`: None
	- `hub_model_id`: None
	- `hub_strategy`: every_save
	- `hub_private_repo`: False
	- `hub_always_push`: False
	- `gradient_checkpointing`: False
	- `gradient_checkpointing_kwargs`: None
	- `include_inputs_for_metrics`: False
	- `eval_do_concat_batches`: True
	- `fp16_backend`: auto
	- `push_to_hub_model_id`: None
	- `push_to_hub_organization`: None
	- `mp_parameters`:
	- `auto_find_batch_size`: False
	- `full_determinism`: False
	- `torchdynamo`: None
	- `ray_scope`: last
	- `ddp_timeout`: 1800
	- `torch_compile`: False
	- `torch_compile_backend`: None
	- `torch_compile_mode`: None
	- `dispatch_batches`: None
	- `split_batches`: None
	- `include_tokens_per_second`: False
	- `include_num_input_tokens_seen`: False
	- `neftune_noise_alpha`: None
	- `optim_target_modules`: None
	- `batch_eval_metrics`: False
	- `eval_on_start`: False
	- `eval_use_gather_object`: False
	- `batch_sampler`: batch_sampler
	- `multi_dataset_batch_sampler`: round_robin

	</details>
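With `lr_scheduler_type: linear`, `learning_rate: 5e-05`, and no warmup, the learning rate decays in a straight line from its initial value to zero over the run. The sketch below mirrors the usual linear-with-warmup rule. The default of 7860 total steps is an assumption derived from 393 steps per epoch times 20 epochs, and `linear_lr` is a hypothetical helper, not a library function.

```python
def linear_lr(
    step: int,
    base_lr: float = 5e-5,      # learning_rate from the list above
    total_steps: int = 7860,    # assumed: 393 steps/epoch * 20 epochs
    warmup_steps: int = 0,      # warmup_steps from the list above
) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 3930, 7860):
    print(step, linear_lr(step))
```

Under these assumptions the rate is halved at the midpoint of training and reaches zero at the final step.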

	### Training Logs
	| Epoch | Step | Training Loss | negative_mse |
	|:-------:|:----:|:-------------:|:------------:|
	| 0.5089 | 200 | - | -0.3720 |
	| 1.0 | 393 | - | -0.3428 |
	| 1.0178 | 400 | - | -0.3437 |
	| 1.2723 | 500 | 0.0024 | - |
	| 1.5267 | 600 | - | -0.3262 |
	| 2.0 | 786 | - | -0.3153 |
	| 2.0356 | 800 | - | -0.3156 |
	| 2.5445 | 1000 | 0.0018 | -0.3070 |
	| 3.0 | 1179 | - | -0.3004 |
	| 3.0534 | 1200 | - | -0.3005 |
	| 3.5623 | 1400 | - | -0.2959 |
	| 3.8168 | 1500 | 0.0015 | - |
	| 4.0 | 1572 | - | -0.2907 |
	| 4.0712 | 1600 | - | -0.2924 |
	| 4.5802 | 1800 | - | -0.2863 |
	| 5.0 | 1965 | - | -0.2831 |
	| 5.0891 | 2000 | 0.0013 | -0.2841 |
	| 5.5980 | 2200 | - | -0.2792 |
	| 6.0 | 2358 | - | -0.2765 |
	| 6.1069 | 2400 | - | -0.2774 |
	| 6.3613 | 2500 | 0.0012 | - |
	| 6.6158 | 2600 | - | -0.2734 |
	| 7.0 | 2751 | - | -0.2716 |
	| 7.1247 | 2800 | - | -0.2722 |
	| 7.6336 | 3000 | 0.0011 | -0.2700 |
	| 8.0 | 3144 | - | -0.2684 |
	| 8.1425 | 3200 | - | -0.2683 |
	| 8.6514 | 3400 | - | -0.2665 |
	| 8.9059 | 3500 | 0.001 | - |
	| 9.0 | 3537 | - | -0.2645 |
	| 9.1603 | 3600 | - | -0.2649 |
	| 9.6692 | 3800 | - | -0.2639 |
	| 10.0 | 3930 | - | -0.2625 |
	| 10.1781 | 4000 | 0.0009 | -0.2619 |
	| 10.6870 | 4200 | - | -0.2615 |
	| 11.0 | 4323 | - | -0.2594 |
	| 11.1959 | 4400 | - | -0.2598 |
	| 11.4504 | 4500 | 0.0009 | - |
	| 11.7048 | 4600 | - | -0.2587 |
	| 12.0 | 4716 | - | -0.2582 |
	| 12.2137 | 4800 | - | -0.2586 |
	| 12.7226 | 5000 | 0.0008 | -0.2573 |
	| 13.0 | 5109 | - | -0.2568 |
	| 13.2316 | 5200 | - | -0.2567 |
	| 13.7405 | 5400 | - | -0.2564 |
	| 13.9949 | 5500 | 0.0008 | - |
	| 14.0 | 5502 | - | -0.2558 |
	| 14.2494 | 5600 | - | -0.2560 |
	| 14.7583 | 5800 | - | -0.2551 |
	| 15.0 | 5895 | - | -0.2548 |
	| 15.2672 | 6000 | 0.0008 | -0.2552 |
	| 15.7761 | 6200 | - | -0.2540 |
	| 16.0 | 6288 | - | -0.2534 |
	| 16.2850 | 6400 | - | -0.2538 |
	| 16.5394 | 6500 | 0.0008 | - |
	| 16.7939 | 6600 | - | -0.2529 |
	| 17.0 | 6681 | - | -0.2532 |
	| 17.3028 | 6800 | - | -0.2530 |
	| 17.8117 | 7000 | 0.0008 | -0.2528 |
	| 18.0 | 7074 | - | -0.2525 |
	| 18.3206 | 7200 | - | -0.2527 |
	| 18.8295 | 7400 | - | -0.2521 |
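The Epoch and Step columns can be cross-checked against the dataset size and batch size reported on this card. The arithmetic below assumes a single training device, `gradient_accumulation_steps` of 1, and that the last partial batch is kept (`dataloader_drop_last: False`). The device count is not stated on the card, so that part is an assumption.

```python
import math

num_samples = 25_095   # training set size reported on the card
batch_size = 64        # per_device_train_batch_size
num_epochs = 20        # num_train_epochs

# The last, partially filled batch still counts as one step.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs


def epoch_at(step: int) -> float:
    """Fractional epoch reached after a given number of optimizer steps."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)   # 393
print(total_steps)       # 7860
print(epoch_at(200))     # 0.5089
print(epoch_at(7400))    # 18.8295
```

These values match the logged rows, for example step 393 at epoch 1.0 and step 7400 at epoch 18.8295.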


	### Framework Versions
	- Python: 3.10.14
	- Sentence Transformers: 3.1.1
	- Transformers: 4.44.2
	- PyTorch: 2.4.0
	- Accelerate: 0.34.2
	- Datasets: 3.0.0
	- Tokenizers: 0.19.1

	## Citation

	### BibTeX

	#### Sentence Transformers
	```bibtex
	@inproceedings{reimers-2019-sentence-bert,
	title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
	author = "Reimers, Nils and Gurevych, Iryna",
	booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
	month = "11",
	year = "2019",
	publisher = "Association for Computational Linguistics",
	url = "https://arxiv.org/abs/1908.10084",
	}
	```

	#### MSELoss
	```bibtex
	@inproceedings{reimers-2020-multilingual-sentence-bert,
	title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
	author = "Reimers, Nils and Gurevych, Iryna",
	booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
	month = "11",
	year = "2020",
	publisher = "Association for Computational Linguistics",
	url = "https://arxiv.org/abs/2004.09813",
	}
	```

	<!--
	## Glossary

	Clearly define terms in order to be accessible across audiences.
	-->

	<!--
	## Model Card Authors

	Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.
	-->

	<!--
	## Model Card Contact

	Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.
	-->