SentenceTransformer based on Qwen/Qwen3-4B
This is a sentence-transformers model finetuned from Qwen/Qwen3-4B on the biomed_retrieval_original dataset. It maps sentences & paragraphs to a 2560-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Qwen/Qwen3-4B
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 2560 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: biomed_retrieval_original
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: Qwen3Model
(1): Pooling({'word_embedding_dimension': 2560, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
)
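The Pooling module above has `pooling_mode_lasttoken` set to True, so the sentence embedding is taken from the hidden state of the last real (non-padding) token rather than from a mean over tokens. The following is a minimal plain-Python sketch of that idea, not the library's implementation; the function name `last_token_pool` and the toy hidden states are hypothetical.

```python
from typing import List


def last_token_pool(hidden_states: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Return the hidden state of the last position whose mask value is 1.

    Searching for the last 1 in the mask works for both left- and
    right-padded sequences.
    """
    real_positions = [i for i, m in enumerate(attention_mask) if m == 1]
    if not real_positions:
        raise ValueError("attention_mask contains no real tokens")
    return hidden_states[real_positions[-1]]


# Toy example: 4 token positions, 2-dimensional hidden states (illustrative only).
hidden = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.0, 0.0]]
right_padded_mask = [1, 1, 1, 0]
left_padded_mask = [0, 1, 1, 1]

print(last_token_pool(hidden, right_padded_mask))
print(last_token_pool(hidden, left_padded_mask))
```

With this model the pooled vector would have 2560 entries instead of 2.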
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'Given a web search query, retrieve relevant passages that answer the query: can you drink coffee when your fasting',
"Fasting for Blood Work: Not if you wish accurate results. Don't have coffee, tea, juice or any sort of fluid apart from water. Do not nibble gum, don't smoke, do not even exercise. Be a monk up until the test is done. As explained by Fasting For Blood Work http://fastingforbloodwork.net/ you can drink coffee as long as there is no sugar involved. But be careful as some black coffee have sugars in them.",
'Rating Newest Oldest. Best Answer: Fasting in Islam differs from fasting in some other religions. Muslims are not allowed to eat or drink anything during the period of fasting that starts before dawn and lasts till dusk. So, NO you cant drink water or any beverage. Yes, it’s good to drink tea while on a fast. I would stay away from black tea and most green teas as they contain a heavy amount of caffeine. Since you are detoxing, you don’t want to be adding toxins like caffeine into your system. White teas or any decaf teas will work perfectly! Any night time tea will be good too.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 2560]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
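Since the card lists cosine similarity as the similarity function, the 3 x 3 matrix in the usage example should hold pairwise cosine similarities between the three embeddings. The sketch below shows that computation in plain Python on toy 2-dimensional vectors; `cosine_similarity` and `similarity_matrix` are hypothetical helper names, not library functions.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarities; entry [i][j] compares embedding i with j."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]


# Toy embeddings standing in for the 2560-dimensional model outputs.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(value, 4) for value in row])
```

In a retrieval setting, row 0 (the query) would be compared against the remaining rows, and the passage with the highest score ranked first.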
Evaluation
Metrics
Triplet
- Dataset: bmretriever
- Evaluated with TripletEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy | 0.972 |
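As a rough illustration of what `cosine_accuracy` measures, the sketch below assumes the metric is the fraction of (anchor, positive, negative) triplets in which the anchor is more cosine-similar to the positive than to the negative. The helper names and the toy triplets are hypothetical, and the real evaluator may differ in details such as margins.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cos_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def triplet_cosine_accuracy(triplets: Sequence[Tuple[Vector, Vector, Vector]]) -> float:
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        1 for anchor, positive, negative in triplets
        if cos_sim(anchor, positive) > cos_sim(anchor, negative)
    )
    return correct / len(triplets)


# Four toy triplets of 2-dimensional embeddings; the second one is ranked wrongly.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([0.0, 1.0], [1.0, 0.0], [0.1, 0.9]),
    ([1.0, 1.0], [1.0, 0.9], [-1.0, 0.0]),
    ([1.0, 0.0], [1.0, 0.2], [0.2, 1.0]),
]
accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)
```

Under this reading, the reported 0.972 would correspond to 486 of the 500 evaluation triplets being ranked correctly.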
Training Details
Training Dataset
biomed_retrieval_original
- Dataset: biomed_retrieval_original at 469552f
- Size: 9,000 training samples
- Columns: anchor, positive, and negative
- Approximate statistics based on the first 1000 samples:

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |
| details | min: 16 tokens, mean: 32.5 tokens, max: 512 tokens | min: 2 tokens, mean: 198.5 tokens, max: 512 tokens | min: 3 tokens, mean: 315.07 tokens, max: 512 tokens |
- Samples:

| anchor | positive | negative |
|---|---|---|
| Given a web search query, retrieve relevant passages that answer the query: definition of ruminations | Rumination (psychology) Rumination is the focused attention on the symptoms of one's distress, and on its possible causes and consequences, as opposed to its solutions. Both rumination and worry are associated with anxiety and other negative emotional states; however, its measures have not been unified. In the Response Styles Theory proposed by Nolen-Hoeksema, rumination is defined as the compulsively focused attention on the symptoms of one's distress, and on its possible causes and consequences, as opposed to its so | A hallucination is a perception in the absence of external stimulus that has qualities of real perception. Hallucinations are vivid, substantial, and are seen to be located in external objective space. rumination syndrome or merycism is an under diagnosed chronic motility disorder characterized by effortless regurgitation of most meals following consumption due to the involuntary contraction of the muscles around the abdomen |
| Given a web search query, retrieve relevant passages that answer the query: what is the reaction between vinegar and eggshell | Observe what happens to the egg. You will notice tiny bubbles of carbon dioxide around the eggshell. The carbon dioxide is released during the chemical reaction taking place between the eggshell and the vinegar. | When baking soda(sodium bicarbonate) reacts with vinegar, the reac-tion takes heat from the solution, making it feelcooler. This kind of reaction is an example ofan endothermic reaction. An endothermic reaction (en doh THUR mik) is a reaction in which energy isabsorbed. Baking soda and vinegar: Endothermic. 1 In the chemical reaction with baking soda and vinegar, breaking bonds between the atoms in acetic acid (vinegar) requires energy. 2 It also takes energy to break the bonds between the atoms in sodium bicarbonate (baking soda). |
| Given a premise, retrieve hypotheses that are entailed by the premise: It may also have taken time for women to perceive the increased willingness of men to leave them if they demanded marriage. | Women thought men were more willing to leave them if they wanted to get married. | Women thought men were more willing to leave them if they didn't want to get married. |

- Loss: MultipleNegativesRankingLoss with these parameters:
{ "scale": 20.0, "similarity_fct": "cos_sim" }
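To make the loss parameters concrete, here is a minimal plain-Python sketch of a multiple-negatives ranking objective under the stated settings (`scale` of 20.0, cosine similarity). It assumes that every positive and every negative in the batch serves as a candidate for every anchor and that the loss is the cross-entropy of picking the anchor's own positive. The function names and toy vectors are hypothetical, and this is an illustration of the idea rather than the library's code.

```python
import math
from typing import List

Vector = List[float]


def cos_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(
    anchors: List[Vector],
    positives: List[Vector],
    negatives: List[Vector],
    scale: float = 20.0,
) -> float:
    """Mean cross-entropy where anchor i must select positive i among all candidates."""
    candidates = positives + negatives  # in-batch positives first, then hard negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cos_sim(anchor, candidate) for candidate in candidates]
        # Numerically stable log-sum-exp over all candidate scores.
        max_score = max(scores)
        log_sum_exp = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
        total += log_sum_exp - scores[i]
    return total / len(anchors)


# Toy batch of two triplets with 2-dimensional embeddings.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

well_aligned = multiple_negatives_ranking_loss(anchors, positives, negatives)
misaligned = multiple_negatives_ranking_loss(anchors, positives[::-1], negatives)
print(well_aligned, misaligned)
```

The scale acts like an inverse temperature: with a value of 20.0, small differences in cosine similarity translate into sharp differences in the softmax over candidates.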
Evaluation Dataset
biomed_retrieval_original
- Dataset: biomed_retrieval_original at 469552f
- Size: 500 evaluation samples
- Columns: anchor, positive, and negative
- Approximate statistics based on the first 500 samples:

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |
| details | min: 16 tokens, mean: 32.09 tokens, max: 512 tokens | min: 3 tokens, mean: 200.91 tokens, max: 512 tokens | min: 4 tokens, mean: 319.98 tokens, max: 512 tokens |
- Samples:

| anchor | positive | negative |
|---|---|---|
| Given a web search query, retrieve relevant passages that answer the query: tennessee promise do you have to get a degree | Please note: Bachelor’s degree programs do not meet eligibility requirements for the Tennessee Promise Scholarship. Therefore, you are forfeiting your eligibility for the Tennessee Promise when enrolling in a bachelor’s program. | Free tuition. Fall 2017. You might have heard that Tennessee is considering a last-dollar scholarship to adult students beginning fall 2018. At Pellissippi State, we feel you shouldn't have to wait for the opportunity to go to college. Learn more about Reconnect Now at Pellissippi State & enroll today. A few programs at Tennessee Tech have additional admission requirements for admission. If a student meets the general admission requirements, but not the requirements specific this program, the student will be admitted to the Student Success Program (also known as General Curriculum or General Health Studies). |
| Given a question, retrieve relevant Pubmed passages that answer the question: Do radiographic vertebral fractures develop in patients with ankylosing spondylitis during 4 years of TNF-α blocking therapy? | To determine the prevalence and incidence of radiographic vertebral fractures in ankylosing spondylitis (AS) patients treated with TNF-α blocking therapy for 4 years and to explore the relationship with patient characteristics, clinical assessments, radiographic damage, and bone mineral density (BMD). This study included consecutive AS patients with active disease from the Groningen Leeuwarden AS (GLAS) cohort treated with TNF-α blocking therapy for 4 years and with available thoracic and lumbar radiographs at baseline and at 4 years. Vertebral fractures were assessed by two readers (mild: ≥20-<25%, moderate: ≥25-<40%, severe: ≥40% reduction in vertebral height). In 27 of 105 (26%) AS patients, radiographic vertebral fractures were observed at baseline. These patients were significantly older, had larger occiput-to-wall distance, and more spinal radiographic damage. During 4 years of TNF-α blocking therapy, 21 (20%) patients developed at least one new fracture. Older age, smoking, high... | Radiographic damage is one of the core outcomes in axial SpA and is usually assessed with the modified Stoke Ankylosing Spondylitis (AS) Spine Score (mSASSS). Alternatively, the Radiographic AS Spinal Score (RASSS) is proposed, which includes the lower thoracic vertebrae, under the hypothesis that most progression occurs in these segments. We aimed to compare the mSASSS and RASSS with regard to performance. Two-yearly spinal radiographs from patients followed in the Outcome in AS International Study (OASIS) were used (scored independently by two readers). A total of 195 patients had at least one radiograph (12-year follow-up) to be included. We assessed the accessibility of vertebral corners (VCs) for scoring, as well as status and 2-year progression scores of both scoring methods. To assess the potential additional value of including the thoracic segment in the score, the relative contribution (in %) to the 2-year total RASSS progression of each spinal segment (cervical, thoracic and ... |
| Given a web search query, retrieve relevant passages that answer the query: who is phoebe price | Phoebe Price is an American actress and model. Price is primarily known for her frequent red carpet appearances. Hailing from Alabama, Price worked as a commercial model, mostly in Cape Town, South Africa before beginning an acting career.She has appeared in small roles on The X Files and Arliss.hoebe Price Almost Kills Self to Make TMZ. We've learned some lady named Phoebe Price -- who played photographer in an Arliss episode in 2001 -- is doing well after being involved in a car accident yesterday in Los Angeles.A rep for…. | Lea Price. Physical Therapist at New Beginnings. Location Greater New York City Area Industry Health, Wellness and Fitness Manuela Arbelaez Arbeláez (correa Born september, 9) 1988 is A-colombian Born american model and, actress perhaps best known for her work on the television game Show The Price Is. right |

- Loss: MultipleNegativesRankingLoss with these parameters:
{ "scale": 20.0, "similarity_fct": "cos_sim" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: epoch
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 8
- learning_rate: 0.0001
- num_train_epochs: 2
- warmup_steps: 100
- bf16: True
- dataloader_drop_last: True
- optim: adamw_bnb_8bit
- ddp_find_unused_parameters: False
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: {'use_reentrant': False}
- use_liger_kernel: True
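The interaction between `learning_rate: 0.0001`, `warmup_steps: 100`, and the linear scheduler is easier to see numerically. The sketch below assumes the usual linear-warmup-then-linear-decay shape and takes the total of 142 optimizer steps from the Training Logs table; `linear_schedule_lr` is a hypothetical helper, not a library function.

```python
def linear_schedule_lr(
    step: int,
    base_lr: float = 1e-4,
    warmup_steps: int = 100,
    total_steps: int = 142,  # assumption: final step reported in the Training Logs
) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Inspect the schedule at a few steps, including the two logged training steps.
for step in (0, 50, 100, 121, 142):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the warmup covers roughly 70% of training, so the peak learning rate is reached only at step 100 and decays over the remaining 42 steps.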
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 8
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 0.0001
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 100
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: True
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_bnb_8bit
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: False
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: {'use_reentrant': False}
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: True
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss | Validation Loss | bmretriever_cosine_accuracy |
|---|---|---|---|---|
| 0.7117 | 50 | 1.5593 | - | - |
| 1.0 | 71 | - | 0.1293 | 0.9660 |
| 1.4128 | 100 | 0.0771 | - | - |
| 2.0 | 142 | - | 0.1072 | 0.9720 |
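The fractional epoch values in the table can be reproduced from the dataset size and batch settings. The sketch below is one reading that matches the logged numbers; it assumes training on a single device (the card does not state the device count) and that the incomplete final batch is dropped, as `dataloader_drop_last: True` suggests. The variable and function names are hypothetical.

```python
import math

train_samples = 9000          # from the Training Dataset section
per_device_batch_size = 16    # per_device_train_batch_size
grad_accum_steps = 8          # gradient_accumulation_steps
num_devices = 1               # assumption: not stated in the card

# With drop_last, the incomplete final batch is discarded.
batches_per_epoch = train_samples // (per_device_batch_size * num_devices)
# A trailing partial accumulation group still counts as one optimizer step.
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
effective_batch_size = per_device_batch_size * grad_accum_steps * num_devices


def epoch_at_step(step: int) -> float:
    """Fractional epoch reached after a given number of optimizer steps."""
    full_epochs, steps_into_epoch = divmod(step, steps_per_epoch)
    return full_epochs + steps_into_epoch * grad_accum_steps / batches_per_epoch


print(batches_per_epoch, steps_per_epoch, effective_batch_size)
for step in (50, 71, 100, 142):
    print(step, round(epoch_at_step(step), 4))
```

Under these assumptions each optimizer step covers 128 triplets, and the computed epochs for steps 50, 71, 100, and 142 agree with the Epoch column above.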
Framework Versions
- Python: 3.11.9
- Sentence Transformers: 4.1.0
- Transformers: 4.57.1
- PyTorch: 2.6.0+cu124
- Accelerate: 1.6.0
- Datasets: 2.21.0
- Tokenizers: 0.22.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}