LCA Qwen3 ST Fine-Tuned Model

This directory contains a Sentence Transformers v3 model obtained by fine-tuning Qwen/Qwen3-Embedding-0.6B on a proprietary life-cycle assessment (LCA) corpus. It maps sentences and short paragraphs to 1024-dimensional embeddings for tasks such as semantic search, similarity ranking, and clustering.

Model Details

  • Architecture: Transformer encoder + last-token pooling + L2 normalization (a pooling sketch follows this list)
  • Max sequence length: 1024 tokens
  • Embedding dimension: 1024
  • Similarity function: Cosine similarity
  • Training objective: MultipleNegativesRankingLoss
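
The last-token pooling and normalization steps can be illustrated as follows. This is a minimal sketch assuming right-padded inputs with an attention mask, not the library's internal implementation:

import torch
import torch.nn.functional as F

def last_token_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Take the hidden state of the last non-padding token in each sequence,
    # then L2-normalize so that dot products equal cosine similarities.
    last_idx = attention_mask.sum(dim=1) - 1             # (batch,)
    batch_idx = torch.arange(hidden_states.size(0))
    pooled = hidden_states[batch_idx, last_idx]          # (batch, hidden_dim)
    return F.normalize(pooled, p=2, dim=1)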

Module Stack

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'Qwen3Model'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)

Usage

Install the dependency:

pip install -U sentence-transformers

Load the model (shown here by repository ID; a local path to this directory also works) and encode queries and documents:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BIaoo/lca-qwen3-ft")

queries = [
    "wood residue gasification heat recovery",
    "magnesium alloy diecasting emissions",
]
documents = [
    "Report describing small-scale biomass CHP units used for district heating.",
    "Manufacturing note that summarizes casting emissions for AZ91 components.",
]

# Embeddings are L2-normalized, so the dot product equals cosine similarity.
query_embs = model.encode(queries, normalize_embeddings=True)
doc_embs = model.encode(documents, normalize_embeddings=True)
scores = query_embs @ doc_embs.T
print(scores)
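
For larger document collections, the scores can be turned into a per-query ranking. The snippet below is a small sketch using the library's util.semantic_search helper on the embeddings from the example above:

from sentence_transformers import util

# Rank documents for each query by cosine similarity (top_k here is illustrative).
hits = util.semantic_search(query_embs, doc_embs, top_k=2)
for query, query_hits in zip(queries, hits):
    print(query)
    for hit in query_hits:
        print(f"  doc {hit['corpus_id']}: {hit['score']:.3f}")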

Training Data Overview

  • Pairs: 86,268 (anchor, positive) text pairs
  • Anchor length: short queries (median ≈ 12 tokens)
  • Positive length: paragraph passages (median ≈ 480 tokens)
  • Source: Internally curated LCA documents and structured metadata
  • Data release: Individual passages are proprietary and therefore omitted from this README; the sketch after this list shows only the expected pair layout with placeholder text.
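
As a point of reference, (anchor, positive) pairs for MultipleNegativesRankingLoss are typically laid out as a two-column dataset. The rows below are placeholders, not passages from the proprietary corpus:

from datasets import Dataset

# Placeholder rows illustrating the (anchor, positive) column layout only.
train_dataset = Dataset.from_dict({
    "anchor": [
        "wood residue gasification heat recovery",
        "magnesium alloy diecasting emissions",
    ],
    "positive": [
        "Paragraph-length passage on biomass gasification with heat recovery ...",
        "Paragraph-length passage on die-casting emissions for magnesium alloys ...",
    ],
})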

Training Configuration

  • Epochs: 2
  • Batch size: 16 (NO_DUPLICATES sampler; see the trainer sketch after this list)
  • Learning rate: 1e-5 with linear warmup (10%)
  • Weight decay: 0.01
  • Precision: bfloat16
  • Gradient checkpointing: disabled (single-GPU run)
  • Seed: 42
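
As a rough sketch, the settings above map onto the Sentence Transformers v3 trainer as shown below. The output directory is a placeholder, train_dataset is the two-column layout sketched in the previous section, and the exact training script is not released:

from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers, SentenceTransformerTrainingArguments

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="lca-qwen3-ft",                   # placeholder output path
    num_train_epochs=2,
    per_device_train_batch_size=16,
    batch_sampler=BatchSamplers.NO_DUPLICATES,   # no duplicate texts within a batch
    learning_rate=1e-5,
    warmup_ratio=0.1,                            # linear warmup over the first 10% of steps
    weight_decay=0.01,
    bf16=True,
    seed=42,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,                 # (anchor, positive) pairs as sketched above
    loss=loss,
)
trainer.train()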

Limitations & Notes

  • The model inherits any biases or gaps present in the proprietary LCA corpus.
  • It has been tuned for English technical text; performance may degrade on other languages.
  • While embeddings are normalized, downstream pipelines should still apply task-specific evaluation before deployment.

Files in This Directory

  • config.json, sentence_bert_config.json, modules.json: model definitions
  • model.safetensors: learned weights
  • tokenizer.json, vocab.json, merges.txt, special_tokens_map.json: tokenizer assets
  • 1_Pooling/, 2_Normalize/: Sentence Transformers module metadata