# LCA Qwen3 ST Fine-Tuned Model
This directory contains a Sentence Transformers v3 model obtained by fine-tuning
Qwen/Qwen3-Embedding-0.6B on a proprietary life-cycle assessment (LCA) corpus.
It maps sentences and short paragraphs to 1024-dimensional embeddings for tasks
such as semantic search, similarity ranking, and clustering.
## Model Details
- Architecture: Transformer encoder + last-token pooling + L2 normalization
- Max sequence length: 1024 tokens
- Embedding dimension: 1024
- Similarity function: Cosine similarity
- Training objective: MultipleNegativesRankingLoss
### Module Stack
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'Qwen3Model'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
```
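For reference, an equivalent stack can be assembled by hand from the standard Sentence Transformers building blocks. This is a minimal sketch derived from the configuration above; in practice, loading the released directory directly (see Usage) restores the exact saved modules.

```python
from sentence_transformers import SentenceTransformer, models

# Rebuild the three-module stack described above (sketch; settings taken from the listing).
transformer = models.Transformer("Qwen/Qwen3-Embedding-0.6B", max_seq_length=1024)
pooling = models.Pooling(
    transformer.get_word_embedding_dimension(),  # 1024
    pooling_mode="lasttoken",                     # last-token pooling, as in module (1)
)
normalize = models.Normalize()                    # L2 normalization, as in module (2)

model = SentenceTransformer(modules=[transformer, pooling, normalize])
```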
## Usage
Install the dependency, then load the model by its Hub ID or from a local copy of this directory:
```bash
pip install -U sentence-transformers
```
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BIaoo/lca-qwen3-ft")

queries = [
    "wood residue gasification heat recovery",
    "magnesium alloy diecasting emissions",
]
documents = [
    "Report describing small-scale biomass CHP units used for district heating.",
    "Manufacturing note that summarizes casting emissions for AZ91 components.",
]

# Embeddings are L2-normalized, so the dot product equals cosine similarity.
query_embs = model.encode(queries, normalize_embeddings=True)
doc_embs = model.encode(documents, normalize_embeddings=True)
scores = query_embs @ doc_embs.T
print(scores)
```
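Because the model's similarity function is cosine, the same scores can also be obtained with the similarity helper available on Sentence Transformers v3 models. The ranking loop below is an illustrative continuation of the snippet above, not part of the original example:

```python
# Same scores via the built-in helper (applies the model's configured cosine similarity).
scores = model.similarity(query_embs, doc_embs)

# Rank documents for each query and print the best match.
for query, row in zip(queries, scores):
    best = int(row.argmax())
    print(f"{query!r} -> {documents[best]!r} (score={row[best].item():.3f})")
```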
## Training Data Overview
- Pairs: 86,268 (anchor, positive) text pairs
- Anchor length: short queries (median ≈ 12 tokens)
- Positive length: paragraph passages (median ≈ 480 tokens)
- Source: Internally curated LCA documents and structured metadata
- Data release: Individual passages are proprietary and therefore omitted from this README.
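The pairs themselves are not distributed, but they follow the plain two-column (anchor, positive) layout expected by MultipleNegativesRankingLoss. A sketch with invented placeholder rows, using the Hugging Face datasets library:

```python
from datasets import Dataset

# Invented placeholder rows; the proprietary pairs use the same two-column layout.
train_dataset = Dataset.from_dict({
    "anchor": [
        "wood residue gasification heat recovery",
        "magnesium alloy diecasting emissions",
    ],
    "positive": [
        "Report describing small-scale biomass CHP units used for district heating.",
        "Manufacturing note that summarizes casting emissions for AZ91 components.",
    ],
})
print(train_dataset)  # Dataset({features: ['anchor', 'positive'], num_rows: 2})
```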
## Training Configuration
- Epochs: 2
- Batch size: 16 (NO_DUPLICATES sampler)
- Learning rate: 1e-5 with linear warmup (10%)
- Weight decay: 0.01
- Precision: bfloat16
- Gradient checkpointing: disabled (single-GPU run)
- Seed: 42
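The configuration above maps onto the Sentence Transformers v3 trainer roughly as follows. This is a minimal sketch, not the original training script: the output directory and the placeholder training rows are assumptions, while the hyperparameter values mirror the list above.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

# Base model and in-batch-negatives loss, as listed under Model Details.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
loss = MultipleNegativesRankingLoss(model)

# Placeholder rows standing in for the proprietary (anchor, positive) pairs.
train_dataset = Dataset.from_dict({
    "anchor": ["wood residue gasification heat recovery"],
    "positive": ["Report describing small-scale biomass CHP units used for district heating."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="lca-qwen3-ft",  # hypothetical output path
    num_train_epochs=2,
    per_device_train_batch_size=16,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
    learning_rate=1e-5,
    warmup_ratio=0.1,           # linear schedule with 10% warmup (the trainer's default scheduler)
    weight_decay=0.01,
    bf16=True,
    gradient_checkpointing=False,
    seed=42,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```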
## Limitations & Notes
- The model inherits any biases or gaps present in the proprietary LCA corpus.
- It has been tuned for English technical text; performance may degrade on other languages.
- While embeddings are normalized, downstream pipelines should still apply task-specific evaluation before deployment.
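One way to run such a check is the built-in InformationRetrievalEvaluator. The sketch below uses invented placeholder queries, corpus entries, and relevance judgments purely for illustration; a real evaluation needs a held-out, task-specific set.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("BIaoo/lca-qwen3-ft")

# Placeholder evaluation data; replace with a held-out, task-specific set.
queries = {"q1": "wood residue gasification heat recovery"}
corpus = {
    "d1": "Report describing small-scale biomass CHP units used for district heating.",
    "d2": "Manufacturing note that summarizes casting emissions for AZ91 components.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="lca-dev")
metrics = evaluator(model)  # dict of retrieval metrics (accuracy@k, MRR, nDCG, ...)
print(metrics)
```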
## Files in This Directory
- config.json, sentence_bert_config.json, modules.json: model definitions
- model.safetensors: learned weights
- tokenizer.json, vocab.json, merges.txt, special_tokens_map.json: tokenizer assets
- 1_Pooling/, 2_Normalize/: Sentence Transformers module metadata