# Uzbek Paraphraser - Gemma-based Model Inference Docs

## What It Does
This script runs inference on a LoRA-finetuned Gemma 3 (4B) language model to generate semantically accurate paraphrases in Uzbek. It supports merged LoRA weights and produces fluent rewrites using an instruction-tuned prompt format.
## Requirements

### Python Packages

```bash
pip install torch transformers peft
```
### Model Files

You must have a trained (and optionally merged) model at:

```
/opt/ai_users/abdurakhim/para/gemma_train/gemma-text-to-paraphrased_v2
```

Model type: `AutoModelForCausalLM`
## Run the Script

```bash
python gemma_paraphraser_infer.py
```

### Output

```
Generated Answer:
Bu ikki kompaniya 2023-2024 yillarda faoliyat olib borib, lizing va boshqa qulay shartlar asosida avtomobil taqdim qilishni taklif qilgan.
```

(English: "These two companies operated in 2023-2024 and offered to provide cars on the basis of leasing and other favorable terms.")
## Code Structure Explained

### 1. Load Model + Tokenizer
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

output_model = "/opt/ai_users/abdurakhim/para/gemma_train/gemma-text-to-paraphrased_v2"

# Load the (merged) model and its tokenizer from the output directory
model = AutoModelForCausalLM.from_pretrained(output_model, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(output_model)
```
- Loads the model with reduced memory overhead (`low_cpu_mem_usage=True`).
- Loads the tokenizer from the same directory.
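If a GPU is available, the model can optionally be loaded in half precision with automatic device placement. These are standard `from_pretrained` options, not something the script sets; treat this as an assumption:

```python
import torch

# Optional (not in the original script): bf16 weights + automatic GPU placement.
# device_map="auto" requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    output_model,
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,  # use torch.float16 on GPUs without bf16 support
    device_map="auto",
)
```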
### 2. Optional: Merge LoRA (if not done already)
```python
from peft import PeftModel

# Fold the LoRA adapter weights into the base model and save a standalone copy
peft_model = PeftModel.from_pretrained(model, output_model)
merged = peft_model.merge_and_unload()
merged.save_pretrained("merged_model")
tokenizer.save_pretrained("merged_model")
```
Commented out by default; run it once to produce a standalone model (`merged_model/`).
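If you prefer not to toggle comments, a simple guard (a sketch, not part of the original script) runs the merge only when `merged_model/` does not exist yet:

```python
import os

if not os.path.isdir("merged_model"):
    peft_model = PeftModel.from_pretrained(model, output_model)
    merged = peft_model.merge_and_unload()
    merged.save_pretrained("merged_model")
    tokenizer.save_pretrained("merged_model")
```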
### 3. Prompt Formatting
```python
user_prompt = """Given the <INPUT_TEXT>, generate a fluent and semantically accurate paraphrase in Uzbek..."""
```
Then inserted into a chat-style message list:
```python
sample = {
    "messages": [
        {"role": "user", "content": test_sample},
        {"role": "assistant", "content": ""},
    ]
}
```
Template processed with:
```python
prompt = pipe.tokenizer.apply_chat_template(...)
```
This applies the chat format expected by instruction-tuned models like Gemma.
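The arguments are elided in the excerpt above; a plausible expansion, mirroring the explicit call in the custom-usage snippet below, is:

```python
# Assumption: the empty assistant stub is dropped so the template itself
# appends the generation prompt for the model's turn.
prompt = pipe.tokenizer.apply_chat_template(
    [m for m in sample["messages"] if m["content"]],
    tokenize=False,              # return a prompt string, not token ids
    add_generation_prompt=True,  # append the model-turn marker
)
```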
### 4. Text Generation Pipeline
```python
from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
outputs = pipe(prompt, ...)
```
Generation settings (a full call combining them is sketched below):

- `max_new_tokens=256`
- `temperature=0.1` → near-deterministic output
- `top_k=50`, `top_p=0.1` → narrow sampling
- `do_sample=False` → greedy decoding; with sampling disabled, the temperature/top-k/top-p values have no effect
- `eos_token_id` includes special tokens like `<end_of_turn>`
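Putting these settings together, the generation call plausibly looks like this (the exact `eos_token_id` list is an assumption based on the `<end_of_turn>` note above):

```python
end_of_turn_id = tokenizer.convert_tokens_to_ids("<end_of_turn>")

outputs = pipe(
    prompt,
    max_new_tokens=256,
    temperature=0.1,
    top_k=50,
    top_p=0.1,
    do_sample=False,  # greedy decoding; the three sampling args above are ignored
    eos_token_id=[tokenizer.eos_token_id, end_of_turn_id],
)
```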
## Custom Usage
To reuse the paraphraser in other scripts, wrap the pipeline in a helper function:
```python
def paraphrase_uz(text: str) -> str:
    prompt = user_prompt.format(question=text)
    messages = [{"role": "user", "content": prompt},
                {"role": "assistant", "content": ""}]
    full_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # Generation settings from section 4
    outputs = pipe(full_prompt, max_new_tokens=256, do_sample=False)
    # Strip the echoed prompt; keep only the newly generated text
    return outputs[0]["generated_text"][len(full_prompt):].strip()
```
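Example call (the input sentence is illustrative, adapted from the sample output above):

```python
print(paraphrase_uz("Bu ikki kompaniya lizing asosida avtomobil taqdim qilishni taklif qilgan."))
```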
## Suggested Project Layout

```
gemma_paraphraser/
├── gemma_paraphraser_infer.py
├── merged_model/   # Optional: contains merged model weights
└── README.md
```
## Model Tree

`Abduraxim/gemma_paraphraser` is finetuned from the base model `google/gemma-3-4b-pt`.