
📚 Uzbek Paraphraser - Gemma-based Model Inference Docs

🧠 What It Does

This script runs inference on a LoRA-finetuned, Gemma-3 4B-class language model to generate semantically accurate paraphrases in Uzbek. It supports merged LoRA weights and produces fluent rewrites using an instruction-tuned prompt format.


🛠️ Requirements

Python Packages

```bash
pip install torch transformers peft
```

Model Files

You must have a trained (and optionally LoRA-merged) model directory at:

```text
/opt/ai_users/abdurakhim/para/gemma_train/gemma-text-to-paraphrased_v2
```

Model type: `AutoModelForCausalLM`
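
As a quick sanity check before running inference, you can verify that the directory contains the files `save_pretrained` normally writes. This is an optional sketch, not part of the script; the expected file names are the standard Hugging Face ones:

```python
from pathlib import Path

model_dir = Path("/opt/ai_users/abdurakhim/para/gemma_train/gemma-text-to-paraphrased_v2")

# save_pretrained() normally writes the model config plus tokenizer files.
expected = ["config.json", "tokenizer_config.json"]
missing = [name for name in expected if not (model_dir / name).exists()]
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("Model directory looks complete.")
```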


🧪 Run the Script

```bash
python gemma_paraphraser_infer.py
```

Output

```text
Generated Answer:
Bu ikki kompaniya 2023-2024 yillarda faoliyat olib borib, lizing va boshqa qulay shartlar asosida avtomobil taqdim qilishni taklif qilgan.
```

(English gloss of the Uzbek sample: "Operating in 2023-2024, these two companies offered to provide cars on leasing and other favorable terms.")

🔍 Code Structure Explained

1. Load Model + Tokenizer

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

output_model = "/opt/ai_users/abdurakhim/para/gemma_train/gemma-text-to-paraphrased_v2"

model_class = AutoModelForCausalLM
model = model_class.from_pretrained(output_model, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(output_model)
```
  • Loads the model with reduced peak memory (low_cpu_mem_usage=True).
  • Loads the tokenizer from the same directory.

2. Optional: Merge LoRA (if not done already)

```python
from peft import PeftModel

# Wrap the base model with the LoRA adapter, fold the adapter weights into
# the base weights, then save a standalone checkpoint.
peft_model = PeftModel.from_pretrained(model, output_model)
merged = peft_model.merge_and_unload()
merged.save_pretrained("merged_model")
tokenizer.save_pretrained("merged_model")
```

This block is commented out by default; run it once to produce a standalone model (merged_model/).
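
After merging, later runs can skip PEFT entirely and load the standalone checkpoint directly, for example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The merged checkpoint behaves like an ordinary causal-LM directory.
model = AutoModelForCausalLM.from_pretrained("merged_model", low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained("merged_model")
```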


3. Prompt Formatting

```python
user_prompt = """Given the <INPUT_TEXT>, generate a fluent and semantically accurate paraphrase in Uzbek..."""
```

The formatted prompt (test_sample below) is then inserted into a chat-style message list:

```python
sample = {
    "messages": [
        {"role": "user", "content": test_sample},
        {"role": "assistant", "content": ""},
    ]
}
```

Template processed with:

```python
prompt = pipe.tokenizer.apply_chat_template(...)
```

apply_chat_template renders the messages with the model's own chat markup, which is what instruction-tuned models like Gemma expect. The full call, with all arguments, appears in the sketch below.
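
Assembled end to end, prompt construction looks roughly like this. It is a sketch: the {question} placeholder inside user_prompt is an assumption inferred from the format call in the Custom Usage section, and the input sentence is hypothetical:

```python
# Hypothetical Uzbek input sentence.
text = "Ikki kompaniya lizing shartlari asosida avtomobil taqdim etishni taklif qildi."

test_sample = user_prompt.format(question=text)  # fill the placeholder
sample = {
    "messages": [
        {"role": "user", "content": test_sample},
        {"role": "assistant", "content": ""},
    ]
}

prompt = pipe.tokenizer.apply_chat_template(
    sample["messages"], tokenize=False, add_generation_prompt=True
)
```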


4. Text Generation Pipeline

```python
from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
outputs = pipe(prompt, ...)
```

Generation settings (see the sketch after this list):

  • max_new_tokens=256
  • do_sample=False → greedy decoding, so output is deterministic
  • temperature=0.1, top_k=50, top_p=0.1 → set to narrow values, but ignored under greedy decoding
  • eos_token_id includes special tokens like <end_of_turn>, so generation stops at Gemma's turn boundary

🧪 Custom Usage

To reuse the paraphraser in other scripts, wrap the steps above in a helper:

```python
def paraphrase_uz(text: str) -> str:
    # Assumes user_prompt contains a {question} placeholder for the input text.
    prompt = user_prompt.format(question=text)
    messages = [{"role": "user", "content": prompt},
                {"role": "assistant", "content": ""}]
    full_prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    outputs = pipe(full_prompt, max_new_tokens=256, do_sample=False)
    # Strip the echoed prompt so only the generated paraphrase is returned.
    return outputs[0]["generated_text"][len(full_prompt):].strip()
```
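
For example, with a hypothetical input sentence:

```python
print(paraphrase_uz("Ikki kompaniya lizing asosida avtomobil taqdim etishni taklif qildi."))
```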

πŸ“ Suggested Project Layout

gemma_paraphraser/
β”œβ”€β”€ gemma_paraphraser_infer.py
β”œβ”€β”€ merged_model/                   # Optional: contains merged model weights
└── README.md
