Model Overview
This model is an extended version of Qwen2.5-32B-Instruct, adapted to enhance its performance in Arabic. The extension focuses on improving fluency, comprehension, and reasoning in Arabic, with particular emphasis on low-resource domains where information is sparse or underrepresented. The model was further tuned to handle diverse Arabic writing styles and subject matter, improve factual grounding in regional knowledge, and provide more accurate responses in contexts where existing multilingual models fall short.
Training Strategy
- Instruction Fine-Tuning (IFT):
  - Fine-tuned on a mix of Arabic and English instruction–response datasets (an example record follows this list).
  - Covered both high-resource and low-resource domains.
  - Included different writing styles to improve adaptability.
- Human Alignment:
  - Collected human preference data on Arabic and bilingual outputs.
  - Applied Direct Preference Optimization (DPO); see the training sketch after this list.
  - Focused on factual accuracy, safety, and cultural sensitivity.
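For concreteness, a single record in such an instruction–response dataset might look like the following. This is a hypothetical example written for illustration, not a record from the actual training data:

# A hypothetical instruction-response record (illustration only).
# instruction: "Briefly explain the concept of photosynthesis."
# response: "Photosynthesis is the process by which plants convert
#            sunlight into chemical energy."
ift_record = {
    "instruction": "اشرح مفهوم التمثيل الضوئي بإيجاز.",
    "response": "التمثيل الضوئي هو العملية التي تحول بها النباتات ضوء الشمس إلى طاقة كيميائية."
}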
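The DPO step can be sketched roughly as below, here using the DPOTrainer from Hugging Face's TRL library (recent versions). The dataset path, hyperparameters, and output directory are illustrative placeholders, not the actual AIC-1 training configuration:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "Qwen/Qwen2.5-32B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Preference data must expose "prompt", "chosen", and "rejected" columns;
# the file name here is a hypothetical placeholder.
dataset = load_dataset("json", data_files="arabic_preferences.jsonl", split="train")

args = DPOConfig(
    output_dir="aic-1-dpo",          # placeholder output directory
    beta=0.1,                        # strength of the KL penalty toward the reference model
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,      # ref_model defaults to a frozen copy of the model
)
trainer.train()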
Usage
How to Use
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Applied-Innovation-Center/AIC-1"

# Load the model with automatic dtype selection and device placement.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "ما هي عاصمة مصر"  # "What is the capital of Egypt?"
messages = [
    {"role": "system", "content": "You are an AI assistant. Always answer user questions with factual, evidence-based information. If you are unsure or the information is unavailable, clearly state that you do not know instead of guessing. Do not invent details. Keep responses concise, clear, and accurate. Avoid speculation, opinions, or creative storytelling unless explicitly asked for."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template, then tokenize it.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so that only the newly generated answer is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
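For interactive use, the answer can also be streamed token by token with transformers' TextStreamer utility. This is a standard transformers feature, not something specific to this model:

from transformers import TextStreamer

# Print tokens as they are generated instead of waiting for the full output.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)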