Model Overview
This model is an extended version of Qwen2.5-32B-Instruct, adapted to enhance its performance in Arabic. The extension focuses on improving fluency, comprehension, and reasoning in Arabic, with particular emphasis on low-resource domains where information is sparse or underrepresented. The model was further tuned to handle diverse Arabic writing styles and subject matter, improve factual grounding in regional knowledge, and provide more accurate responses in contexts where existing multilingual models fall short.
Training Strategy
- Instruction Fine-Tuning (IFT):
  - Fine-tuned on a mix of Arabic and English instruction–response datasets (an example record follows this list).
  - Covered both high-resource and low-resource domains.
  - Included different writing styles to improve adaptability.
- Human Alignment:
  - Collected human preference data on Arabic and bilingual outputs.
  - Applied Direct Preference Optimization (DPO); see the training sketch after this list.
  - Focused on factual accuracy, safety, and cultural sensitivity.
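For concreteness, a single record in such an instruction–response dataset might look like the following. This is a hypothetical example written for illustration, not a record from the actual training data:

# A hypothetical instruction-response record (illustration only).
# instruction: "Briefly explain the concept of photosynthesis."
# response: "Photosynthesis is the process by which plants convert
#            sunlight into chemical energy."
ift_record = {
    "instruction": "اشرح مفهوم التمثيل الضوئي بإيجاز.",
    "response": "التمثيل الضوئي هو العملية التي تحول بها النباتات ضوء الشمس إلى طاقة كيميائية."
}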
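The DPO step can be sketched roughly as below, here using the DPOTrainer from Hugging Face's TRL library (recent versions). The dataset path, hyperparameters, and output directory are illustrative placeholders, not the actual AIC-1 training configuration:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "Qwen/Qwen2.5-32B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Preference data must expose "prompt", "chosen", and "rejected" columns;
# the file name here is a hypothetical placeholder.
dataset = load_dataset("json", data_files="arabic_preferences.jsonl", split="train")

args = DPOConfig(
    output_dir="aic-1-dpo",          # placeholder output directory
    beta=0.1,                        # strength of the KL penalty toward the reference model
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,      # ref_model defaults to a frozen copy of the model
)
trainer.train()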
Usage
How to Use
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Applied-Innovation-Center/AIC-1"

# Load the model with automatic dtype selection and device placement.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "ما هي عاصمة مصر"  # "What is the capital of Egypt?"
messages = [
    {"role": "system", "content": "You are an AI assistant. Always answer user questions with factual, evidence-based information. If you are unsure or the information is unavailable, clearly state that you do not know instead of guessing. Do not invent details. Keep responses concise, clear, and accurate. Avoid speculation, opinions, or creative storytelling unless explicitly asked for."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template, then tokenize it.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so that only the newly generated answer is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
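For interactive use, the answer can also be streamed token by token with transformers' TextStreamer utility. This is a standard transformers feature, not something specific to this model:

from transformers import TextStreamer

# Print tokens as they are generated instead of waiting for the full output.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)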