Part of the Hala collection: a series of lightweight Arabic language models (instruction following and translation) and an Arabic instruction dataset.
Paper: Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale
Authors: Hasan Abed Al Kader Hammoud*, Mohammad Zbeeb*, Bernard Ghanem
Affiliation: King Abdullah University of Science and Technology (KAUST)
*Equal contribution
The Hala-1.2B-EN-AR-Translator is a lightweight translation model fine-tuned for English → Arabic translation, particularly in instruction-style and conversational contexts.
It powers the creation of the Hala dataset and can also be used as a standalone translator for research, dataset generation, or preprocessing tasks.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "hammh0a/Hala-1.2B-EN-AR-Translator"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
pipe = pipeline("text-generation", model=model, tokenizer=tok)

# Example English text to translate
text = "Physics is the study of matter, energy, and the interactions between them."

# Wrap the text in the chat format the model was fine-tuned on
messages = [
    {
        "role": "user",
        "content": "Translate everything that follows into Arabic:\n\n" + text,
    }
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Greedy decoding; the printed output contains the prompt followed by the Arabic translation
out = pipe(prompt, max_new_tokens=256, do_sample=False)
print(out[0]["generated_text"])
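Since the model is also intended for dataset generation and preprocessing, a batched variant of the example above can be useful. The sketch below is a minimal illustration that reuses `pipe` and `tok` from the snippet above; the `translate_batch` helper, the batch size, the pad-token fallback, and the example sentences are assumptions for illustration, not part of the released Hala pipeline.

# Minimal batch-translation sketch for dataset preprocessing
# (illustrative only; not the official Hala data-generation pipeline).

def translate_batch(texts, batch_size=8, max_new_tokens=256):
    """Translate a list of English strings into Arabic using the pipeline defined above."""
    if tok.pad_token is None:
        tok.pad_token = tok.eos_token  # batching needs a pad token; reuse EOS if none is set
    prompts = [
        tok.apply_chat_template(
            [{"role": "user",
              "content": "Translate everything that follows into Arabic:\n\n" + t}],
            tokenize=False,
            add_generation_prompt=True,
        )
        for t in texts
    ]
    outputs = pipe(
        prompts,
        max_new_tokens=max_new_tokens,
        do_sample=False,         # greedy decoding for deterministic translations
        batch_size=batch_size,
        return_full_text=False,  # keep only the generated Arabic, not the prompt
    )
    # For a list of inputs, the pipeline returns one list of generations per prompt
    return [o[0]["generated_text"].strip() for o in outputs]

english_sentences = [
    "Energy can neither be created nor destroyed.",
    "Water boils at 100 degrees Celsius at sea level.",
]
for en, ar in zip(english_sentences, translate_batch(english_sentences)):
    print(en, "->", ar)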
Translation quality of the teacher and lightweight translators; deltas in parentheses are relative to each group's baseline (the FP16 teacher and the LFM2-1.2B base, respectively):

| System | BLEU ↑ | ROUGE-L ↑ | chrF++ ↑ |
|---|---|---|---|
| Teacher translator | | | |
| CohereLabs/command-a-translate-08-2025 (FP16) | 53.1 | 26.0 | 68.6 |
| hammh0a/command-a-translate-FP8-Dynamic | 53.5 (+0.3) | 26.0 (+0.0) | 68.9 (+0.3) |
| Lightweight translator (LFM2-1.2B family) | | | |
| LiquidAI/LFM2-1.2B (base) | 16.0 | 19.3 | 43.2 |
| LFM2-1.2B Translator (ours) | 48.2 (+32.1) | 25.1 (+5.9) | 64.2 (+21.0) |
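For context on how such scores are typically obtained: BLEU and chrF++ are usually computed with sacreBLEU, and ROUGE-L with a separate ROUGE implementation. The sketch below illustrates one such evaluation loop; it is not the scoring script used in the technical report, and the file names, the rouge_score dependency, and the whitespace tokenizer (needed because rouge_score's default tokenizer drops Arabic characters) are assumptions.

# Illustrative scoring sketch (not the report's evaluation script): BLEU and chrF++
# via sacreBLEU, ROUGE-L via the rouge_score package. File names are placeholders.
import sacrebleu
from rouge_score import rouge_scorer

with open("model_translations.ar.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("reference_translations.ar.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references], word_order=2)  # word_order=2 gives chrF++

class WhitespaceTokenizer:
    # rouge_score's default tokenizer keeps only [a-z0-9], which erases Arabic text,
    # so a simple whitespace tokenizer is used here instead.
    def tokenize(self, text):
        return text.split()

scorer = rouge_scorer.RougeScorer(["rougeL"], tokenizer=WhitespaceTokenizer())
rouge_l = sum(
    scorer.score(ref, hyp)["rougeL"].fmeasure for ref, hyp in zip(references, hypotheses)
) / len(hypotheses)

print(f"BLEU: {bleu.score:.1f}  ROUGE-L: {100 * rouge_l:.1f}  chrF++: {chrf.score:.1f}")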
If you use Hala-1.2B-EN-AR-Translator, please cite:
Link: https://arxiv.org/abs/2509.14008
@misc{hammoud2025halatechnicalreportbuilding,
      title={Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale},
      author={Hasan Abed Al Kader Hammoud and Mohammad Zbeeb and Bernard Ghanem},
      year={2025},
      eprint={2509.14008},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2509.14008},
}
Base model: LiquidAI/LFM2-1.2B