Hala-1.2B-EN-AR-Translator

Hala logo

Paper: Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale

Authors: Hasan Abed Al Kader Hammoud*, Mohammad Zbeeb*, Bernard Ghanem

Affiliation: King Abdullah University of Science and Technology (KAUST)

*Equal contribution


📖 Overview

The Hala-1.2B-EN-AR-Translator is a lightweight translation model fine-tuned for English → Arabic translation, particularly in instruction-style and conversational contexts.

It powers the creation of the Hala dataset and can also be used as a standalone translator for research, dataset generation, or preprocessing tasks.


🔧 Usage Example

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "hammh0a/Hala-1.2B-EN-AR-Translator"

# Load the tokenizer and model; device_map="auto" places the model on GPU when available.
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

pipe = pipeline("text-generation", model=model, tokenizer=tok)

# Example English text to translate
text = "Physics is the study of matter, energy, and the interactions between them."

messages = [
    {
        "role": "user",
        "content": "Translate everything that follows into Arabic:\n\n" + text,
    }
]

# Build the chat-formatted prompt expected by the model.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Greedy decoding; return_full_text=False keeps only the generated Arabic translation.
out = pipe(prompt, max_new_tokens=256, do_sample=False, return_full_text=False)

print(out[0]["generated_text"])
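
The same pipeline can be reused for dataset-scale translation, as mentioned in the overview. Below is a minimal sketch that pushes several English sentences through the model; the example sentences and the to_prompt helper are illustrative, not part of the released code.

# Minimal sketch: translating several English sentences for dataset generation.
# NOTE: the sentences and the to_prompt helper are illustrative, not from the paper.
english_texts = [
    "Machine learning models require large amounts of data.",
    "The experiment was repeated three times to ensure reliability.",
]

def to_prompt(text: str) -> str:
    # Reuse the same instruction format as the single-sentence example above.
    messages = [
        {"role": "user", "content": "Translate everything that follows into Arabic:\n\n" + text}
    ]
    return tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Passing a list of prompts makes the pipeline translate them one after another.
outputs = pipe(
    [to_prompt(t) for t in english_texts],
    max_new_tokens=256,
    do_sample=False,
    return_full_text=False,
)

for src, result in zip(english_texts, outputs):
    print(src, "->", result[0]["generated_text"].strip())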

EN→AR Translation Quality on 500 Sampled MMLU Questions

| System | BLEU ↑ | ROUGE-L ↑ | chrF++ ↑ |
|---|---|---|---|
| Teacher translator | | | |
| CohereLabs/command-a-translate-08-2025 (FP16) | 53.1 | 26.0 | 68.6 |
| hammh0a/command-a-translate-FP8-Dynamic | 53.5 (+0.3) | 26.0 (+0.0) | 68.9 (+0.3) |
| Lightweight translator (LFM2-1.2B family) | | | |
| LiquidAI/LFM2-1.2B (base) | 16.0 | 19.3 | 43.2 |
| Our LFM2-1.2B Translator (ours) | 48.2 (+32.1) | 25.1 (+5.9) | 64.2 (+21.0) |
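
The BLEU and chrF++ scores above can be reproduced with sacreBLEU given model outputs and reference translations. The following is a minimal sketch under assumed file names (hypotheses.ar.txt and references.ar.txt are hypothetical); the exact evaluation setup is described in the paper.

# Minimal sketch: corpus-level BLEU and chrF++ with sacreBLEU.
# NOTE: file names are hypothetical; see the paper for the exact evaluation setup.
import sacrebleu

with open("hypotheses.ar.txt", encoding="utf-8") as f:
    hyps = [line.strip() for line in f]
with open("references.ar.txt", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hyps, [refs])                 # corpus BLEU
chrf = sacrebleu.corpus_chrf(hyps, [refs], word_order=2)   # word_order=2 gives chrF++

print(f"BLEU:   {bleu.score:.1f}")
print(f"chrF++: {chrf.score:.1f}")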

📚 Citation

If you use Hala-1.2B-EN-AR-Translator, please cite:

Link: https://arxiv.org/abs/2509.14008

@misc{hammoud2025halatechnicalreportbuilding,
      title={Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale}, 
      author={Hasan Abed Al Kader Hammoud and Mohammad Zbeeb and Bernard Ghanem},
      year={2025},
      url={https://arxiv.org/abs/2509.14008}, 
}
Base model: LiquidAI/LFM2-1.2B