Part of the Hala collection: a series of lightweight Arabic language models (instruction following and translation) and an Arabic instruction dataset.
Paper: Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale
Authors: Hasan Abed Al Kader Hammoud*, Mohammad Zbeeb*, Bernard Ghanem
Affiliation: King Abdullah University of Science and Technology (KAUST)
*Equal contribution
The Hala-1.2B-EN-AR-Translator is a lightweight translation model fine-tuned for English → Arabic translation, particularly in instruction-style and conversational contexts.
It powers the creation of the Hala dataset and can also be used as a standalone translator for research, dataset generation, or preprocessing tasks.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "hammh0a/Hala-1.2B-EN-AR-Translator"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
pipe = pipeline("text-generation", model=model, tokenizer=tok)

# Example English text to translate
text = "Physics is the study of matter, energy, and the interactions between them."

# Wrap the text in the chat format the model was fine-tuned on
messages = [
    {
        "role": "user",
        "content": "Translate everything that follows into Arabic:\n\n" + text,
    }
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Greedy decoding; the printed output contains the prompt followed by the Arabic translation
out = pipe(prompt, max_new_tokens=256, do_sample=False)
print(out[0]["generated_text"])
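Since the model is also intended for dataset generation and preprocessing, a batched variant of the example above can be useful. The sketch below is a minimal illustration that reuses `pipe` and `tok` from the snippet above; the `translate_batch` helper, the batch size, the pad-token fallback, and the example sentences are assumptions for illustration, not part of the released Hala pipeline.

# Minimal batch-translation sketch for dataset preprocessing
# (illustrative only; not the official Hala data-generation pipeline).

def translate_batch(texts, batch_size=8, max_new_tokens=256):
    """Translate a list of English strings into Arabic using the pipeline defined above."""
    if tok.pad_token is None:
        tok.pad_token = tok.eos_token  # batching needs a pad token; reuse EOS if none is set
    prompts = [
        tok.apply_chat_template(
            [{"role": "user",
              "content": "Translate everything that follows into Arabic:\n\n" + t}],
            tokenize=False,
            add_generation_prompt=True,
        )
        for t in texts
    ]
    outputs = pipe(
        prompts,
        max_new_tokens=max_new_tokens,
        do_sample=False,         # greedy decoding for deterministic translations
        batch_size=batch_size,
        return_full_text=False,  # keep only the generated Arabic, not the prompt
    )
    # For a list of inputs, the pipeline returns one list of generations per prompt
    return [o[0]["generated_text"].strip() for o in outputs]

english_sentences = [
    "Energy can neither be created nor destroyed.",
    "Water boils at 100 degrees Celsius at sea level.",
]
for en, ar in zip(english_sentences, translate_batch(english_sentences)):
    print(en, "->", ar)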
Translation quality of the teacher and lightweight translators; deltas in parentheses are relative to each group's baseline (the FP16 teacher and the LFM2-1.2B base, respectively):

| System | BLEU ↑ | ROUGE-L ↑ | chrF++ ↑ |
|---|---|---|---|
| Teacher translator | | | |
| CohereLabs/command-a-translate-08-2025 (FP16) | 53.1 | 26.0 | 68.6 |
| hammh0a/command-a-translate-FP8-Dynamic | 53.5 (+0.3) | 26.0 (+0.0) | 68.9 (+0.3) |
| Lightweight translator (LFM2-1.2B family) | | | |
| LiquidAI/LFM2-1.2B (base) | 16.0 | 19.3 | 43.2 |
| LFM2-1.2B Translator (ours) | 48.2 (+32.1) | 25.1 (+5.9) | 64.2 (+21.0) |
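For context on how such scores are typically obtained: BLEU and chrF++ are usually computed with sacreBLEU, and ROUGE-L with a separate ROUGE implementation. The sketch below illustrates one such evaluation loop; it is not the scoring script used in the technical report, and the file names, the rouge_score dependency, and the whitespace tokenizer (needed because rouge_score's default tokenizer drops Arabic characters) are assumptions.

# Illustrative scoring sketch (not the report's evaluation script): BLEU and chrF++
# via sacreBLEU, ROUGE-L via the rouge_score package. File names are placeholders.
import sacrebleu
from rouge_score import rouge_scorer

with open("model_translations.ar.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("reference_translations.ar.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references], word_order=2)  # word_order=2 gives chrF++

class WhitespaceTokenizer:
    # rouge_score's default tokenizer keeps only [a-z0-9], which erases Arabic text,
    # so a simple whitespace tokenizer is used here instead.
    def tokenize(self, text):
        return text.split()

scorer = rouge_scorer.RougeScorer(["rougeL"], tokenizer=WhitespaceTokenizer())
rouge_l = sum(
    scorer.score(ref, hyp)["rougeL"].fmeasure for ref, hyp in zip(references, hypotheses)
) / len(hypotheses)

print(f"BLEU: {bleu.score:.1f}  ROUGE-L: {100 * rouge_l:.1f}  chrF++: {chrf.score:.1f}")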
If you use Hala-1.2B-EN-AR-Translator, please cite:
Link: https://arxiv.org/abs/2509.14008
@misc{hammoud2025halatechnicalreportbuilding,
      title={Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale},
      author={Hasan Abed Al Kader Hammoud and Mohammad Zbeeb and Bernard Ghanem},
      year={2025},
      eprint={2509.14008},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2509.14008},
}
Base model: LiquidAI/LFM2-1.2B