๐ Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v2
๐ฅ The first open-source Arabic model to achieve 97.2% accuracy in extracting Arabic text from historical books and manuscripts.
Outperforms Google Vision and Teseract in the Arabic context.
This model is the highest-performing open-source Arabic model ever for handwriting and manuscripts, trained on 65,747 diverse samples, including:
Printed texts (from sources such as a font)
Handwriting (from a font collection)
Historical manuscripts (rare archival documents)
๐ Performance
Scale Value Explanation Evaluation Loss 0.6564, the lowest globally in Arabic OCR Character Error Rate (CER) 4.51%, excellent (less than 5% = high quality) Word Error Rate (WER) ~9%, very good for handwritten texts Estimated Accuracy 97.2%, outperforms commercial models Average Time 0.30-second inference, fast for live applications
๐ก Unique Features
โ Supports full text (not isolated characters)
โ Handles old manuscripts (even low quality)
โ No complex preprocessing required
โ Fully open source - customizable
โ Supports full linguistic context for comprehensive text understanding
โ Noise-resistant and handles low-quality images
๐ High efficiency: The 4-bit quantum model improves memory usage by about 50% and only 2% less in terms of text extraction compared to the basic model.
๐๏ธ Datasets Used
๐ Public Data
Aamijar/muharaf-public - Various printed texts
KHATT Dataset - Arabic handwriting in various styles
Arabic-OCR-images - Comprehensive OCR dataset
Rasam (johnlockejrr/RASAM) - Modern Moroccan Arabic handwriting
Khatt v1.0 (johnlockejrr/KHATT_v1.0_dataset) - Handwriting dataset Arabic
Additional historical manuscripts and documents
๐ Competitive Comparison
Model Accuracy in Arabic,
Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v2, 97.2% โกโกโกโก ๐ Free
Google Cloud Vision ~94% โกโกโก ๐ฐ Paid
Microsoft Azure OCR ~92% โกโกโก ๐ฐ Paid
Tesrakt Arabic Printed lines ~90% โกโก ๐ Free
Easy OCR Arabic Printed lines ~88% โกโกโก ๐ Free
Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v1 ~77% on manuscripts, ~94% on printed documents โกโกโก ๐ Free
๐ Performance on Different Text Types
Text Type: Predicted Copy Rate (CER) Predicted Copy Rate (WER) Notes High-Quality Printed Text: Excellent Performance: Approximately 1-5% Clear Handwriting: 4-7% - Very Good Historical Manuscripts: 5-15% - Acceptable for Heritage Low-Quality Documents: 10-20%, 20-35%, Needs Improvement
๐ฏ Technical Specifications
Model Architecture
Basic Model: Qwen2.5-VL-3B
Fine-tuning: Advanced fine-tuning for full contextual understanding of Arabic texts Connected.
Special considerations: Connection points, diacritics, and script variety.
Supported scripts
Various Arabic scripts (Naskh, Ruq'ah, Modern Maghrebi... Persian numerals are unreadable ; some Arabic scripts, such as Diwani and Thuluth, have not been trained on 50% of the time).
Multiple fields (literary, scientific, historical, religious).
๐ฎ Coming soon
Support for more Arabic scripts, such as Diwani and Thuluth.
User-friendly web interface.
Model and experience changes based on the Qwen3-VL-4B-Instruct
Enhanced training on rare historical scripts.
๐ค How can you help?
Use the form and give us your feedback.
Send us examples of difficult scripts you encounter.
Help improve the training data.
Contribute to the development of the form.
Visualizations
![]() |
![]() |
![]() |
![]() |
๐ ๏ธ How to use it.
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image
from typing import List, Dict
import os
def process_vision_info(messages: List[dict]):
image_inputs = []
video_inputs = []
for message in messages:
if isinstance(message["content"], list):
for item in message["content"]:
if item["type"] == "image":
image = item["image"]
if isinstance(image, str):
# Open image with quality improvement
image = Image.open(image).convert("RGB")
elif isinstance(image, Image.Image):
pass
else:
raise ValueError(f"Unsupported image type: {type(image)}")
image_inputs.append(image)
elif item["type"] == "video":
video_inputs.append(item["video"])
return image_inputs if image_inputs else None, video_inputs if video_inputs else None
model_name = "sherif1313/Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v2"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
model_name,
dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(
model_name,
trust_remote_code=True
)
def extract_text_from_image(image_path):
try:
# โ
Use clearer prompt that requests the complete text
messages = [
{
"role": "user",
"content": [
{"type": "image", "image": image_path},
{"type": "text", "text": "ุงุฑุฌู ุงุณุชุฎุฑุงุฌ ุงููุต ุงูุนุฑุจู ูุงู
ูุงู ู
ู ูุฐู ุงูุตูุฑุฉ ู
ู ุงูุจุฏุงูุฉ ุงูู ุงูููุงูุฉ ุจุฏูู ุงู ุงุฎุชุตุงุฑ ูุฏูู ุฐูุงุฏุฉ ุงู ุญุฐู. ุงูุฑุฃ ูู ุงูู
ุญุชูู ุงููุตู ุงูู
ูุฌูุฏ ูู ุงูุตูุฑุฉ:"},
],
}
]
# Prepare text and images
text = processor.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
# Process inputs with improved settings
inputs = processor(
text=[text],
images=image_inputs,
padding=True,
return_tensors="pt",
).to(model.device)
# โ
Improved generation settings for long texts
generated_ids = model.generate(
**inputs,
max_new_tokens=512, # Significant increase to accommodate long texts
min_new_tokens=50, # Minimum to ensure no premature truncation
do_sample=False, # For consistent results
temperature=0.3, # Balance between creativity and stability
top_p=0.9, # For moderate diversity
repetition_penalty=1.1, # Prevent repetition
pad_token_id=processor.tokenizer.eos_token_id,
eos_token_id=processor.tokenizer.eos_token_id,
num_return_sequences=1
)
# Extract only the generated text (without user prompt)
input_len = inputs.input_ids.shape[1]
output_text = processor.batch_decode(
generated_ids[:, input_len:],
skip_special_tokens=True,
clean_up_tokenization_spaces=True # Improve spacing
)[0]
return output_text.strip()
except Exception as e:
return f"Error occurred while processing image: {str(e)}"
def enhance_image_quality(image_path):
"""Enhance image quality to improve OCR accuracy"""
try:
img = Image.open(image_path)
# Increase resolution if image is small
if max(img.size) < 800:
new_size = (img.size[0] * 2, img.size[1] * 2)
img = img.resize(new_size, Image.Resampling.LANCZOS)
return img
except:
return Image.open(image_path)
if __name__ == "__main__":
TEST_IMAGES_DIR = "/media/imges" # Replace with your folder image path
IMAGE_EXTENSIONS = ['.png', '.jpg', '.jpeg', '.tif', '.tiff']
image_files = [
os.path.join(TEST_IMAGES_DIR, f)
for f in os.listdir(TEST_IMAGES_DIR)
if any(f.lower().endswith(ext) for ext in IMAGE_EXTENSIONS)
]
if not image_files:
print("โ No images found in the folder.")
exit()
print(f"๐ Found {len(image_files)} images for processing")
for img_path in sorted(image_files):
print(f"\n{'='*50}")
print(f"๐ผ๏ธ Processing: {os.path.basename(img_path)}")
print(f"{'='*50}")
try:
# โ
Use the enhanced function
extracted_text = extract_text_from_image(img_path)
print("๐ Extracted text:")
print("-" * 40)
print(extracted_text)
print("-" * 40)
# โ
Calculate text length for comparison
text_length = len(extracted_text)
print(f"๐ Text length: {text_length} characters")
except Exception as e:
print(f"โ Error processing {os.path.basename(img_path)}: {e}")
- Downloads last month
- 692
Model tree for sherif1313/Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v2
Base model
Qwen/Qwen2.5-VL-3B-Instruct


