---
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3_vl
- trl
- sft
- chemistry
- code
- climate
- art
- biology
- finance
- legal
- music
- medical
- agent
license: apache-2.0
language:
- en
- ab
- aa
- ae
- af
- ak
- am
- an
- ar
- as
- av
- ay
- az
- ba
- be
- bg
- bh
- bi
- bm
- bn
- bo
- br
- bs
- ca
- ce
- ch
- co
- cr
- cs
- cu
- cv
- cy
- da
- de
- dv
- dz
- ee
- el
- eo
- es
- et
- eu
- fa
- ff
- fi
- fj
- fo
- fr
- fy
- ga
- gd
- gl
- gn
- gv
- ha
- he
- hi
- ho
- gu
- hr
- ht
- hu
- hz
- hy
- id
- ia
- ig
- ie
- ik
- ii
- is
- io
- iu
- it
- jv
- ja
- kg
- ka
- kj
- ki
- kl
- kk
- kn
- km
- kr
- ko
- ku
- ks
- kw
- kv
- la
- ky
- lg
- lb
- ln
- li
- lt
- lo
- lv
- lu
- mg
- mi
- mh
- ml
- mk
- mr
- mn
- mt
- ms
- na
- my
- nd
- nb
- ng
- nl
- ne
- 'no'
- nn
- nv
- nr
- oc
- oj
- om
- ny
- os
- or
- pa
- pi
- pl
- ps
- pt
- rm
- rn
- qu
- ro
- ru
- sn
- rw
- so
- sa
- sc
- sd
pipeline_tag: image-text-to-text
library_name: transformers
---
<img src='bannerocr.png'>

# 🖼️ Next OCR 8B

### *Compact OCR AI — Accurate, Fast, Multilingual, Math-Optimized*

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Language: Multilingual](https://img.shields.io/badge/Language-Multilingual-red.svg)]()
[![HuggingFace](https://img.shields.io/badge/🤗-Lamapi/Next--OCR--orange.svg)](https://huggingface.co/Lamapi/next-ocr)

---

## 📖 Overview

**Next OCR 8B** is an **8-billion parameter model** optimized for **optical character recognition (OCR) tasks** with **mathematical and tabular content understanding**.

It supports **multilingual OCR** (Turkish, English, German, Spanish, French, Chinese, Japanese, Korean, Russian, and others) and handles structured documents such as tables, forms, and formulas.

---

## ⚡ Highlights

* 🖼️ Accurate text extraction, including math and tables
* 🌍 Multilingual support (30+ languages)
* ⚡ Lightweight and efficient
* 💬 Instruction-tuned for document understanding and analysis

---

## 📊 Benchmark & Comparison

![image](https://cdn-uploads.huggingface.co/production/uploads/67d46bc5fe6ad6f6511d6f44/wLtEbJ9U3KCJe4OCxvAF7.png)

---

| Model                           | OCR-Bench Accuracy (%)   | Multilingual Accuracy (%) | Layout / Table Understanding (%) |
| ------------------------------- | ------------------------ | ------------------------- | -------------------------------- |
| **Next OCR**                    | **99.0**                 | **96.8**                  | **95.3**                         |
| PaddleOCR                       | 95.2                     | 93.9                      | 95.3                             |
| Deepseek OCR                    | 90.6                     | 87.4                      | 86.1                             |
| Tesseract                       | 92.0                     | 88.4                      | 72.0                             |
| EasyOCR                         | 90.4                     | 84.7                      | 78.9                             |
| Google Cloud Vision / DocAI     | 98.7                     | 95.5                      | 93.6                             |
| Amazon Textract                 | 94.7                     | 86.2                      | 86.1                             |
| Azure Document Intelligence     | 95.1                     | 93.6                      | 91.4                             |

---

| Model                       | Handwriting (%) | Scene Text (%) | Complex Tables (%) | 
| --------------------------- | --------------- | -------------- | ------------------ |
| **Next OCR**                | 92              | 96             | 91                 |
| PaddleOCR                   | 88              | 92             | 90                 |
| Deepseek OCR                | 80              | 85             | 83                 |
| Tesseract                   | 75              | 88             | 70                 |
| EasyOCR                     | 78              | 86             | 75                 |
| Google Cloud Vision / DocAI | 90              | 95             | 92                 |
| Amazon Textract             | 85              | 90             | 88                 |
| Azure Document Intelligence | 87              | 91             | 89                 |

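The card does not state how the accuracy figures above were computed. As a rough illustration only, one common way to score OCR output is character-level accuracy, taken as 1 minus the character error rate (edit distance divided by reference length). The sketch below assumes that definition; `edit_distance` and `char_accuracy` are illustrative names and are not part of any benchmark harness used for this model.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def char_accuracy(ref: str, hyp: str) -> float:
    """Character accuracy in percent: 100 * (1 - CER), clamped at 0."""
    if not ref:
        # Assumption: an empty reference is fully matched only by an empty output.
        return 100.0 if not hyp else 0.0
    cer = edit_distance(ref, hyp) / len(ref)
    return max(0.0, 100.0 * (1.0 - cer))


# One wrong character out of four
print(char_accuracy("abcd", "abxd"))
```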
---

## 🚀 Installation & Usage

```python
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq
import torch

model_id = "Lamapi/next-ocr"

# The processor bundles the tokenizer and the image preprocessor.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.float16)

img = Image.open("image.jpg")

# ATTENTION: The content list must include both an image and text.
messages = [
    {"role": "system", "content": "You are Next-OCR, a helpful AI assistant trained by Lamapi."},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": img},
            {"type": "text", "text": "Read the text in this image and summarize it."}
        ]
    }
]

# Apply the chat template correctly
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=256)

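# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = generated[0][inputs["input_ids"].shape[1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))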
```
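The `messages` structure above can be reused for different OCR tasks by changing only the instruction text. The helper below is a minimal sketch of that idea: the task names and prompt wordings in `TASK_PROMPTS` are hypothetical (the card does not prescribe any), and `build_ocr_messages` is an illustrative name, not part of `transformers`. It only builds the list to pass to `processor.apply_chat_template` and does not load the model.

```python
from typing import Any

# Hypothetical prompts for illustration; adjust the wording to your documents.
TASK_PROMPTS = {
    "plain": "Read all the text in this image and return it as plain text.",
    "table": "Extract the table in this image as a Markdown table.",
    "math": "Transcribe the formulas in this image as LaTeX.",
}

SYSTEM_PROMPT = "You are Next-OCR, a helpful AI assistant trained by Lamapi."


def build_ocr_messages(image: Any, task: str = "plain") -> list:
    """Build a chat message list with one image and one instruction."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"Unknown task {task!r}; expected one of {sorted(TASK_PROMPTS)}")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            # The content list must include both an image and text.
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": TASK_PROMPTS[task]},
            ],
        },
    ]


# Usage: pass a PIL image in practice; a placeholder string is used here.
messages = build_ocr_messages("image.jpg", task="table")
```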

---

## 🧩 Key Features

| Feature                    | Description                                                     |
| -------------------------- | --------------------------------------------------------------- |
| 🖼️ High-Accuracy OCR      | Extracts text from images, documents, and screenshots reliably. |
| 🇹🇷 Multilingual Support  | Works with 30+ languages including Turkish.                     |
| ⚡ Lightweight & Efficient  | Optimized for resource-constrained environments.                |
| 📄 Layout & Math Awareness | Handles tables, forms, and mathematical formulas.               |
| 🏢 Reliable Outputs        | Suitable for enterprise document workflows.                     |

---

## 📐 Model Specifications

| Specification     | Details                                                   |
| ----------------- | --------------------------------------------------------- |
| **Base Model**    | Qwen3-VL                                                  |
| **Parameters**    | 8 Billion                                                 |
| **Architecture**  | Vision + Transformer (OCR LLM)                            |
| **Modalities**    | Image + text to text                                      |
| **Fine-Tuning**   | OCR datasets with multilingual and math/tabular content   |
| **Optimizations** | Quantization-ready, FP16 support                          |
| **Primary Focus** | Text extraction, document understanding, mathematical OCR |

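To put the FP16 and quantization entries in context, the memory needed for the weights alone can be estimated as parameters times bits per parameter. The sketch below assumes exactly 8 billion parameters and treats INT8 and 4-bit as illustrative formats; the card does not say which quantization schemes have been tested. Activations, the KV cache, and framework overhead are not included, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30


NUM_PARAMS = 8e9  # assumption: the "8 Billion" figure taken at face value

for name, bits in [("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
    print(f"{name}: ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```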
---

## 🎯 Ideal Use Cases

* Document digitization
* Invoice & receipt processing
* Multilingual OCR pipelines
* Table, form, and formula extraction
* Enterprise document management

---

## 📄 License

MIT License — free for commercial & non-commercial use.

---

## 📞 Contact & Support

* 📧 Email: [lamapicontact@gmail.com](mailto:lamapicontact@gmail.com)
* 🤗 HuggingFace: [Lamapi](https://huggingface.co/Lamapi)

---

> **Next OCR** — Compact *OCR + math-capable* AI, blending **accuracy**, **speed**, and **multilingual document intelligence**.

[![Follow on HuggingFace](https://img.shields.io/badge/Follow-HuggingFace-yellow?logo=huggingface)](https://huggingface.co/Lamapi)