PaddlePaddle/PaddleOCR-VL · How to use transformers for PaddleOCR-VL inferencing?

Oct 16

Excellent work! It would be more convenient if PaddleOCR-VL supported transformers-backed inference.

PaddlePaddle org Oct 17

Hello, we currently support inference with the PaddleOCR-VL-0.9B model through the transformers library; it can recognize text, formulas, tables, and charts. We plan to support full document parsing with transformers in the future. Below is a simple script for running PaddleOCR-VL-0.9B with transformers. For now we recommend the official inference method, which is faster and supports page-level document parsing.

If you need any further assistance, feel free to ask!

# -*- coding: utf-8 -*-
"""
This script defines four task prompts. Switch between them by editing the CHOSEN_TASK line; no command-line arguments are needed.

Available tasks (CHOSEN_TASK):

- 'ocr' -> 'OCR:'
- 'table' -> 'Table Recognition:'
- 'chart' -> 'Chart Recognition:'
- 'formula' -> 'Formula Recognition:'
To add/modify prompts, change the PROMPTS dictionary as needed.
"""

from PIL import Image
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

CHOSEN_TASK = "ocr"  # Options: 'ocr' | 'table' | 'chart' | 'formula'
PROMPTS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "chart": "Chart Recognition:",
    "formula": "Formula Recognition:",
}

model_path = "PaddleOCR-VL-0.9B"
image_path = "test.png"
image = Image.open(image_path).convert("RGB")

model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype=torch.bfloat16
).to(DEVICE).eval()
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

messages = [{"role": "user", "content": PROMPTS[CHOSEN_TASK]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(text=[text], images=[image], return_tensors="pt")
inputs = {k: (v.to(DEVICE) if isinstance(v, torch.Tensor) else v) for k, v in inputs.items()}

with torch.inference_mode():
    generated = model.generate(**inputs, max_new_tokens=1024, do_sample=False, use_cache=True)

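# Decode only the newly generated tokens. Splitting the decoded string on the
# prompt is fragile because skip_special_tokens=True strips the template tokens.
prompt_len = inputs["input_ids"].shape[1]
answer = processor.batch_decode(generated[:, prompt_len:], skip_special_tokens=True)[0].strip()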
print(answer)

code-me-running

Oct 17

edited Oct 17

Is model_path = "PaddleOCR-VL-0.9B" correct? I changed it to "PaddlePaddle/PaddleOCR-VL", but it still doesn't work. The error says model_type is missing from the config.

lsyzz changed discussion status to closed Oct 17

lsyzz changed discussion status to open Oct 17

sunflowerting78

PaddlePaddle org Oct 17

model_path = "PaddleOCR-VL-0.9B" is only an example. Please replace it with your local model path and try again.
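For anyone hitting the same `model_type` error, here is a minimal sketch of a sanity check to run on the local model path before calling `from_pretrained`. `check_model_dir` is a hypothetical helper written for this thread, not part of transformers or PaddleOCR. It only confirms that the directory exists and that its `config.json` has a `model_type` key, which is what the Auto classes look up.

```python
import json
from pathlib import Path


def check_model_dir(model_dir):
    """Return a list of problems found in a local model directory (empty if none).

    Hypothetical helper: it checks only for config.json and its 'model_type' key.
    """
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a local directory"]

    config_file = root / "config.json"
    if not config_file.is_file():
        return [f"{config_file} not found"]

    config = json.loads(config_file.read_text(encoding="utf-8"))
    if "model_type" not in config:
        return ["config.json has no 'model_type' key"]
    return []


if __name__ == "__main__":
    # "PaddleOCR-VL-0.9B" is the example path from the script above;
    # replace it with wherever the model files actually live.
    for problem in check_model_dir("PaddleOCR-VL-0.9B"):
        print("Problem:", problem)
```

An empty result does not guarantee that loading will succeed; it only rules out the two most common path mistakes.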

code-me-running

Oct 17

Yes, it's working. Thanks for the quick response. I have two more questions:
1. Is it possible to parse a complete page to Markdown or JSON using transformers?
2. I tried the PaddleOCRVL() pipeline, but it does not work on a CPU-only system. How can I configure it to run on CPU only?

sunflowerting78

PaddlePaddle org Oct 17

Thank you for your interest.

As I mentioned in my previous reply, we do not currently support end-to-end Transformers inference, but we plan to add this support in the future. We recommend that you use the official deployment method for higher inference efficiency.
We do not support CPU inference at this time, as it would lead to a poor user experience.

code-me-running

Oct 17

With the official deployment, can we output a confidence score or probability for each word?

seinett

Oct 18

I encountered an error:
"""
from transformers.modeling_layers import GradientCheckpointingLayer
ModuleNotFoundError: No module named 'transformers.modeling_layers'
"""
I asked GPT, and it told me that my Transformers version is incorrect. Which version should I use?

lsyzz

PaddlePaddle org Oct 18

Hello, we’re currently using Transformers version 4.55.0. You may try installing this version if needed.
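A quick way to check the local environment against that version is sketched below. It uses only the Python standard library. The helper names are made up for illustration, and the thread only states that 4.55.0 is the version the maintainers use, not that it is a strict minimum.

```python
import importlib.util
import re
from importlib import metadata


def module_available(name):
    """Return True if the dotted module name can be located."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when the parent package itself is missing or fails to import.
        return False


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '4.55.0' (or '4.56.0.dev0') into a comparable tuple like (4, 55, 0)."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


if __name__ == "__main__":
    version = installed_version("transformers")
    print("transformers version:", version)
    if version is not None:
        print("matches 4.55.0:", version_tuple(version) == (4, 55, 0))
    # The import that failed in the traceback above.
    print("modeling_layers found:", module_available("transformers.modeling_layers"))
```

If `modeling_layers found` prints `False`, installing the version mentioned above (for example `pip install transformers==4.55.0`) is the first thing to try.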

PrinceZaman

Oct 19

I am really excited

MengNiYa

Oct 20

Hello, which specific method do you recommend for official deployment? So far I have seen the following:
1.
from paddlex import create_pipeline
pipeline = create_pipeline(pipeline="PaddleOCR-VL")
2.
from paddleocr import PaddleOCRVL
pipeline = PaddleOCRVL()

There are also VLM acceleration schemes based on both PaddleX and PaddleOCR.
Which deployment option is recommended? Do PaddleX and PaddleOCR use the same PaddleOCRVL implementation internally?

sunflowerting78

PaddlePaddle org Oct 21

It's the same. You can just use paddleocr.

avaloner

Oct 29

How can I add prompt text to control the output format of PaddleOCR-VL?

CCWM

Oct 30

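Could you provide a script for converting table recognition output from OTSL format to HTML? That would make it easy to render and evaluate table structure recognition results.
After running table recognition with transformers, I found that the output is a token sequence. Is there a more detailed explanation?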
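On the OTSL question, here is a rough sketch of a converter. It is not an official PaddleOCR script. It assumes the model emits the standard OTSL tags: `<fcel>` (cell with content), `<ecel>` (empty cell), `<lcel>` (merge with the cell to the left), `<ucel>` (merge with the cell above), `<xcel>` (interior of a two-dimensional span), and `<nl>` (end of row), with each cell's text following its tag. If the actual output uses other tags, adjust `TOKEN_RE` accordingly.

```python
import html
import re

# Assumed tag set; text that follows a tag belongs to that cell.
TOKEN_RE = re.compile(r"<(fcel|ecel|lcel|ucel|xcel|nl)>")


def parse_otsl(seq):
    """Split an OTSL string into rows of (tag, text) pairs."""
    rows, row = [], []
    parts = TOKEN_RE.split(seq)  # [prefix, tag1, text1, tag2, text2, ...]
    for tag, text in zip(parts[1::2], parts[2::2]):
        if tag == "nl":
            rows.append(row)
            row = []
        else:
            row.append((tag, text.strip()))
    if row:  # tolerate a missing final <nl>
        rows.append(row)
    return rows


def otsl_to_html(seq):
    """Convert an OTSL string to an HTML table with colspan/rowspan."""
    grid = parse_otsl(seq)
    out = ["<table>"]
    for r, row in enumerate(grid):
        out.append("<tr>")
        for c, (tag, text) in enumerate(row):
            if tag not in ("fcel", "ecel"):
                continue  # merge markers are absorbed by their anchor cell
            # Count <lcel> markers to the right to get the column span.
            colspan = 1
            while c + colspan < len(row) and row[c + colspan][0] == "lcel":
                colspan += 1
            # Count <ucel> markers below to get the row span.
            rowspan = 1
            while (r + rowspan < len(grid)
                   and c < len(grid[r + rowspan])
                   and grid[r + rowspan][c][0] == "ucel"):
                rowspan += 1
            attrs = ""
            if colspan > 1:
                attrs += f' colspan="{colspan}"'
            if rowspan > 1:
                attrs += f' rowspan="{rowspan}"'
            out.append(f"<td{attrs}>{html.escape(text)}</td>")
        out.append("</tr>")
    out.append("</table>")
    return "".join(out)


if __name__ == "__main__":
    print(otsl_to_html("<fcel>Name<lcel><nl><fcel>a<fcel>b<nl>"))
```

The sketch emits only `<td>` cells and does not distinguish header rows, so it is suited to quick rendering checks rather than as a reference implementation for evaluation.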