PaddlePaddle/PaddleOCR-VL · ms-swift now supports inference, deployment, and fine-tuning of the PaddleOCR-VL model.

#42

by hu5enpai - opened 2 days ago

hu5enpai

2 days ago · edited 2 days ago

Install the ms-swift main branch to try it out.

git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .
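
A quick way to confirm that the editable install worked is sketched below. This is only a sanity check under assumptions: `have_cmd` is a hypothetical helper, not part of ms-swift, and the pip package name is assumed to be `ms-swift`. On machines with Apple's Swift toolchain, `swift` on PATH may resolve to the compiler rather than the ms-swift CLI, so the resolved path is printed for inspection.

```shell
# Hypothetical helper (not part of ms-swift): succeeds if a command is on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# Check that pip knows about the package (name assumed to be "ms-swift").
if have_cmd pip && pip show ms-swift >/dev/null 2>&1; then
    echo "pip reports ms-swift as installed"
else
    echo "pip does not report ms-swift; re-run 'pip install -e .' in the repo" >&2
fi

# Check that a 'swift' entry point is reachable and show where it lives.
if have_cmd swift; then
    echo "swift CLI resolves to: $(command -v swift)"
else
    echo "no 'swift' command on PATH" >&2
fi
```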

Inference

CUDA_VISIBLE_DEVICES=0 swift infer --model PaddlePaddle/PaddleOCR-VL

<image> OCR:
Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/ocr.png

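Model output (English translation of the Chinese text recognized in the image):

SWIFT supports training, inference, evaluation, and deployment of 250+ LLMs and 35+ MLLMs (multimodal large models). Developers can apply our framework directly to their own research and production environments, covering the complete workflow from model training and evaluation to application. In addition to supporting the lightweight training solutions provided by PEFT, we also provide a complete Adapters library to support the latest training techniques, such as NEFTune, LoRA+, and LLaMA-PRO. This adapter library can be used directly in your own custom workflows, independent of the training scripts.

To make things easier for users who are unfamiliar with deep learning, we provide a Gradio web-ui for controlling training and inference, along with accompanying deep learning courses and best practices for beginners.

In addition, we are expanding to other modalities; we currently support full-parameter training and LoRA training of AnimateDiff.

SWIFT has extensive documentation; if you have questions about usage, please see here.

You can now try the SWIFT web-ui on Huggingface space and ModelScope Studio.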

Fine-tuning

# ~12 GB GPU memory
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model PaddlePaddle/PaddleOCR-VL \
    --dataset AI-ModelScope/LaTeX_OCR:human_handwrite#2000 \
    --split_dataset_ratio 0.01 \
    --train_type lora \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 2 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --freeze_vit false \
    --freeze_aligner false \
    --gradient_checkpointing true \
    --max_pixels 802816 \
    --gradient_checkpointing_kwargs '{"use_reentrant": false}' \
    --attn_impl flash_attn \
    --gradient_accumulation_steps 16 \
    --logging_steps 5 \
    --max_length 4096 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4
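
As a rough sanity check on the settings above, the arithmetic below works out the effective batch size and the approximate train/validation split. It is a sketch, not ms-swift output: the variable names are made up for illustration, and it assumes that `#2000` samples 2000 rows and that `--split_dataset_ratio 0.01` holds out about 1% of them for validation.

```shell
# Values copied from the fine-tuning command; variable names are illustrative.
PER_DEVICE_BATCH=4    # --per_device_train_batch_size
GRAD_ACCUM=16         # --gradient_accumulation_steps
NUM_GPUS=1            # CUDA_VISIBLE_DEVICES=0 exposes a single GPU
DATASET_SIZE=2000     # the "#2000" suffix on the dataset name

# Samples contributing to each optimizer step.
EFFECTIVE_BATCH=$((PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS))

# Approximate split for --split_dataset_ratio 0.01 (1% held out).
VAL_SAMPLES=$((DATASET_SIZE / 100))
TRAIN_SAMPLES=$((DATASET_SIZE - VAL_SAMPLES))

# --max_pixels 802816 is the pixel count of an 896 x 896 image.
MAX_PIXELS=$((896 * 896))

echo "effective batch size: $EFFECTIVE_BATCH"
echo "train / val samples:  $TRAIN_SAMPLES / $VAL_SAMPLES"
echo "max pixels:           $MAX_PIXELS"
```

With a single epoch over roughly 1980 training samples and 64 samples per optimizer step, the run is only a few dozen steps long, so `--logging_steps 5` yields a handful of log lines.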

[Figure: training result]
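
To try the fine-tuned LoRA weights, something along these lines should work. It is a sketch under assumptions: `latest_checkpoint` is a hypothetical helper, the directory layout `output/<run>/checkpoint-<step>` is assumed from `--output_dir output`, and `sort -V` needs GNU coreutils. The `--adapters`, `--stream`, and `--max_new_tokens` options follow the usual ms-swift inference examples; check `swift infer --help` for your installed version.

```shell
# Hypothetical helper: print the highest-numbered checkpoint under a root,
# assuming a layout like <root>/<run>/checkpoint-<step>.
latest_checkpoint() {
    root="$1"
    { ls -d "$root"/*/checkpoint-* 2>/dev/null || true; } | sort -V | tail -n 1
}

CKPT="$(latest_checkpoint output)"

if [ -n "$CKPT" ]; then
    # Run interactive inference with the LoRA adapter from that checkpoint.
    CUDA_VISIBLE_DEVICES=0 swift infer \
        --adapters "$CKPT" \
        --stream true \
        --max_new_tokens 2048
else
    echo "No checkpoint found under ./output; run the fine-tuning command first." >&2
fi
```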

Tingquan

PaddlePaddle org 2 days ago

Thank you for supporting PaddleOCR!

ChengCui

PaddlePaddle org 2 days ago

Great work!
