MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding
Paper: arXiv:2110.08518
How to use microsoft/markuplm-base-finetuned-websrc with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("question-answering", model="microsoft/markuplm-base-finetuned-websrc")

# Load model directly
from transformers import AutoProcessor, AutoModelForQuestionAnswering
processor = AutoProcessor.from_pretrained("microsoft/markuplm-base-finetuned-websrc")
model = AutoModelForQuestionAnswering.from_pretrained("microsoft/markuplm-base-finetuned-websrc")

Multimodal (text + markup language) pre-training for Document AI
MarkupLM is a simple but effective multimodal pre-training method that jointly models text and markup language for visually-rich document understanding and information extraction tasks, such as webpage QA and webpage information extraction. MarkupLM achieves state-of-the-art results on multiple datasets. For more details, please refer to our paper:
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding
Junlong Li, Yiheng Xu, Lei Cui, Furu Wei
For usage details, see the docs and demo notebooks.
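
To make the "text plus markup" input more concrete: MarkupLM represents the markup side of a page through the XPath of each text node in the DOM tree. The sketch below is only an illustration of that pairing using the Python standard library; it is not the feature extractor shipped with Transformers. The names `XPathTextExtractor` and `extract_nodes` are hypothetical, the code assumes well-formed HTML with explicit closing tags, and it always writes a positional index on every step (for example `li[2]`), which may differ from the XPath format the real processor produces.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag; they cannot contain text nodes.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class XPathTextExtractor(HTMLParser):
    """Collects (text, xpath) pairs for every non-empty text node (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.path = []                # indexed steps of the currently open elements
        self.counters = [Counter()]   # per-level count of child tags seen so far
        self.nodes = []               # collected (text, xpath) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # 1-based position of this tag among same-named siblings.
        self.counters[-1][tag] += 1
        self.path.append(f"{tag}[{self.counters[-1][tag]}]")
        self.counters.append(Counter())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.path:
            return
        # Assumes well-formed nesting: the closing tag matches the innermost open element.
        self.path.pop()
        self.counters.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((text, "/" + "/".join(self.path)))


def extract_nodes(html):
    """Return a list of (text, xpath) pairs in document order."""
    parser = XPathTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.nodes


html = "<html><body><h1>Price list</h1><ul><li>Apple</li><li>Pear</li></ul></body></html>"
for text, xpath in extract_nodes(html):
    print(f"{xpath}\t{text}")
```

In practice the processor loaded with `AutoProcessor.from_pretrained` handles this step for you when given raw HTML; the sketch is only meant to show what kind of structural signal accompanies each piece of text.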