- DocGraphLM: Documental Graph Language Model for Information Extraction
  Paper • 2401.02823 • Published • 36
- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 65
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 189
- Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
  Paper • 2309.01131 • Published • 1

Collections including paper arxiv:2309.01131

- DSG: An End-to-End Document Structure Generator
  Paper • 2310.09118 • Published • 2
- OCR-free Document Understanding Transformer
  Paper • 2111.15664 • Published • 4
- DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents
  Paper • 2304.12484 • Published • 1
- Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
  Paper • 2309.01131 • Published • 1

- LMDX: Language Model-based Document Information Extraction and Localization
  Paper • 2309.10952 • Published • 66
- Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
  Paper • 2309.01131 • Published • 1
- On the Hidden Mystery of OCR in Large Multimodal Models
  Paper • 2305.07895 • Published • 1
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 189

- Random Field Augmentations for Self-Supervised Representation Learning
  Paper • 2311.03629 • Published • 10
- TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
  Paper • 2311.04589 • Published • 23
- GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
  Paper • 2311.04901 • Published • 11
- Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
  Paper • 2311.06783 • Published • 28

- UI Layout Generation with LLMs Guided by UI Grammar
  Paper • 2310.15455 • Published • 3
- You Only Look at Screens: Multimodal Chain-of-Action Agents
  Paper • 2309.11436 • Published • 1
- Never-ending Learning of User Interfaces
  Paper • 2308.08726 • Published • 2
- LMDX: Language Model-based Document Information Extraction and Localization
  Paper • 2309.10952 • Published • 66