Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data Jun 3 • 273
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 202
InternVL3.5 Collection This collection includes all released checkpoints of InternVL3.5, covering different training stages (e.g., Pretraining, SFT, MPO, Cascade RL). • 54 items • Updated Sep 28 • 102
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024 • 83
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated May 5 • 237
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22, 2024 • 134
ReAct: Synergizing Reasoning and Acting in Language Models Paper • 2210.03629 • Published Oct 6, 2022 • 30
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Apr 15, 2024 • 189
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published Apr 8, 2024 • 83
ProAgent: From Robotic Process Automation to Agentic Process Automation Paper • 2311.10751 • Published Nov 2, 2023 • 10
ReALM: Reference Resolution As Language Modeling Paper • 2403.20329 • Published Mar 29, 2024 • 22
Long-context LLMs Struggle with Long In-context Learning Paper • 2404.02060 • Published Apr 2, 2024 • 37
PDF Document / OCR Datasets Collection Document datasets with .pdf files that are usable with pixparse libraries and tools. • 2 items • Updated Mar 30, 2024 • 50
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 146