deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane that it can parse and re-render charts as HTML
> it concatenates CLIP and SAM features, so it gets better grounding
> very efficient vision-token-to-performance ratio
> covers 100 languages
Online training methods (e.g., GRPO) require real-time generation, a compute- and memory-heavy bottleneck.
TRL has built-in vLLM support, and in this new recipe we show how to leverage it for efficient online training. Run on Colab ⚡, scale to multi-GPU/multi-node!
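A minimal sketch of what this looks like in code, assuming a recent TRL release with GRPOTrainer and vLLM installed; the model, dataset, reward function, and vllm_mode choice below are illustrative placeholders, not the recipe's exact settings:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder prompt dataset; the recipe uses its own data.
dataset = load_dataset("trl-lib/tldr", split="train")

# Toy reward: prefer shorter completions (stand-in for a real reward function).
def reward_len(completions, **kwargs):
    return [-float(len(c)) for c in completions]

config = GRPOConfig(
    output_dir="grpo-vllm-demo",
    use_vllm=True,          # generate completions with vLLM instead of model.generate
    vllm_mode="colocate",   # run vLLM inside the training process; "server" targets a separate vLLM server
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder base model
    reward_funcs=reward_len,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

The point of `use_vllm=True` is that the generation step, the bottleneck mentioned above, is handed to vLLM's fast inference engine rather than the training model's own generate loop.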
A few days ago, Thinking Machines Lab released "LoRA Without Regret", showing that LoRA can match full fine-tuning performance when configured right.
Naturally, we decided to reproduce the results with TRL and release a guide!
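For a sense of what "configured right" can mean in practice, here is a minimal, hedged sketch of LoRA SFT with TRL + PEFT; the model, dataset, rank, and learning rate are illustrative placeholders rather than the guide's exact settings, with the notable choice being LoRA on all linear layers instead of attention only:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset

peft_config = LoraConfig(
    r=32,                         # illustrative rank
    lora_alpha=64,
    target_modules="all-linear",  # apply LoRA to all linear layers, not just attention
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder base model
    args=SFTConfig(output_dir="lora-sft-demo", learning_rate=1e-4),  # illustrative LR
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```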
Want to deploy open models using vLLM as the inference engine? We just released a step-by-step guide on how to do it with @huggingface Inference Endpoints, now available in the vLLM docs.
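Once the endpoint is up, vLLM exposes an OpenAI-compatible API, so you can query it with the standard openai client; the endpoint URL and model name below are placeholders for your own deployment, not values from the guide:

```python
import os
from openai import OpenAI

# Placeholder endpoint URL; copy the real one from your Inference Endpoint page.
client = OpenAI(
    base_url="https://<your-endpoint>.endpoints.huggingface.cloud/v1",
    api_key=os.environ["HF_TOKEN"],  # Inference Endpoints authenticate with your HF token
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # the model deployed on the endpoint
    messages=[{"role": "user", "content": "Summarize what vLLM does in one sentence."}],
)
print(response.choices[0].message.content)
```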
IBM just released a small Swiss Army knife for document models: granite-docling-258M on Hugging Face 🔥
> not only a document converter: it can also do document question answering and understands multiple languages 🤯
> best part: released under the Apache 2.0 license, so you can use it in your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! 🤗
> built on SigLIP2 & granite-165M
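A hedged sketch of running it through transformers' image-text-to-text pipeline; the repo id, image URL, and prompt are my assumptions about the release, so check the model card for the exact usage:

```python
from transformers import pipeline

# Assumed repo id; verify against the actual model card.
pipe = pipeline("image-text-to-text", model="ibm-granite/granite-docling-258M")

messages = [
    {
        "role": "user",
        "content": [
            # Placeholder image URL; point this at a real document page.
            {"type": "image", "url": "https://example.com/sample_document.png"},
            {"type": "text", "text": "Convert this document to markdown."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=512)
print(out[0]["generated_text"][-1]["content"])  # the assistant's converted output
```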
TRL now supports Context Parallelism (CP), letting you scale sequences across multiple GPUs and even multi-node setups, seamlessly. Combine TRL and accelerate, and you can run it effortlessly!
With 8 GPUs, CP enables 300k+ token sequences while keeping throughput reasonable. Works for both full fine-tuning and LoRA, unlocking contexts that used to hit OOM.
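As a rough sketch, the training script itself looks like ordinary TRL SFT with a large max_length; the CP degree is set in the accelerate launcher configuration (see the TRL and accelerate docs for the exact config keys, which are not shown here), and the model, dataset, and sequence length below are placeholders:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder dataset and model; swap in your own long-context data.
dataset = load_dataset("trl-lib/Capybara", split="train")

config = SFTConfig(
    output_dir="cp-long-context-demo",
    max_length=300_000,  # target sequence length; CP shards it across GPUs
    packing=True,        # pack samples so sequences actually reach max_length
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    args=config,
    train_dataset=dataset,
    peft_config=LoraConfig(r=16),  # CP works with LoRA as well as full fine-tuning
)
trainer.train()

# Launch with `accelerate launch`, using an accelerate config that enables
# context parallelism (e.g. a CP size matching your GPU count).
```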