AI & ML interests

None defined yet.

Recent Activity

Articles

sergiopaniego
posted an update 5 days ago
merve
posted an update 8 days ago
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane that it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient vision-tokens-to-performance ratio
> covers 100 languages
sergiopaniego
posted an update 11 days ago
New drop! 💥 The VLM Object Understanding Comparison Space now runs with Qwen3-VL-4B and moondream3.

You can compare how models reason about images 🧠

Bonus: thanks to @ariG23498, you now get auto-suggested prompts to explore faster.

Let's gooo

sergiopaniego/vlm_object_understanding
sergiopaniego
posted an update 13 days ago
@Qwen released their new small and dense VLMs (Qwen3-VL).

They're incredibly capable and among my all-time favourite VLMs.

🤗 We've prepared some resources to help you get started.

> Fine-tune Qwen3-VL-4B with SFT or GRPO (free Colab notebooks):
> SFT: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_qwen_vl.ipynb
> GRPO: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_qwen3_vl.ipynb

> Compare object detection vs. Moondream3:
sergiopaniego/vlm_object_understanding

> Fine-tune from the CLI using TRL:
https://github.com/kashif/Qwen3-VL/blob/trl-sft/qwen-vl-finetune/README.md#trl-based-training-single-gpu
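For the CLI route, a minimal invocation could look roughly like this — a hedged sketch where the flag names follow the TRL CLI conventions and the dataset name is a placeholder to swap for your own:

```shell
# Hedged sketch: dataset name is a placeholder; check the linked README
# for the exact arguments used in the Qwen3-VL recipe.
trl sft \
  --model_name_or_path Qwen/Qwen3-VL-4B-Instruct \
  --dataset_name your-org/your-vl-dataset \
  --output_dir qwen3-vl-4b-sft \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 8
```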
sergiopaniego
posted an update 18 days ago
Super nice intro to fine-tuning with TRL, just dropped by @google (runs free on Colab)!

They use SFT + QLoRA to fine-tune the tiny Gemma 3 270M model for emoji generation.

Here's what the fine-tuned model generates for the prompt: "I'm learning to tweet" → 🐦🗣💻

Colab: https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Demos/Emoji-Gemma-on-Web/resources/Fine_tune_Gemma_3_270M_for_emoji_generation.ipynb
Try it out: google/emoji-gemma
Learn more: https://developers.googleblog.com/en/own-your-ai-fine-tune-gemma-3-270m-for-on-device/
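The SFT + QLoRA combination boils down to two configs — a minimal sketch with the standard transformers/peft building blocks and illustrative values, not the notebook's exact settings:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# Hedged sketch: illustrative QLoRA settings, not the notebook's exact values.
# 4-bit NF4 quantization keeps the frozen base model small in memory...
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# ...while LoRA adds a small set of trainable low-rank adapters on top.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```

Both plug into TRL's SFTTrainer (the quantization config when loading the model, the LoRA config via its `peft_config` argument) per the TRL docs.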
sergiopaniego
posted an update 21 days ago
Online training methods (e.g., GRPO) require real-time generation, a compute- and memory-heavy bottleneck.

TRL has built-in vLLM support, and in this new recipe we show how to leverage it for efficient online training. Run it on Colab ⚡ or scale to multi-GPU/multi-node setups!

🧑‍🍳 recipe: https://huggingface.co/learn/cookbook/grpo_vllm_online_training
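In TRL terms, switching generation over to vLLM is mostly a config toggle — a sketch assuming recent TRL argument names (`use_vllm`, `vllm_mode`); check the recipe for the current API:

```python
from trl import GRPOConfig

# Hedged sketch: argument names follow recent TRL docs and may change across versions.
training_args = GRPOConfig(
    output_dir="grpo-vllm-demo",  # hypothetical output directory
    use_vllm=True,                # generate completions with vLLM instead of model.generate
    vllm_mode="colocate",         # run vLLM inside the training process ("server" uses a standalone server)
    num_generations=8,            # completions sampled per prompt for the group-relative advantage
    per_device_train_batch_size=4,
)
```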
sergiopaniego
posted an update 22 days ago
A few days ago, Thinking Machines Lab released "LoRA Without Regret", showing that LoRA can match full fine-tuning performance when configured right.

Naturally, we decided to reproduce the results with TRL and release a guide!

https://huggingface.co/docs/trl/main/en/lora_without_regret
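As a rough sketch of the knobs this is about — illustrative values only; the guide itself gives the configuration that actually matched full fine-tuning:

```python
from peft import LoraConfig

# Hedged sketch: illustrative values; see the TRL guide for the settings
# that reproduced full fine-tuning performance.
peft_config = LoraConfig(
    r=256,                        # a generous rank; capacity matters on larger datasets
    lora_alpha=16,
    target_modules="all-linear",  # adapt all linear layers (MLP included), not just attention
    task_type="CAUSAL_LM",
)
```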
sergiopaniego
posted an update 27 days ago
sergiopaniego
posted an update about 1 month ago
You need to try this tool! 🫡

My colleague @Molbap built an interactive HF Space to explore the modular support of open models in transformers over time

👀 You'll spot things like 🦙 llama defining many models, or which ones could be modular next.

Try it: Molbap/transformers-modular-refactor
sergiopaniego
posted an update about 1 month ago
How fast can you create an endpoint in Hugging Face Inference Endpoints with a new model + vLLM to deploy a state-of-the-art OCR model?

Let's break it down step by step.

1️⃣ Create your endpoint
Go to Hugging Face Endpoints → + NEW
Select Deploy from Hub → rednote-hilab/dots.ocr → Configure 🛠️

2️⃣ Configure hardware & container
Pick hardware: AWS/GPU/L4 ⚡
Set container: vLLM
Click Create ✅

3️⃣ Update endpoint settings
Container URI: vllm/vllm-openai:nightly → Update
Advanced: add the flag --trust-remote-code → Update ⚠️

4️⃣ Run inference
Download the script: ariG23498/useful-scripts
Set your HF_TOKEN and update base_url in the script.
Run it. ✅

Your OCR model is now live via HF Inference Endpoints!
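Once the endpoint is up, step 4 is just an OpenAI-style chat-completions call. A stdlib-only sketch, assuming a hypothetical ENDPOINT_URL for your deployment (the vllm-openai container exposes /v1/chat/completions by default); a real OCR request would also attach the page image in the message content, as in the linked script:

```python
import json
import os
import urllib.request

# Hypothetical endpoint URL: replace with your own deployment's base URL.
ENDPOINT_URL = "https://your-endpoint.endpoints.huggingface.cloud"

def build_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload for the OCR model."""
    return {
        "model": "rednote-hilab/dots.ocr",
        "messages": [{"role": "user", "content": prompt}],
    }

def run_inference(prompt: str) -> str:
    """POST the payload to the vLLM endpoint and return the model's reply."""
    req = urllib.request.Request(
        f"{ENDPOINT_URL}/v1/chat/completions",
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```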
sergiopaniego
posted an update about 1 month ago
💥 Tons of new material just landed in the smol-course! 🧑‍💻

> evaluation
> alignment
> VLMs
> quizzes
> assignments!
> certificates! 👩‍🎓

go learn! 👉 https://huggingface.co/learn/smol-course/unit0/1
merve
posted an update about 1 month ago
large AI labs open-sourced a ton of models last week 🔥
here's a few picks; find even more here: merve/sep-16-releases-68d13ea4c547f02f95842f05 🤝
> IBM released a new Docling model with 258M params based on Granite (A2.0) ibm-granite/granite-docling-258M
> Xiaomi released a 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, an open Nano Banana 🍌 (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer-use models (3B/7B/32B) along with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan LongCat released a thinking version of LongCat-Flash 💭 meituan-longcat/LongCat-Flash-Thinking
sergiopaniego
posted an update about 1 month ago
This summer TRL leveled up for multimodal alignment 🌞

✅ New VLM alignment methods (MPO, GRPO, GSPO)
✅ Extended RLOO & Online DPO for VLMs
✅ Native SFT support
✅ Ready-to-use training scripts

🔗 https://huggingface.co/blog/trl-vlm-alignment
sergiopaniego
posted an update about 1 month ago
merve
posted an update about 1 month ago
IBM just released a small Swiss Army knife for document models: granite-docling-258M, now on Hugging Face 🔥

> not only a document converter: it can also do document question answering and understands multiple languages 🤯
> best part: released under the Apache 2.0 license, so you can use it in commercial projects!
> it supports transformers, vLLM, and MLX from the get-go! 🤗
> built on SigLIP2 & granite-165M

model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo 💗
sergiopaniego
posted an update about 1 month ago
Training long-context LLMs is getting easier!

TRL now supports context parallelism (CP), letting you scale sequences across multiple GPUs, even in multi-node setups, seamlessly 💆
Combine TRL with accelerate and you can run it effortlessly!

With 8 GPUs, CP enables 300k+ token sequences while keeping throughput reasonable.
It works for both full fine-tuning and LoRA, unlocking contexts that used to hit OOM 📈

Check out the full guide here 👉 https://huggingface.co/docs/trl/main/en/distributing_training#context-parallelism

If you want to learn more about context parallelism, check out the Ultra-Scale Playbook 👉 nanotron/ultrascale-playbook
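The 8-GPU claim above is easy to sanity-check: CP shards a single sequence across devices, so each GPU only materialises activations for its slice. A toy back-of-envelope sketch:

```python
def tokens_per_gpu(seq_len: int, cp_size: int) -> int:
    """Tokens each GPU holds activations for under context parallelism
    (ignoring padding and communication overhead)."""
    return seq_len // cp_size

# With CP across 8 GPUs, a 300k-token sequence costs each device
# the activation memory of only ~37.5k tokens.
print(tokens_per_gpu(300_000, 8))  # 37500
```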