Alvaro Bartolome

alvarobartt

https://alvarobartt.com

AI & ML interests

machine learning + tech lead @huggingface (inference + cloud)

Recent Activity


updated a dataset about 14 hours ago

huggingface/DEH-image-scan-data

Viewer • Updated about 12 hours ago • 4 • 10k • 13

posted an update 1 day ago

Post

101

Open agents on AWS SageMaker AI with open models from the Hugging Face Hub!

> Deploy an open model from the Hugging Face Hub on SageMaker AI
> Connect the deployed model to Strands Agents
> Add built-in and custom tools for tool calling
> Expose external capabilities through MCP integration
> Bonus: talk to your agent and visualize traces with Gradio

https://alvarobartt.com/agents-on-aws-sagemaker

upvoted a collection 3 days ago

🧬 Carbon

Collection

Carbon 500M, 3B, 8B genomic models and GGUF variants for llama.cpp • 6 items • Updated 2 days ago • 28

upvoted an article 3 days ago

Article

Software Forgets: Agent Traces Are the Memory

huggingface

•

4 days ago

• 8

posted an update 5 days ago

Post

3199

Latest hf-mem release added a breakdown of Mixture-of-Experts (MoE) memory usage!

TL;DR: MoEs can be misleading to reason about from active parameters alone: each token activates only a subset of experts, yet the serving setup still needs to account for the full resident memory footprint.

🧠 hf-mem now splits MoE memory into base model weights, routed experts, and KV cache
🏗️ Dense models usually load and use most weights every forward pass, while MoEs load many experts but only route each token to a few of them
⚡ The active parameter count isn't the same as the memory footprint, especially for sparse architectures
📦 Runtime memory is about what is used per request/token, while loading memory also includes the expert weights that need to be resident
📚 KV cache can still dominate depending on context length, batch size, and concurrency
🔀 Expert Parallelism (EP) helps shard experts across accelerators when expert weights dominate
🚀 Data Parallelism (DP) + EP is often a good fit for throughput-oriented MoE serving
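
As a rough illustration of the breakdown listed above, here is a minimal back-of-the-envelope sketch. It is not hf-mem's implementation or API: the `MoEConfig` fields, the helper names, and every number in the example are hypothetical, and the Expert Parallelism estimate assumes the base weights are replicated on each device while routed experts are split evenly.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class MoEConfig:
    # Every field is a hypothetical, illustrative value (not a real model).
    base_params: int          # attention, embeddings, router, shared layers
    params_per_expert: int    # parameters of one routed expert in one layer
    num_experts: int          # routed experts per MoE layer
    experts_per_token: int    # experts activated per token (top-k routing)
    num_moe_layers: int       # layers that contain routed experts
    num_layers: int           # layers that hold a KV cache
    num_kv_heads: int
    head_dim: int
    bytes_per_param: int = 2  # e.g. bf16 weights
    bytes_per_kv: int = 2     # e.g. bf16 KV cache entries


def expert_params(cfg: MoEConfig, experts: int) -> int:
    """Parameters held by `experts` routed experts across all MoE layers."""
    return cfg.params_per_expert * experts * cfg.num_moe_layers


def resident_weight_bytes(cfg: MoEConfig) -> int:
    """Weights that must be loaded: base weights plus every routed expert."""
    return (cfg.base_params + expert_params(cfg, cfg.num_experts)) * cfg.bytes_per_param


def active_weight_bytes(cfg: MoEConfig) -> int:
    """Weights touched per token: base weights plus only the routed top-k experts."""
    return (cfg.base_params + expert_params(cfg, cfg.experts_per_token)) * cfg.bytes_per_param


def kv_cache_bytes(cfg: MoEConfig, context_len: int, batch_size: int) -> int:
    """KV cache size; the leading 2 accounts for keys and values."""
    per_token = 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_kv
    return per_token * context_len * batch_size


def per_device_weight_bytes(cfg: MoEConfig, ep_size: int) -> int:
    """Assumed EP layout: base weights replicated, routed experts split evenly."""
    sharded = expert_params(cfg, cfg.num_experts) // ep_size
    return (cfg.base_params + sharded) * cfg.bytes_per_param


cfg = MoEConfig(
    base_params=1_000_000_000,
    params_per_expert=10_000_000,
    num_experts=64,
    experts_per_token=4,
    num_moe_layers=24,
    num_layers=24,
    num_kv_heads=8,
    head_dim=128,
)

print(f"resident weights: {resident_weight_bytes(cfg) / GIB:.2f} GiB")
print(f"active weights:   {active_weight_bytes(cfg) / GIB:.2f} GiB")
print(f"KV cache:         {kv_cache_bytes(cfg, 8192, 16) / GIB:.2f} GiB")
print(f"per device (EP=8): {per_device_weight_bytes(cfg, 8) / GIB:.2f} GiB")
```

With these made-up numbers the resident weights are several times larger than the active weights, and the KV cache at a long context and moderate batch size is of the same order as the weights, which is the gap the post describes.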

Check the repository at https://github.com/alvarobartt/hf-mem

liked a model 5 days ago

Supertone/supertonic-3

Text-to-Speech • Updated 5 days ago • 40.4k • 601

New activity in perplexity-ai/pplx-embed-v1-4b 6 days ago

Can't serve the model using TEI

#12 opened 3 months ago by

TomaszZietkiewicz

New activity in alvarobartt/hf-mem 11 days ago

Update `experimental.png` with MoE breakdown

#2 opened 12 days ago by

alvarobartt

New activity in huggingface/documentation-images 19 days ago

Upload deploy-nemotron-3-nano-omni/nemotron-3-nano-omni-workflow.png

#612 opened 19 days ago by

juanjucm

updated a dataset 19 days ago

alvarobartt/hf-mem

Viewer • Updated 11 days ago • 7 • 966

published a dataset 19 days ago

alvarobartt/hf-mem

Viewer • Updated 11 days ago • 7 • 966

liked a model 25 days ago

openai/privacy-filter

Token Classification • 1B • Updated about 1 month ago • 310k • 1.49k

liked 2 models about 1 month ago

LilaRest/gemma-4-31B-it-NVFP4-turbo

Text Generation • 33B • Updated Apr 10 • 336k • 283

unsloth/Qwen3.6-35B-A3B-GGUF

Image-Text-to-Text • 35B • Updated Apr 20 • 2.36M • 1.1k

upvoted a changelog about 1 month ago

Hugging Face Changelog

Introducing Kernels

Apr 15

• 187

reacted to sergiopaniego's post with 🤗🔥 about 1 month ago

Post

1391

Earlier this month, Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy

And… it's already supported in TRL, built by Kashif Rasul. You can really feel the pace of development in the team 🐎

Paper by Ruixiang ZHANG, He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang at Apple 🍎

How it works: the model generates completions at a training-time temperature (T_train) with top_k/top_p truncation, then is fine-tuned on them with plain cross-entropy. No labels or verifier needed.
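
To make the sampling step concrete, here is a minimal, dependency-free sketch of temperature scaling followed by top_k/top_p truncation over a single next-token distribution, plus the cross-entropy of a sampled token. It is a generic illustration and not TRL's or the paper's code: the function names and the toy logits are hypothetical.

```python
import math
import random


def truncated_sampling_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return sampling probabilities after temperature, top-k and top-p truncation."""
    # Temperature scaling, then a numerically stable softmax.
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank tokens from most to least likely; top_k <= 0 disables top-k.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens; everything else gets probability 0.
    mass = sum(probs[i] for i in kept)
    truncated = [0.0] * len(probs)
    for i in kept:
        truncated[i] = probs[i] / mass
    return truncated


def sample_token(logits, rng, **kwargs):
    """Draw one token index from the truncated distribution."""
    probs = truncated_sampling_probs(logits, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def cross_entropy(logits, target):
    """Plain cross-entropy of `target` under the untempered softmax of `logits`."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[target]


toy_logits = [2.0, 1.0, 0.0, -1.0]  # hypothetical next-token logits
rng = random.Random(0)
token = sample_token(toy_logits, rng, temperature=1.5, top_k=3, top_p=0.9)
loss = cross_entropy(toy_logits, token)  # the training signal for this sampled token
```

In a real training loop the sampled tokens would form full completions and the loss would be averaged over them; the sketch only shows the per-token mechanics.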

You can try it right away with this ready-to-run example (Qwen3-4B on rStar-Coder):
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd.py
or benchmark a checkpoint with the eval script:
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd_eval.py

One neat insight from the paper: T_train and T_eval compose into an effective T_eff = T_train × T_eval, so a broad band of configs works well. Even very noisy samples still help.
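
One way to see why the two temperatures multiply is the idealized case sketched below. It assumes, purely for illustration, that the fine-tuned model exactly reproduces the temperature-scaled distribution it was trained on and that no top_k/top_p truncation is applied; neither assumption comes from the post, and the logits and temperatures are made up.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


base_logits = [2.0, 0.5, -1.0]  # hypothetical logits of the original model
t_train, t_eval = 1.5, 0.6      # hypothetical temperatures

# Idealized student: its logits are the log-probabilities of the T_train distribution.
student_logits = [math.log(p) for p in softmax(base_logits, t_train)]

# Sampling the student at T_eval ...
composed = softmax(student_logits, t_eval)
# ... matches sampling the original model at T_train * T_eval.
direct = softmax(base_logits, t_train * t_eval)

assert all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(composed, direct))
```

The equality holds because dividing the log-probabilities by T_eval only adds a constant to logits / (T_train × T_eval), and softmax ignores additive constants.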

Want to dig deeper?

Paper: Embarrassingly Simple Self-Distillation Improves Code Generation (2604.01193)
Trainer docs: https://huggingface.co/docs/trl/main/en/ssd_trainer

updated a dataset about 1 month ago

huggingface/DEH-image-scan-data

Viewer • Updated about 12 hours ago • 4 • 10k • 13

upvoted an article about 1 month ago

Article

Multimodal Embedding & Reranker Models with Sentence Transformers

tomaarsen

•

Apr 9

• 59

updated a dataset about 1 month ago

huggingface/DEH-image-scan-data

Viewer • Updated about 12 hours ago • 4 • 10k • 13
