On this day in 2019, OpenAI released the final GPT-2 model as part of their staged release. I still remember that November well - so much was happening, but GPT-2's release felt like a watershed moment for the field. It showed us what was possible with carefully trained language models.
To recreate some of that GPT-2 magic, I recently tackled an interesting challenge: can you pretrain a language model with just 1 billion tokens - roughly 1/10th of what GPT-2 used - and still get comparable performance? After 50+ systematic experiments testing different dataset mixtures, the answer is yes.
The result is codelion/gpt-2-70m, which achieves over 90% of GPT-2's benchmark performance despite being trained on 10x less data. The key was the dataset composition: the best-performing mixture across those experiments was 50% high-quality textbook PDFs, 30% filtered web content, and 20% educational resources. It even beats GPT-2 on TruthfulQA (47.31% vs. 40.69%).
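To make that mixture concrete, here is a minimal sketch of how a 50/30/20 composition could be assembled with the datasets library. The three repository names are placeholders for illustration, not the actual sources behind gpt-2-70m:

```python
# Minimal sketch of a 50/30/20 pretraining mixture using the `datasets` library.
# The three dataset repositories below are placeholders, not the actual sources used.
from datasets import load_dataset, interleave_datasets

textbooks = load_dataset("your-org/textbook-text", split="train", streaming=True)          # placeholder
web = load_dataset("your-org/filtered-web-text", split="train", streaming=True)            # placeholder
edu = load_dataset("your-org/educational-resources", split="train", streaming=True)        # placeholder

# Sample documents according to the 50% / 30% / 20% mixture described above.
mixture = interleave_datasets(
    [textbooks, web, edu],
    probabilities=[0.5, 0.3, 0.2],
    seed=42,
)

# The interleaved stream can then be tokenized, packed, and cut off at roughly 1B tokens.
```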
Just applied for an HF Community Grant for “Hugging Research”, a lightweight CodeAgent‑based research assistant for the Hugging Face Hub (models, datasets, Spaces, users, collections, papers), built on Hugging Face’s Open Deep Research project. It gathers links via dedicated tools and organizes them for easy review.
As this is for the community, comments and suggestions are appreciated: daqc/hugging-research#1
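To make the design concrete, here is a minimal sketch of what one such dedicated tool could look like with a smolagents CodeAgent. This is illustrative only, not the actual hugging-research code: the tool name search_hub_models is hypothetical, and the imports assume a recent smolagents and huggingface_hub release:

```python
# Minimal sketch (not the actual daqc/hugging-research implementation):
# a smolagents CodeAgent with one dedicated tool that gathers model links from the Hub.
from huggingface_hub import HfApi
from smolagents import CodeAgent, InferenceClientModel, tool


@tool
def search_hub_models(query: str, limit: int = 10) -> str:
    """Search the Hugging Face Hub for models and return their links.

    Args:
        query: Free-text search query for the Hub.
        limit: Maximum number of models to return.
    """
    api = HfApi()
    models = api.list_models(search=query, limit=limit)
    return "\n".join(f"https://huggingface.co/{m.id}" for m in models)


# Similar tools could cover datasets, Spaces, users, collections, and papers;
# the agent then collects and organizes the returned links for review.
agent = CodeAgent(tools=[search_hub_models], model=InferenceClientModel())
agent.run("Find small GPT-2-style models pretrained on ~1B tokens and list their links.")
```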