Stefan Schweter PRO

stefan-it

https://schweter.bayern

AI & ML interests

Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models, German Language Models, Bavarian NLP 🥨

Recent Activity

liked a dataset about 1 hour ago

HuggingFaceFW/finepdfs-edu

reacted to Locutusque's post with 🔥 about 9 hours ago

🚀 AutoXLA - Accelerating Large Models on TPU

AutoXLA is an experimental library that automates the distribution, optimization, and quantization of large language models for TPUs using PyTorch/XLA. It extends the Hugging Face Transformers interface with TPU-aware features such as automatic sharding, custom attention kernels, and quantization-aware loading, making large-scale deployment and training both simpler and faster.

With quantization and Splash Attention kernels, AutoXLA achieves up to 4× speedups over standard Flash Attention implementations, significantly improving throughput for both inference and training workloads. Whether you're experimenting with distributed setups (FSDP, 2D, or 3D sharding) or optimizing memory via LanguageModelQuantizer, AutoXLA is built to make scaling LLMs on TPU seamless.

⚠️ Note: This is an experimental repository. Expect rough edges! Please report bugs or unexpected behavior through GitHub issues.

🔗 GitHub Repository: https://github.com/Locutusque/AutoXLA

reacted to codelion's post with 🔥 about 12 hours ago

Want to experiment with pre-training dataset mixtures but don't want to process terabytes of data? We've got you covered.

We're releasing a collection of several carefully curated 1B token dataset samples specifically designed for rapid prototyping and pretraining experiments: https://huggingface.co/collections/codelion/pre-training-dataset-samples

These samples were created using reservoir sampling - an algorithm that guarantees statistically unbiased random samples from massive source datasets. This means results you get at the 1B token scale are representative of how these datasets behave at 100B+ token scales, letting you iterate quickly without the computational overhead.

The collection includes:
- finePDFs-1B: High-quality textbook-style educational content
- DCLM-baseline-1B: Filtered, diverse web content
- FineWeb-Edu-1B: Curated educational web resources

We used these exact samples to run 50+ systematic experiments on dataset mixing strategies, ultimately discovering that a 50-30-20 mixture of finePDFs + DCLM-baseline + FineWeb-Edu achieves 90%+ of GPT-2's performance with just 1/10th the training data.

Whether you're researching optimal data mixtures, testing curriculum learning strategies, or just want to quickly prototype a pretraining run, these samples give you a solid foundation to start experimenting immediately.

Read the full story of how we used these datasets to find the optimal pretraining recipe: https://huggingface.co/blog/codelion/optimal-dataset-mixing
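
For readers unfamiliar with the reservoir sampling mentioned in the post, here is a minimal sketch of the classic single-pass variant (often called Algorithm R). The post does not say how the samples were actually produced, so the function name `reservoir_sample`, the item-level (rather than token-budget) sampling, and the `seed` parameter are all illustrative assumptions, not the author's implementation.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k items drawn uniformly at random from a stream of unknown length.

    Hypothetical helper: a sketch of Algorithm R, not the code used for the
    released dataset samples.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: sample 5 "documents" from a stream without holding it all in memory.
sample = reservoir_sample((f"doc-{n}" for n in range(100_000)), k=5, seed=0)
print(sample)
```

The stream is read once and only k items are kept in memory, which is what makes the approach practical for sources too large to load. Sampling up to a fixed token count, as the 1B token samples imply, would need extra bookkeeping that this sketch leaves out.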

stefan-it's models 1,344

stefan-it/flair-co-funer-gbert_base-bs8-e10-lr3e-05-1

Token Classification • Updated Mar 28, 2024 • 6

stefan-it/flair-co-funer-gbert_base-bs16-e10-lr5e-05-1

Token Classification • Updated Mar 28, 2024 • 3

stefan-it/flair-co-funer-gbert_base-bs16-e10-lr3e-05-1

Token Classification • Updated Mar 28, 2024 • 5

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs8-e10-lr5e-05-5

Token Classification • Updated Mar 28, 2024 • 34

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs8-e10-lr3e-05-5

Token Classification • Updated Mar 28, 2024 • 5

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs16-e10-lr5e-05-5

Token Classification • Updated Mar 28, 2024 • 5

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs16-e10-lr3e-05-5

Token Classification • Updated Mar 28, 2024 • 8

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs8-e10-lr5e-05-4

Token Classification • Updated Mar 28, 2024 • 4

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs8-e10-lr3e-05-4

Token Classification • Updated Mar 28, 2024 • 6

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs16-e10-lr5e-05-4

Token Classification • Updated Mar 28, 2024 • 3

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs16-e10-lr3e-05-4

Token Classification • Updated Mar 28, 2024 • 6

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs8-e10-lr3e-05-3

Token Classification • Updated Mar 28, 2024 • 6

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs8-e10-lr5e-05-3

Token Classification • Updated Mar 28, 2024 • 5

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs16-e10-lr5e-05-3

Token Classification • Updated Mar 28, 2024 • 5

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs8-e10-lr5e-05-2

Token Classification • Updated Mar 28, 2024 • 3

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs16-e10-lr3e-05-3

Token Classification • Updated Mar 28, 2024 • 6

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs8-e10-lr3e-05-2

Token Classification • Updated Mar 28, 2024 • 3

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs16-e10-lr3e-05-2

Token Classification • Updated Mar 28, 2024 • 6

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs16-e10-lr5e-05-2

Token Classification • Updated Mar 28, 2024 • 3

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs8-e10-lr5e-05-1

Token Classification • Updated Mar 28, 2024 • 3

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs8-e10-lr3e-05-1

Token Classification • Updated Mar 28, 2024 • 6

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs16-e10-lr3e-05-1

Token Classification • Updated Mar 28, 2024 • 3

stefan-it/flair-co-funer-german_dbmdz_bert_base-bs16-e10-lr5e-05-1

Token Classification • Updated Mar 28, 2024 • 7

stefan-it/flair-co-funer-german_bert_base-bs8-e10-lr5e-05-5

Token Classification • Updated Mar 28, 2024 • 5

stefan-it/flair-co-funer-german_bert_base-bs8-e10-lr3e-05-5

Token Classification • Updated Mar 28, 2024 • 3

stefan-it/flair-co-funer-german_bert_base-bs16-e10-lr5e-05-5

Token Classification • Updated Mar 28, 2024 • 5

stefan-it/flair-co-funer-german_bert_base-bs16-e10-lr3e-05-5

Token Classification • Updated Mar 28, 2024 • 3

stefan-it/flair-co-funer-german_bert_base-bs8-e10-lr5e-05-4

Token Classification • Updated Mar 28, 2024 • 30

stefan-it/flair-co-funer-german_bert_base-bs8-e10-lr3e-05-4

Token Classification • Updated Mar 28, 2024 • 12

stefan-it/flair-co-funer-german_bert_base-bs16-e10-lr5e-05-4

Token Classification • Updated Mar 28, 2024 • 4