TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior
Paper • 2512.20757 • Published • 14
Exploring smol models (for text, vision and video) and high quality web and synthetic datasets
HuggingFaceM4/FineVision_full_shuffled