vitalune/nanochat-d10-raw-700m
Text Generation
Two identical d10 models (100M parameters each), trained to test the hypothesis that quality-filtered data enables more efficient training.