TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
Abstract
TF1-EN-3M is a new dataset of three million English fables generated with instruction-tuned models following a structured format, evaluated with a GPT-based critic plus reference-free automated metrics, and released under a permissive license.
Moral stories are a time-tested vehicle for transmitting values, yet modern NLP lacks a large, structured corpus that couples coherent narratives with explicit ethical lessons. We close this gap with TF1-EN-3M, the first open dataset of three million English-language fables generated exclusively by instruction-tuned models no larger than 8B parameters. Each story follows a six-slot scaffold (character -> trait -> setting -> conflict -> resolution -> moral), produced through a combinatorial prompt engine that guarantees genre fidelity while covering a broad thematic space. A hybrid evaluation pipeline blends (i) a GPT-based critic that scores grammar, creativity, moral clarity, and template adherence with (ii) reference-free diversity and readability metrics. Among ten open-weight candidates, an 8B-parameter Llama-3 variant delivers the best quality-speed trade-off, producing high-scoring fables on a single consumer GPU (<24 GB VRAM) at approximately 13.5 cents per 1,000 fables. We release the dataset, generation code, evaluation scripts, and full metadata under a permissive license, enabling exact reproducibility and cost benchmarking. TF1-EN-3M opens avenues for research in instruction following, narrative intelligence, value alignment, and child-friendly educational AI, demonstrating that large-scale moral storytelling no longer requires proprietary giant models.
Community
Introducing TF1-EN-3M: Three Million Synthetic Moral Fables for Small Open-Weight LLMs
We've just released TF1-EN-3M, the largest open corpus of machine-generated moral fables to date, created entirely with models no larger than 8B parameters.
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
Why Another Story Dataset?
- Existing collections such as Aesop's Fables top out at a few hundred examples, far too small for today's data-hungry models.
- Most educational, on-device, or open-source projects canโt deploy 70B-parameter giants.
- We asked: can compact, fully open models (no larger than 8B parameters) generate a massive, high-quality, ethics-focused story corpus that anyone can fine-tune?
What's Inside TF1-EN-3M?
| Feature | Details |
|---|---|
| Size | 3,000,000 English fables (≈1B tokens) |
| Structure | Six-slot scaffold: character → trait → setting → conflict → resolution → moral |
| Audience | Written for 4–7-year-olds (simple vocabulary, explicit morals) |
| Metadata | Prompt, model name, token counts, latency, GPU type & cost per story |
| License | CC-BY-4.0, free to remix, filter, or extend |
Dataset on the Hub: klusai/ds-tf1-en-3m
One-Paragraph Generation Recipe
A combinatorial engine expands six curated lists (100 options each) into millions of unique prompts.
Ten open-weight instruction models (1B–8B) compete; we score Grammar, Creativity, Moral Clarity, and Prompt Adherence with a gpt-o3-mini critic, plus Self-BLEU & Distinct-1 diversity checks.
LLaMA-3.1-8B-Instruct wins: great quality, a tiny VRAM footprint, and a cost under $0.0005 per story on an L40S GPU.
All code lives in the public tinyfabulist repo.
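To make the recipe concrete, here is a minimal sketch of a combinatorial prompt engine in the same spirit. The slot entries and prompt template below are made-up placeholders, not the curated lists or exact wording from the tinyfabulist repo.

```python
# Minimal sketch of a combinatorial prompt engine (illustrative only).
# The real curated lists (100 options per slot) and prompt template live in
# the tinyfabulist repo; the entries below are placeholders.
import itertools
import random

slots = {
    "character": ["a fox", "a tortoise", "an ant"],
    "trait": ["greedy", "patient", "boastful"],
    "setting": ["a sunlit meadow", "a busy riverbank"],
    "conflict": ["wants more than it needs", "races a faster rival"],
    "resolution": ["learns to share", "wins through steady effort"],
    "moral": ["greed leaves you with less", "slow and steady wins the race"],
}

TEMPLATE = (
    "Write a short fable for children aged 4-7. "
    "Character: {character}. Trait: {trait}. Setting: {setting}. "
    "Conflict: {conflict}. Resolution: {resolution}. Moral: {moral}."
)

def iter_prompts():
    """Yield one prompt per combination of the six slot lists."""
    keys = list(slots)
    for combo in itertools.product(*(slots[k] for k in keys)):
        yield TEMPLATE.format(**dict(zip(keys, combo)))

prompts = list(iter_prompts())
print(len(prompts), "unique prompts")   # 3*3*2*2*2*2 = 144 with these toy lists
print(random.choice(prompts))
```

With 100 options per slot the full grid has 100^6 combinations, so drawing three million distinct prompts is easy; the exact lists and sampling logic are in the public tinyfabulist repo.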
Quick Quality Peek
- Mean critic score: 7.8 / 10 (four axes)
- Age fit: 80% tagged "Age B" (4–7 yrs)
- Diversity: Self-BLEU 0.31 • Distinct-1 0.16
Quick start: load a small slice of the dataset and print one fable.

```python
from datasets import load_dataset, disable_caching

disable_caching()  # skip the local cache for this quick peek

# Load only the first 3% of the train split (the full set is 3M fables)
ds = load_dataset("klusai/ds-tf1-en-3m", split="train[:3%]")

# Shuffle and print the text of one fable from the "fable" column
print(ds.shuffle(seed=42)[0]["fable"])
```
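If you want to sanity-check the diversity numbers above on a small sample, here is a rough sketch using the standard definitions (Distinct-1 as unique unigrams over total unigrams, Self-BLEU as the mean BLEU of each text scored against all the others). This is not the official evaluation script, which lives in the tinyfabulist repo, and a sampled estimate will not exactly match the corpus-level figures.

```python
# Rough sketch of the diversity metrics on a small random sample.
# Not the repo's official evaluation code; numbers will differ from the
# corpus-level Self-BLEU 0.31 / Distinct-1 0.16 reported above.
import random
from datasets import load_dataset
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

ds = load_dataset("klusai/ds-tf1-en-3m", split="train[:1%]")
texts = random.sample(ds["fable"], 200)          # small sample for speed
tokenized = [t.lower().split() for t in texts]   # naive whitespace tokenization

# Distinct-1: unique unigrams divided by total unigrams across the sample
all_tokens = [tok for toks in tokenized for tok in toks]
distinct_1 = len(set(all_tokens)) / len(all_tokens)

# Self-BLEU: each fable is the hypothesis, every other fable is a reference
smooth = SmoothingFunction().method1
self_bleu = sum(
    sentence_bleu(tokenized[:i] + tokenized[i + 1:], hyp, smoothing_function=smooth)
    for i, hyp in enumerate(tokenized)
) / len(tokenized)

print(f"Distinct-1: {distinct_1:.2f}  Self-BLEU: {self_bleu:.2f}")
```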
What Can You Do With It?
- Fine-tune tiny LMs (1–3B) into bedtime-story generators that run on phones or edge devices.
- Build moral-inference benchmarks: given a fable, predict its lesson (a rough sketch follows this list).
- Train alignment critics to verify kid-safe morals in generated text.
- Translate the prompt lists and spawn French, Hindi, or Swahili mega-fable sets in a weekend GPU sprint.
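As one concrete way to set up the moral-inference benchmark mentioned in the second bullet above, here is a hedged sketch that treats the final sentence of each fable as its moral and turns the rest into the input story. The "last sentence is the moral" heuristic and the naive sentence splitting are assumptions of this sketch, not guarantees of the dataset schema.

```python
# Sketch of a moral-inference benchmark: keep the body of each fable as input
# and hold out its moral (assumed here to be the final sentence) as the target.
from datasets import load_dataset

ds = load_dataset("klusai/ds-tf1-en-3m", split="train[:1%]")

def split_story_and_moral(example):
    # Naive sentence split; real preprocessing would use a proper tokenizer.
    sentences = [s.strip() for s in example["fable"].split(".") if s.strip()]
    return {
        "story": ". ".join(sentences[:-1]) + ".",
        "moral": sentences[-1] + ".",
    }

bench = ds.map(split_story_and_moral, remove_columns=ds.column_names)
print(bench[0]["story"][:200], "...")
print("Target moral:", bench[0]["moral"])
```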
Paper: The TF1-EN-3M Synthetic Fables Dataset: Large-Scale Story Generation with Small Open Models
Authors: Mihai Nădaș, Laura Dioșan, Andreea Tomescu & Andrei Pișcoran (KlusAI Labs & Babeș-Bolyai University)
Happy storytelling!
This is extremely interesting: alignment through narration rather than explicitly stated values. I would presume the model can gain a more subtle understanding of "values" in real-world scenarios. Because these models are driven by procedural knowledge, this may be a more scalable approach to aligning strong AI. Very cool.