arxiv:2504.20605

TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models

Published on Apr 29
· Submitted by Mihai Dan Nadăș on May 2
Authors: Mihai Nădaș, Laura Dioșan, Andreea Tomescu, Andrei Pișcoran

Abstract

TF1-EN-3M is a new dataset of three million English fables generated with instruction-tuned models following a structured six-slot format, evaluated through a hybrid pipeline of GPT-based critic scores and reference-free diversity and readability metrics, and released under a permissive license.

AI-generated summary

Moral stories are a time-tested vehicle for transmitting values, yet modern NLP lacks a large, structured corpus that couples coherent narratives with explicit ethical lessons. We close this gap with TF1-EN-3M, the first open dataset of three million English-language fables generated exclusively by instruction-tuned models no larger than 8B parameters. Each story follows a six-slot scaffold (character -> trait -> setting -> conflict -> resolution -> moral), produced through a combinatorial prompt engine that guarantees genre fidelity while covering a broad thematic space. A hybrid evaluation pipeline blends (i) a GPT-based critic that scores grammar, creativity, moral clarity, and template adherence with (ii) reference-free diversity and readability metrics. Among ten open-weight candidates, an 8B-parameter Llama-3 variant delivers the best quality-speed trade-off, producing high-scoring fables on a single consumer GPU (<24 GB VRAM) at approximately 13.5 cents per 1,000 fables. We release the dataset, generation code, evaluation scripts, and full metadata under a permissive license, enabling exact reproducibility and cost benchmarking. TF1-EN-3M opens avenues for research in instruction following, narrative intelligence, value alignment, and child-friendly educational AI, demonstrating that large-scale moral storytelling no longer requires proprietary giant models.

Community

Paper author Paper submitter

🦊📚 Introducing TF1-EN-3M: Three Million Synthetic Moral Fables for Small Open-Weight LLMs

We've just released TF1-EN-3M, the largest open corpus of machine-generated moral fables to date, and it was created entirely with models no larger than 8B parameters. 🎉

📄 TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models


🌟 Why Another Story Dataset?

  • Existing collections such as Aesop's Fables top out at a few hundred examples, far too small for today's data-hungry models.
  • Most educational, on-device, or open-source projects can't deploy 70B-parameter giants.
  • We asked: Can compact, fully open models (< 8B) generate a massive, high-quality, ethics-focused story corpus that anyone can fine-tune?

📦 What's Inside TF1-EN-3M?

  • Size: 3,000,000 English fables (≈ 1B tokens)
  • Structure: six-slot scaffold: character → trait → setting → conflict → resolution → moral
  • Audience: written for 4–7-year-olds (simple vocabulary, explicit morals)
  • Metadata: prompt, model name, token counts, latency, GPU type & cost per story
  • License: CC-BY-4.0, free to remix, filter, or extend

👉 Dataset on the Hub: klusai/ds-tf1-en-3m
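
If you want to check the per-story metadata described above without pulling all three million records, here is a small sketch that streams one record and prints whatever columns the dataset exposes. Only the Hub id comes from this post; it assumes the hosted files support streaming, and it reads the column names at runtime rather than hard-coding them.

from datasets import load_dataset

# Stream a single record to inspect the schema without downloading the full 3M-fable corpus.
ds_stream = load_dataset("klusai/ds-tf1-en-3m", split="train", streaming=True)
first = next(iter(ds_stream))
print(list(first.keys()))   # column names: prompt, fable, and the generation metadata fields
print(first)                # one full record, metadata included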


🤖 One-Paragraph Generation Recipe

A combinatorial engine expands six curated lists (100 options each) into millions of unique prompts.
Ten open-weight instruction models (1Bโ€“8B) compete; we score Grammar, Creativity, Moral Clarity, and Prompt Adherence with a gpt-o3-mini critic, plus Self-BLEU & Distinct-1 diversity checks.
LLaMA-3.1-8B-Instruct wins: great quality, a tiny VRAM footprint, and a cost under $0.0005 per story on an L40S GPU.
All code lives in the public tinyfabulist repo.
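
The actual slot lists and prompt templates live in the tinyfabulist repo; the sketch below only illustrates the combinatorial idea, with made-up three-item lists and a hypothetical template string standing in for the real 100-option lists and six-slot scaffold.

import itertools
import random

# Illustrative stand-ins: the real engine uses six curated lists of ~100 options each.
characters  = ["a curious fox", "a patient tortoise", "a boastful crow"]
traits      = ["greedy", "kind", "stubborn"]
settings    = ["a sunlit meadow", "a busy riverbank", "an old orchard"]
conflicts   = ["loses a treasured possession", "is challenged to a race", "must share scarce food"]
resolutions = ["asks a rival for help", "admits a mistake", "works patiently until nightfall"]
morals      = ["honesty earns trust", "patience beats haste", "kindness is repaid"]

# Hypothetical template; the published prompts follow the paper's six-slot scaffold.
TEMPLATE = ("Write a short fable for young children about {char}, who is {trait}, set in {setting}. "
            "The character {conflict} and then {resolution}. End with the moral: \"{moral}\".")

def sample_prompts(n, seed=0):
    """Draw n distinct slot combinations and render them into prompts."""
    rng = random.Random(seed)
    combos = list(itertools.product(characters, traits, settings, conflicts, resolutions, morals))
    # With 100-option lists the full product has 10^12 entries, so a real engine would sample
    # index tuples instead of materializing the product as done here for the toy lists.
    for char, trait, setting, conflict, resolution, moral in rng.sample(combos, n):
        yield TEMPLATE.format(char=char, trait=trait, setting=setting,
                              conflict=conflict, resolution=resolution, moral=moral)

for prompt in sample_prompts(2):
    print(prompt, "\n")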


๐Ÿ” Quick Quality Peek

  • Mean critic score: 7.8 / 10 (four axes)
  • Age fit: 80% tagged "Age B" (4–7 yrs)
  • Diversity: Self-BLEU 0.31 • Distinct-1 0.16
# Peek at a random fable from a small slice of the dataset
from datasets import load_dataset, disable_caching

disable_caching()                                             # skip the local Arrow cache for a quick look
ds = load_dataset("klusai/ds-tf1-en-3m", split="train[:3%]")  # roughly 90k of the 3M fables
print(ds.shuffle(seed=42)[0]["fable"])                        # print one randomly chosen story
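
For a rough sanity check of the diversity numbers, Distinct-1 (unique unigrams divided by total unigrams) can be recomputed on a small sample. The snippet below continues from the ds loaded above and uses naive whitespace tokenization, so the result will only approximate the reported 0.16.

# Recompute Distinct-1 on 500 random fables (naive whitespace tokens; an approximation only).
sample = ds.shuffle(seed=0).select(range(500))["fable"]
tokens = [tok for text in sample for tok in text.lower().split()]
print(f"Distinct-1 over {len(sample)} fables: {len(set(tokens)) / len(tokens):.3f}")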

๐Ÿ› ๏ธ What Can You Do With It?

  • Fine-tune tiny LMs (1–3B) into bedtime-story generators that run on phones or edge devices (a minimal training sketch follows this list).
  • Build moral-inference benchmarks: given a fable, predict its lesson.
  • Train alignment critics to verify kid-safe morals in generated text.
  • Translate the prompt lists and spawn French, Hindi, or Swahili mega-fable sets in a weekend GPU sprint.
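
As a starting point for the first bullet, here is a minimal causal-LM fine-tuning sketch with Hugging Face transformers. The base model id, the 1% data slice, and the hyperparameters are illustrative assumptions, not settings from the paper.

# Minimal fine-tuning sketch: turn a small open LM into a fable generator.
# Model choice, data slice, and hyperparameters below are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "HuggingFaceTB/SmolLM2-1.7B"   # any 1-3B causal LM; swap in your preferred checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

fables = load_dataset("klusai/ds-tf1-en-3m", split="train[:1%]")   # ~30k stories for a demo run

def tokenize(batch):
    return tokenizer(batch["fable"], truncation=True, max_length=512)

tokenized = fables.map(tokenize, batched=True, remove_columns=fables.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="tf1-fable-lm",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal-LM labels
)
trainer.train()
trainer.save_model("tf1-fable-lm")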

Paper: TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
Authors: Mihai Nădaș, Laura Dioșan, Andreea Tomescu & Andrei Pișcoran (KlusAI Labs & Babeș-Bolyai University)

Happy storytelling! 🎈

This is extremely interesting. Alignment through narration, instead of explicitly stated values. I would presume the model can gain a more subtle understanding of "values" in real-world scenarios. Because these models are driven by procedural knowledge, maybe this is a more scalable approach to aligning strong AI. Very cool.
