Petite Elle L'Aime 3

| Model | Average | BBH-fr | GPQA-fr | IFEval-fr | MUSR-fr | MATH Lvl5-fr | MMMLU-fr |
|---|---|---|---|---|---|---|---|
| HuggingFaceTB/SmolLM3-3B | 20.83% | 24.30% | 11.18% | 12.76% | 6.64% | 24.96% | 45.11% |
| Tonic/petite-elle-L-aime-3-sft | 18.99% | 19.06% | 10.29% | 11.73% | 6.74% | 21.06% | 45.05% |
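
The Average column appears to be the unweighted mean of the six benchmark scores. The card does not state this; it is an inference from the numbers. A minimal Python check under that assumption, with scores copied from the table above:

```python
# Benchmark scores (percent) in the order:
# BBH-fr, GPQA-fr, IFEval-fr, MUSR-fr, MATH Lvl5-fr, MMMLU-fr
SCORES = {
    "HuggingFaceTB/SmolLM3-3B": [24.30, 11.18, 12.76, 6.64, 24.96, 45.11],
    "Tonic/petite-elle-L-aime-3-sft": [19.06, 10.29, 11.73, 6.74, 21.06, 45.05],
}

# Averages as reported in the table.
REPORTED_AVERAGE = {
    "HuggingFaceTB/SmolLM3-3B": 20.83,
    "Tonic/petite-elle-L-aime-3-sft": 18.99,
}


def mean(values):
    """Unweighted arithmetic mean (assumed aggregation, not confirmed by the card)."""
    return sum(values) / len(values)


for model, scores in SCORES.items():
    computed = mean(scores)
    reported = REPORTED_AVERAGE[model]
    # Allow for rounding to two decimals in the reported figure.
    matches = abs(computed - reported) < 0.01
    print(f"{model}: computed {computed:.3f}%, reported {reported:.2f}%, match={matches}")
```

Both reported averages agree with the recomputed means to within rounding, which supports the unweighted-mean reading.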

Language Understanding Tasks

  • Sports Comprehension: petite-elle-L-aime-3-sft achieves 57.6% accuracy, outperforming SmolLM3 by 3.6 percentage points
  • Causal Judgment: 57.2% accuracy, 1.1 percentage points above SmolLM3 (56.1%)
  • Hyperbaton (Adjective Ordering): petite-elle-L-aime-3-sft scores 53.6% vs 52.8% for SmolLM3

Mathematical Reasoning

  • Prealgebra: petite-elle-L-aime-3-sft's strongest math performance at 48.7% accuracy, outperforming SmolLM3 by 0.9 percentage points
  • Geometry: Matches SmolLM3's performance exactly at 11.7%

Question Answering

  • GPQA Diamond: petite-elle-L-aime-3-sft performs slightly better at 34.0% vs 33.5%
  • Murder Mysteries: petite-elle-L-aime-3-sft shows competitive performance at 50.4% vs 49.6%

Object Tracking

  • 5-Object Tracking: petite-elle-L-aime-3-sft performs better at 18.0% vs 16.0% accuracy
  • This may indicate an advantage on moderate-complexity tracking, although SmolLM3 leads on the 3-object (34.4% vs 33.6%) and 7-object (15.6% vs 12.8%) variants

petite-elle-L-aime-3-sft's Performance Profile

Areas of Excellence

  1. Sports Domain Knowledge: petite-elle-L-aime-3-sft demonstrates superior understanding of sports-related content
  2. Causal Reasoning: Strong performance in understanding cause-and-effect relationships
  3. Adjective Ordering: Better grasp of linguistic rules for adjective placement
  4. Basic Mathematical Operations: Competitive performance in prealgebra tasks
  5. Spatial Reasoning: Better performance in moderate complexity object tracking

Competitive Performance

  • MMLU (French): Nearly identical performance (50.55% vs 50.60%)
  • Instruction Following: Identical prompt-level strict accuracy (2.5% for both models)
  • Geometry (Math): Matches SmolLM3's performance exactly (11.7%)
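
| Task Category | Specific Task | petite-elle-L-aime-3-sft | SmolLM3 | Difference (SmolLM3 minus petite-elle-L-aime-3-sft) | Winner |
|---|---|---|---|---|---|
| MMLU (French) | Overall Accuracy | 50.55% | 50.60% | +0.05% | SmolLM3 |
| BBH (French) | Date Understanding | 39.2% | 52.4% | +13.2% | SmolLM3 |
| BBH (French) | Sports Comprehension | 57.6% | 54.0% | -3.6% | petite-elle-L-aime-3-sft |
| BBH (French) | Object Counting | 48.0% | 50.4% | +2.4% | SmolLM3 |
| BBH (French) | Logical Deduction (3 objects) | 60.4% | 68.0% | +7.6% | SmolLM3 |
| BBH (French) | Logical Deduction (5 objects) | 39.6% | 46.8% | +7.2% | SmolLM3 |
| BBH (French) | Logical Deduction (7 objects) | 28.0% | 39.6% | +11.6% | SmolLM3 |
| BBH (French) | Disambiguation QA | 34.8% | 56.0% | +21.2% | SmolLM3 |
| BBH (French) | Boolean Expressions | 44.8% | 53.6% | +8.8% | SmolLM3 |
| BBH (French) | Geometric Shapes | 35.6% | 34.4% | -1.2% | petite-elle-L-aime-3-sft |
| BBH (French) | Hyperbaton | 53.6% | 52.8% | -0.8% | petite-elle-L-aime-3-sft |
| BBH (French) | Causal Judgment | 57.2% | 56.1% | -1.1% | petite-elle-L-aime-3-sft |
| BBH (French) | Navigate | 58.4% | 64.0% | +5.6% | SmolLM3 |
| BBH (French) | Penguins in a Table | 47.9% | 50.7% | +2.8% | SmolLM3 |
| BBH (French) | Reasoning about Colored Objects | 41.6% | 48.8% | +7.2% | SmolLM3 |
| BBH (French) | Movie Recommendation | 39.2% | 58.8% | +19.6% | SmolLM3 |
| BBH (French) | Sarcasm | 59.0% | 62.9% | +3.9% | SmolLM3 |
| BBH (French) | Formal Fallacies | 52.8% | 54.0% | +1.2% | SmolLM3 |
| BBH (French) | Tracking Shuffled Objects (3 objects) | 33.6% | 34.4% | +0.8% | SmolLM3 |
| BBH (French) | Tracking Shuffled Objects (5 objects) | 18.0% | 16.0% | -2.0% | petite-elle-L-aime-3-sft |
| BBH (French) | Tracking Shuffled Objects (7 objects) | 12.8% | 15.6% | +2.8% | SmolLM3 |
| BBH (French) | Temporal Sequences | 40.8% | 44.4% | +3.6% | SmolLM3 |
| BBH (French) | Web of Lies | 51.2% | 51.2% | 0.0% | Tie |
| GPQA (French) | Diamond | 34.0% | 33.5% | -0.5% | petite-elle-L-aime-3-sft |
| GPQA (French) | Extended | 34.1% | 34.3% | +0.2% | SmolLM3 |
| GPQA (French) | Main | 30.1% | 32.4% | +2.3% | SmolLM3 |
| Math (French) | Algebra | 35.7% | 44.9% | +9.2% | SmolLM3 |
| Math (French) | Counting & Probability | 6.1% | 11.7% | +5.6% | SmolLM3 |
| Math (French) | Geometry | 11.7% | 11.7% | 0.0% | Tie |
| Math (French) | Number Theory | 17.1% | 23.0% | +5.9% | SmolLM3 |
| Math (French) | Prealgebra | 48.7% | 47.8% | -0.9% | petite-elle-L-aime-3-sft |
| Math (French) | Precalculus | 7.1% | 10.7% | +3.6% | SmolLM3 |
| IFEval (French) | Prompt Level Strict | 2.5% | 2.5% | 0.0% | Tie |
| IFEval (French) | Instruction Level Strict | 20.9% | 23.0% | +2.1% | SmolLM3 |
| IFEval (French) | Prompt Level Loose | 3.7% | 3.5% | -0.2% | petite-elle-L-aime-3-sft |
| IFEval (French) | Instruction Level Loose | 20.7% | 24.4% | +3.7% | SmolLM3 |
| MUSR (French) | Murder Mysteries | 50.4% | 49.6% | -0.8% | petite-elle-L-aime-3-sft |
| MUSR (French) | Object Placements | 35.5% | 35.9% | +0.4% | SmolLM3 |
| MUSR (French) | Team Allocation | 20.8% | 23.6% | +2.8% | SmolLM3 |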
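
The Difference and Winner columns follow one sign convention: the SmolLM3 score minus the petite-elle-L-aime-3-sft score, so a negative value means the fine-tuned model is ahead. The sketch below reproduces that rule for a few rows copied from the table; the function name `compare` and the short row labels are illustrative and not part of any evaluation tooling.

```python
def compare(sft_score, base_score):
    """Return (difference, winner) with difference = base minus fine-tuned.

    Scores are percentages; the difference is in percentage points.
    """
    diff = round(base_score - sft_score, 2)
    if diff > 0:
        winner = "SmolLM3"
    elif diff < 0:
        winner = "petite-elle-L-aime-3-sft"
    else:
        winner = "Tie"
    return diff, winner


# (petite-elle-L-aime-3-sft, SmolLM3) pairs taken from the table above.
ROWS = {
    "MMLU overall": (50.55, 50.60),
    "BBH sports": (57.6, 54.0),
    "GPQA Diamond": (34.0, 33.5),
    "Math Algebra": (35.7, 44.9),
    "Math Geometry": (11.7, 11.7),
}

for task, (sft, base) in ROWS.items():
    diff, winner = compare(sft, base)
    print(f"{task}: {diff:+.2f} points, winner: {winner}")
```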

Model Architecture

This is a fine-tuned version of the SmolLM3-3B model with the following specifications:

  • Base Model: SmolLM3-3B
  • Parameters: ~3B
  • Context Length: 128k
  • Languages: English, French, Italian, Portuguese, Chinese, Arabic
  • Architecture: Transformer-based causal language model

Performance

The model provides:

  • Text Generation: High-quality text generation capabilities
  • Conversation: Natural conversation abilities
  • Multilingual: Support for English and French
  • Quantized Versions: Optimized for different deployment scenarios

Limitations

  1. Context Length: Limited by the model's maximum sequence length
  2. Bias: May inherit biases from the training data
  3. Factual Accuracy: May generate incorrect or outdated information
  4. Safety: Should be used responsibly with appropriate safeguards
  5. Quantization: Quantized versions may have slightly reduced accuracy

Usage

Requirements

pip install torch transformers accelerate
pip install torchao  # For quantized models

Hardware Requirements

  • Main Model: GPU with 8GB+ VRAM recommended
  • int4 Model: CPU deployment possible

Quick start

from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="Tonic/petite-elle-L-aime-3-sft", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])

Transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the main model
model = AutoModelForCausalLM.from_pretrained(
    "Tonic/petite-elle-L-aime-3-sft",
    device_map="auto",
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("Tonic/petite-elle-L-aime-3-sft")

# Generate text
input_text = "What are we having for dinner?"
# Move the tokenized inputs to the same device as the model
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Quantized Models

This repository also includes quantized versions of the model for improved efficiency:

int4 Weight-Only Quantization (CPU Optimized)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load int4 quantized model (CPU optimized)
model = AutoModelForCausalLM.from_pretrained(
    "Tonic/petite-elle-L-aime-3-sft",
    subfolder="int4",  # quantized files live in the int4/ directory of the repository
    device_map="cpu",
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("Tonic/petite-elle-L-aime-3-sft", subfolder="int4")

Quantization Benefits

  • int4 (CPU): ~50% memory reduction, significantly faster inference with some accuracy trade-off
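
As a rough cross-check of the ~50% figure, the on-disk file sizes given in the Repository Structure listing can be compared. This is only a sketch: disk size approximates, but does not equal, runtime memory, and the helper name `size_reduction` is illustrative.

```python
# File sizes in GB, taken from the Repository Structure listing in this card.
BF16_SHARDS_GB = [4.97, 1.18]   # model-00001/00002-of-00002.safetensors
INT4_WEIGHTS_GB = 2.63          # int4/pytorch_model.bin


def size_reduction(full_gb, quantized_gb):
    """Fraction of storage saved by the quantized weights relative to the full ones."""
    return 1.0 - quantized_gb / full_gb


full_gb = sum(BF16_SHARDS_GB)
reduction = size_reduction(full_gb, INT4_WEIGHTS_GB)
print(f"bf16: {full_gb:.2f} GB, int4: {INT4_WEIGHTS_GB:.2f} GB, reduction: {reduction:.0%}")
```

By this measure the int4 weights are a little under half the size of the bf16 weights, which is in the same range as the stated reduction.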

Training Information

Training Configuration

  • Base Model: HuggingFaceTB/SmolLM3-3B
  • Dataset: legmlai/openhermes-fr
  • Training Config: SFT
  • Trainer Type: TRL
  • Dataset Sample Size: 800K

Training Parameters

  • Batch Size: 12
  • Gradient Accumulation: 12
  • Learning Rate: 4e-6
  • Max Epochs: 1.6
  • Sequence Length: 16384
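
These parameters imply an effective batch size and an approximate number of optimizer steps. The arithmetic below is a sketch under assumptions the card does not confirm: a single training device, a dataset of exactly 800,000 examples, one example per sequence (no packing), and the listed batch size being per device.

```python
import math

# Values from the Training Configuration and Training Parameters lists.
BATCH_SIZE = 12
GRADIENT_ACCUMULATION = 12
SEQUENCE_LENGTH = 16384
EPOCHS = 1.6
DATASET_SIZE = 800_000   # "800K", assumed exact

NUM_DEVICES = 1          # assumption: the card does not state the device count

# Sequences consumed per optimizer step.
effective_batch = BATCH_SIZE * GRADIENT_ACCUMULATION * NUM_DEVICES

# Upper bound on tokens per optimizer step (every sequence at full length).
max_tokens_per_step = effective_batch * SEQUENCE_LENGTH

# Approximate optimizer steps over the whole run.
total_steps = math.ceil(DATASET_SIZE * EPOCHS / effective_batch)

print(f"effective batch size: {effective_batch}")
print(f"max tokens per step:  {max_tokens_per_step:,}")
print(f"estimated steps:      {total_steps:,}")
```

Under these assumptions the run takes somewhat under 9,000 optimizer steps, which is in line with the checkpoints up to step 8000 shown in the repository listing.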

Training Infrastructure

  • Hardware: A100 SXM
  • Monitoring: Track Tonic
  • Experiment: exp_20250727_172526

Monitoring and Tracking

This model was trained with custom monitoring:

Training Data

The model was fine-tuned on:

  • Dataset: legmlai/openhermes-fr
  • Size: 800K
  • Format: SFT
  • Languages: French

Evaluation

The model was evaluated using:

Framework versions

  • TRL: 0.19.1
  • Transformers: 4.54.0
  • PyTorch: 2.7.1+cu118
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Acknowledgments

  • Base Model: SmolLM3-3B by HuggingFaceTB
  • Training Framework: PyTorch, Transformers, PEFT
  • Monitoring: Trackio integration
  • Quantization: torchao library

Cite

If you use this model in your research, please cite:

@misc{Petite_Elle_L_Aime_3_SFT,
  title={{Petite Elle L'Aime 3}},
  author={{Joseph Pollack}},
  year={2025},
  url={https://huggingface.co/Tonic/petite-elle-L-aime-3-sft}
}

Citations

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}

License

This model is licensed under the Apache 2.0 License.

Changelog

  • v1.0.0: Initial release with fine-tuned model
  • v1.1.0: Added quantized versions (int4)

Repository Structure

petite-elle-L-aime-3-sft/
├── .gitattributes                         # 1.96 kB, Git configuration for file handling
├── README.md                              # 7.38 kB, Project overview and instructions
├── chat_template.jinja                    # 5.6 kB, Chat template
├── config.json                            # 1.92 kB, Model configuration
├── generation_config.json                 # 177 Bytes, Generation settings
├── model-00001-of-00002.safetensors       # 4.97 GB, Model weights (part 1)
├── model-00002-of-00002.safetensors       # 1.18 GB, Model weights (part 2)
├── model.safetensors.index.json           # 26.9 kB, Model index
├── pytorch_model.bin                      # Main model weights
├── special_tokens_map.json                # 289 Bytes, Special tokens
├── tokenizer.json                         # 17.2 MB, Tokenizer data
├── tokenizer_config.json                  # 50.4 kB, Tokenizer configuration
├── train_results.json                     # 182 Bytes, Training results
├── training_args.bin                      # 6.16 kB, Training arguments
├── training_results/                      # Training results directory
├── checkpoint-4000/                       # Training checkpoint at step 4000
├── checkpoint-5000/                       # Training checkpoint at step 5000
├── checkpoint-6000/                       # Training checkpoint at step 6000
├── checkpoint-7000/                       # Training checkpoint at step 7000
├── checkpoint-8000/                       # Training checkpoint at step 8000
│   ├── chat_template.jinja                # 5.6 kB, Chat template
│   ├── config.json                        # 1.92 kB, Checkpoint configuration
│   ├── generation_config.json             # 177 Bytes, Generation settings
│   ├── model-00001-of-00002.safetensors   # 4.97 GB, Model weights (part 1)
│   ├── model-00002-of-00002.safetensors   # 1.18 GB, Model weights (part 2)
│   ├── model.safetensors.index.json       # 26.9 kB, Model index
│   ├── optimizer.pt                       # 12.3 GB, Optimizer state
│   ├── rng_state.pth                      # 14.6 kB, Random number generator state
│   ├── scheduler.pt                       # 1.47 kB, Scheduler state
│   ├── special_tokens_map.json            # 289 Bytes, Special tokens
│   ├── tokenizer.json                     # 17.2 MB, Tokenizer data
│   ├── tokenizer_config.json              # 50.4 kB, Tokenizer configuration
│   ├── trainer_state.json                 # 84.5 kB, Trainer state
│   └── training_args.bin                  # Training arguments
└── int4/                                  # Quantized model for CPU
    ├── README.md                          # 1.7 kB, Quantized model documentation
    ├── config.json                        # 2.67 kB, Quantized model configuration
    ├── generation_config.json             # 177 Bytes, Generation settings
    ├── pytorch_model.bin                  # 2.63 GB, Quantized model weights
    ├── special_tokens_map.json            # 289 Bytes, Special tokens
    ├── tokenizer.json                     # 17.2 MB, Tokenizer data
    └── tokenizer_config.json              # Tokenizer configuration