---
version: main
family: smollm2-1.7b
model_name: locuslab/safelm-1.7b_instruct_rephrase_refusal_moral_ed_600B
license: mit
tags:
- model
- transformer
- smollm2
- safety pretraining
datasets:
- locuslab/refuseweb
- locuslab/safeweb
- locuslab/moral_education
- HuggingFaceTB/smollm-corpus
base_model:
- locuslab/safelm-1.7b_base_rephrase_refusal_moral_ed_600B
---

# SafeLM-1.7B Instruct
SafeLM is a 1.7B-parameter model family trained via Safety Pretraining: we make language models natively safe by incorporating safety directly into the pretraining pipeline. This is the instruction-tuned model of the family. Our safety data curation scores harmful content, rephrases and contextualizes potentially harmful examples, and applies refusal training throughout pretraining. Please check out our paper and website for more details!
## Model Details
- Architecture: SmolLM2
- Parameters: 1.7B
## Training Configuration

```yaml
optimizer:
  class_path: torch.optim.AdamW
  init_args:
    lr: 0.0005
    weight_decay: 0.01
precision: bf16-mixed
seed: 42
train:
  global_batch_size: 1024
  max_seq_length: 2048
  max_tokens: 600000000000
  micro_batch_size: 8
```
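For reference, a minimal sketch (not the released training code) of how the `optimizer` section above maps onto a `torch.optim.AdamW` instance; `model` here stands in for the network being trained.

```python
import torch

# Sketch only: mirrors the optimizer settings above (lr=0.0005, weight_decay=0.01).
# `model` is assumed to be the SmolLM2-architecture network being pretrained.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-4,
    weight_decay=0.01,
)
```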
## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("locuslab/safelm-1.7b_instruct_rephrase_refusal_moral_ed_600B")
tokenizer = AutoTokenizer.from_pretrained("locuslab/safelm-1.7b_instruct_rephrase_refusal_moral_ed_600B")
```
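A minimal generation sketch building on the snippet above, assuming the instruct tokenizer ships a chat template (as SmolLM2-style instruct models typically do); the prompt and sampling settings are only examples.

```python
# Assumes `model` and `tokenizer` are loaded as in the Quickstart above,
# and that the tokenizer provides a chat template for the instruct format.
messages = [{"role": "user", "content": "Give me a short introduction to safety pretraining."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```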
## Citation

If you find our work helpful, please cite it as:
```bibtex
@article{maini2025safety,
  title={Safety Pretraining: Toward the Next Generation of Safe AI},
  author={Maini, Pratyush and Goyal, Sachin and Sam, Dylan and Robey, Alex and Savani, Yash and Jiang, Yiding and Zou, Andy and Lipton, Zachary C and Kolter, J Zico},
  journal={arXiv preprint arXiv:2504.16980},
  year={2025}
}
```