---
version: main
family: smollm2-1.7b
model_name: locuslab/safelm-1.7b_instruct_rephrase_refusal_moral_ed_600B
license: mit
tags:
- model
- transformer
- smollm2
- safety
datasets:
- locuslab/refuseweb
- locuslab/safeweb
- locuslab/moral_education
- HuggingFaceTB/smollm-corpus
base_model:
- locuslab/safelm-1.7b_base_rephrase_refusal_moral_ed_600B
---

# SafeLM-1.7B Instruct

SafeLM is a 1.7B-parameter model family trained via [Safety Pretraining](https://www.arxiv.org/abs/2504.16980). We train language models to be natively safe by incorporating safety directly into the pretraining pipeline. This is our instruction-tuned model.

Our safety data curation involves scoring harmful content, rephrasing and contextualizing potentially harmful examples, and refusal training throughout pretraining. Please check out our [paper](https://www.arxiv.org/abs/2504.16980) and [website](https://locuslab.github.io/safety-pretraining/) for more details!

## Model Details

- **Architecture:** SmolLM2
- **Parameters:** 1.7B

## Training Configuration

```yaml
optimizer:
  class_path: torch.optim.AdamW
  init_args:
    lr: 0.0005
    weight_decay: 0.01
precision: bf16-mixed
seed: 42
train:
  global_batch_size: 1024
  max_seq_length: 2048
  max_tokens: 600000000000
  micro_batch_size: 8
```

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("locuslab/safelm-1.7b_instruct_rephrase_refusal_moral_ed_600B")
tokenizer = AutoTokenizer.from_pretrained("locuslab/safelm-1.7b_instruct_rephrase_refusal_moral_ed_600B")
```

## Citation

If you find our work helpful, please cite it as:

```
@article{maini2025safety,
  title={Safety Pretraining: Toward the Next Generation of Safe AI},
  author={Maini, Pratyush and Goyal, Sachin and Sam, Dylan and Robey, Alex and Savani, Yash and Jiang, Yiding and Zou, Andy and Lipton, Zachary C and Kolter, J Zico},
  journal={arXiv preprint arXiv:2504.16980},
  year={2025}
}
```
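
## Example Usage

Building on the Quickstart above, the sketch below runs a short chat-style generation. It assumes the instruct tokenizer ships a chat template (as SmolLM2-style instruct models typically do); the prompt, `bfloat16` load, and generation length are illustrative choices, not settings taken from this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "locuslab/safelm-1.7b_instruct_rephrase_refusal_moral_ed_600B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Format a single-turn conversation with the tokenizer's chat template
# (assumed to be defined for this instruct model).
messages = [{"role": "user", "content": "Give me three tips for staying safe online."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Generate a reply and print only the newly produced tokens.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```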