---
version: main
family: smollm2-1.7b
model_name: locuslab/safelm-1.7b_instruct_rephrase_refusal_moral_ed_600B
license: mit
tags:
- model
- transformer
- smollm2
- safety
datasets:
- locuslab/refuseweb
- locuslab/safeweb
- locuslab/moral_education
- HuggingFaceTB/smollm-corpus
base_model:
- locuslab/safelm-1.7b_base_rephrase_refusal_moral_ed_600B
---

# SafeLM-1.7B Instruct

SafeLM is a 1.7B-parameter model family trained via [Safety Pretraining](https://www.arxiv.org/abs/2504.16980). We train language models to be natively safe by incorporating safety directly into the pretraining pipeline. This is our instruction-tuned model.

Our safety data curation involves scoring harmful content, rephrasing and contextualizing potentially harmful examples, and refusal training throughout pretraining. Please check out our [paper](https://www.arxiv.org/abs/2504.16980) and [website](https://locuslab.github.io/safety-pretraining/) for more details!

## Model Details

- **Architecture:** SmolLM2
- **Parameters:** 1.7B

## Training Configuration

```yaml
optimizer:
  class_path: torch.optim.AdamW
  init_args:
    lr: 0.0005
    weight_decay: 0.01
precision: bf16-mixed
seed: 42
train:
  global_batch_size: 1024
  max_seq_length: 2048
  max_tokens: 600000000000
  micro_batch_size: 8
```

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("locuslab/safelm-1.7b_instruct_rephrase_refusal_moral_ed_600B")
tokenizer = AutoTokenizer.from_pretrained("locuslab/safelm-1.7b_instruct_rephrase_refusal_moral_ed_600B")
```

## Citation

If you find our work helpful, please cite it as:

```
@article{maini2025safety,
  title={Safety Pretraining: Toward the Next Generation of Safe AI},
  author={Maini, Pratyush and Goyal, Sachin and Sam, Dylan and Robey, Alex and Savani, Yash and Jiang, Yiding and Zou, Andy and Lipton, Zachary C and Kolter, J Zico},
  journal={arXiv preprint arXiv:2504.16980},
  year={2025}
}
```
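
## Example Usage

Building on the Quickstart above, the sketch below runs a short chat-style generation. It assumes the instruct tokenizer ships a chat template (as SmolLM2-style instruct models typically do); the prompt, `bfloat16` load, and generation length are illustrative choices, not settings taken from this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "locuslab/safelm-1.7b_instruct_rephrase_refusal_moral_ed_600B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Format a single-turn conversation with the tokenizer's chat template
# (assumed to be defined for this instruct model).
messages = [{"role": "user", "content": "Give me three tips for staying safe online."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Generate a reply and print only the newly produced tokens.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```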