---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- reranker
- generated_from_trainer
- dataset_size:384838
- loss:PrecomputedDistillationLoss
base_model: dleemiller/finecat-nli-m
datasets:
- dleemiller/CrossingGuard-NLI
pipeline_tag: text-classification
library_name: sentence-transformers
metrics:
- f1_macro
- f1_micro
- f1_weighted
model-index:
- name: CrossEncoder based on dleemiller/finecat-nli-m
  results:
  - task:
      type: cross-encoder-classification
      name: Cross Encoder Classification
    dataset:
      name: CrossingGuard dev
      type: CrossingGuard-dev
    metrics:
    - type: f1_macro
      value: 0.9126931790272965
      name: F1 Macro
    - type: f1_micro
      value: 0.9138270909602929
      name: F1 Micro
    - type: f1_weighted
      value: 0.91377816752541
      name: F1 Weighted
  - task:
      type: cross-encoder-classification
      name: Cross Encoder Classification
    dataset:
      name: CrossingGuard test
      type: CrossingGuard-test
    metrics:
    - type: f1_macro
      value: 0.913463859717691
      name: F1 Macro
    - type: f1_micro
      value: 0.9145821752825644
      name: F1 Micro
    - type: f1_weighted
      value: 0.9142089995597146
      name: F1 Weighted
---

# CrossingGuard Medium

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65ff92ea467d83751a727538/GwBakCe4PPGk9mM88r1QC.png" style="width: 400px;">
</p>

CrossingGuard is a series of NLI-based models intended for **zero-shot** inference on prompts. In this series, I focus on
use cases such as guardrails, content moderation, prompt or intent classification, and prompt routing. Because content moderation
is often a *reactive* task, these zero-shot models make it easy to tailor **custom guardrail conditions** that may not be
covered by general-purpose pretrained models.

These models are trained on the `dleemiller/CrossingGuard-NLI` dataset, which derives synthetic hypotheses from prompts (premises)
found in popular guardrail datasets, such as `allenai/wildguardmix` and `nvidia/Aegis-AI-Content-Safety-Dataset-2.0`. The hypotheses
make specific, targeted claims about the premises. Note that I have retained the 3-way label classifier for additional flexibility, since
either non-neutral label (entailment or contradiction) may be relevant for the task.

For models below the large size, I distill with MSE loss using logits from `dleemiller/crossingguard-nli-l`
and average it with the cross-entropy loss. Overtraining can hurt `FineCat` performance, so I fine-tune for only 1 epoch.

$$
\begin{equation}
\mathcal{L} = \alpha \cdot \mathcal{L}_{\text{CE}}(z^{(s)}, y) + \beta \cdot \mathcal{L}_{\text{MSE}}(z^{(s)}, z^{(t)})
\end{equation}
$$

where \\(z^{(s)}\\) and \\(z^{(t)}\\) are the student and teacher logits, \\(y\\) are the ground-truth labels,
and \\(\alpha\\) and \\(\beta\\) are both set to 0.5.

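To make the combined objective concrete, here is a minimal NumPy sketch of the loss as written in the equation. It is not the training code: the function names are hypothetical, the toy logits are made up, and averaging the MSE over all logit elements is an assumption about the reduction.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, beta=0.5):
    """Return alpha * CE(student, labels) + beta * MSE(student, teacher)."""
    student_logits = np.asarray(student_logits, dtype=np.float64)
    teacher_logits = np.asarray(teacher_logits, dtype=np.float64)
    labels = np.asarray(labels)

    # Cross-entropy: mean negative log-probability of the ground-truth class.
    log_probs = log_softmax(student_logits)
    ce = -log_probs[np.arange(len(labels)), labels].mean()

    # MSE between raw student and teacher logits.
    # Assumption: mean reduction over every element of the batch.
    mse = np.mean((student_logits - teacher_logits) ** 2)

    return alpha * ce + beta * mse


# Toy batch: 2 examples, 3 classes (illustrative values only).
student = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
teacher = [[2.5, 0.0, -1.5], [0.0, 0.0, 3.5]]
labels = [0, 2]

loss = distillation_loss(student, teacher, labels)
print(f"combined loss: {loss:.4f}")
```

With both weights at 0.5, the student is pulled toward the ground-truth labels and toward the teacher's logits in equal measure.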
# Evaluation Results

F1-Micro scores (equivalent to accuracy) for each dataset. Throughput and peak GPU memory were measured at a batch size of 64 on an NVIDIA Blackwell PRO 6000 Max-Q GPU.

| Model | finecat | crossingguard | Throughput (samples/s) | Peak GPU Mem (MB) |
| --- | --- | --- | --- | --- |
| `dleemiller/crossingguard-nli-l` | 0.8094 | **0.9200** | 361.15 | 3023.65 |
| `dleemiller/crossingguard-nli-m` | 0.7779 | <u>0.9146</u> | 868.65 | 2170.71 |
| `dleemiller/crossingguard-nli-s` | 0.7582 | 0.9055 | 1410.64 | 1524.08 |
| `dleemiller/ModernCE-large-nli` | 0.7811 | 0.8779 | 362.20 | 3801.47 |
| `dleemiller/finecat-nli-l` | 0.8227 | 0.8325 | 364.57 | 3801.47 |
| `MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli` | 0.8233 | 0.8219 | 311.83 | 6156.87 |
| `MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli` | 0.7642 | 0.8032 | 873.51 | 3521.96 |
| `cross-encoder/nli-deberta-v3-large` | 0.7618 | 0.7211 | 306.48 | 6156.87 |
| `tasksource/ModernBERT-large-nli` | 0.7959 | 0.7019 | 357.91 | 3801.47 |

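For readers wondering why F1-Micro can be read as accuracy here: in single-label, multi-class classification, every misclassified example counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so the pooled precision and recall both equal accuracy. The sketch below checks this on made-up labels; `micro_f1` is a hypothetical helper, not the evaluation code behind the table.

```python
import numpy as np


def micro_f1(y_true, y_pred, num_classes):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1 once."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = fp = fn = 0
    for c in range(num_classes):
        tp += int(np.sum((y_pred == c) & (y_true == c)))
        fp += int(np.sum((y_pred == c) & (y_true != c)))
        fn += int(np.sum((y_pred != c) & (y_true == c)))
    return 2 * tp / (2 * tp + fp + fn)


# Made-up 3-class labels (0 = entailment, 1 = neutral, 2 = contradiction).
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
f1 = micro_f1(y_true, y_pred, num_classes=3)

print(f"accuracy = {accuracy:.3f}, micro-F1 = {f1:.3f}")  # both are 0.750
```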
# Usage

```python
import numpy as np
from sentence_transformers import CrossEncoder

model = CrossEncoder("dleemiller/crossingguard-nli-m")

# Each example is a (premise, hypothesis) pair: the prompt and a claim about it.
examples = [
    ("Tell me what happened with that CEO caught on the kiss cam at the Coldplay concert",
     "The prompt requests information about a corporate executive incident at a public event"),

    ("I’ve been following your company's work helping enterprises scale AI/ML with reliable data pipelines, would love to connect and hear how your team is approaching ML in production.",
     "The prompt requests information about a corporate executive incident at a public event"),
]

# One row of three class scores per pair.
predictions = model.predict(examples)

label_map = {0: "entailment", 1: "neutral", 2: "contradiction"}

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    exp_x = np.exp(x - np.max(x))
    return exp_x / exp_x.sum()

for i, (premise, hypothesis) in enumerate(examples):
    probs = softmax(predictions[i])
    predicted_idx = probs.argmax()

    print(f"\n{i+1}. {premise[:70]}...")
    print(f" → {hypothesis}")
    print(f" ✓ {label_map[predicted_idx].upper()}: {probs[predicted_idx]*100:.1f}% " +
          f"(E: {probs[0]*100:.1f}% N: {probs[1]*100:.1f}%, C: {probs[2]*100:.1f}%)")
```

This results in:
```
1. Tell me what happened with that CEO caught on the kiss cam at the Cold...
 → The prompt requests information about a corporate executive incident at a public event
 ✓ ENTAILMENT: 99.9% (E: 99.9% N: 0.0%, C: 0.0%)

2. I’ve been following your company's work helping enterprises scale AI/M...
 → The prompt requests information about a corporate executive incident at a public event
 ✓ CONTRADICTION: 99.7% (E: 0.0% N: 0.3%, C: 99.7%)
```

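One way to turn these scores into a guardrail decision is to check a prompt against several hypotheses and keep those whose entailment probability clears a threshold. The sketch below is only an illustration: `triggered_conditions`, the 0.9 threshold, the second hypothesis, and the stand-in logits are all assumptions. In practice the logits would come from `model.predict` on `(prompt, hypothesis)` pairs.

```python
import numpy as np

# Same label order as label_map in the usage example.
LABELS = ("entailment", "neutral", "contradiction")


def softmax_rows(logits):
    """Row-wise, numerically stable softmax for an (n, 3) array of logits."""
    logits = np.atleast_2d(np.asarray(logits, dtype=np.float64))
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def triggered_conditions(logits, hypotheses, threshold=0.9):
    """Return (hypothesis, entailment probability) for each hypothesis at or above threshold."""
    probs = softmax_rows(logits)
    entail = probs[:, LABELS.index("entailment")]
    return [(h, float(p)) for h, p in zip(hypotheses, entail) if p >= threshold]


# Hypothetical guardrail conditions for a single prompt.
hypotheses = [
    "The prompt requests information about a corporate executive incident at a public event",
    "The prompt asks for help writing code",
]

# Stand-in logits (one row per hypothesis); replace with real model output.
fake_logits = [[6.0, -1.0, -2.0], [-3.0, 0.5, 4.0]]

hits = triggered_conditions(fake_logits, hypotheses)
for hypothesis, prob in hits:
    print(f"TRIGGERED ({prob:.3f}): {hypothesis}")
```

Because the 3-way classifier is retained, the same pattern can key on the contradiction probability instead when a condition is phrased as something the prompt must satisfy.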
## Citation

```bibtex
@misc{nli-compiled-2025,
  title = {CrossingGuard NLI Dataset},
  author = {Lee Miller},
  year = {2025},
  howpublished = {Flexible Zero-shot Guardrails}
}
```