SetFit with JohanHeinsen/Old_News_Segmentation_SBERT_V0.1
This is a SetFit model for text classification. It uses JohanHeinsen/Old_News_Segmentation_SBERT_V0.1 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head. It is designed to identify texts describing missing people in nineteenth-century Danish police gazettes.
The model has been trained using an efficient few-shot learning technique that involves:
- Fine-tuning a Sentence Transformer with contrastive learning.
- Training a classification head with features from the fine-tuned Sentence Transformer.
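The first step can be illustrated with a simplified, standard-library-only sketch of how labelled texts become sentence pairs for contrastive fine-tuning. This is not SetFit's actual sampler; the function `make_contrastive_pairs` and the placeholder texts are hypothetical. In each iteration, every sample is paired once with a same-label text (target similarity 1.0) and once with a different-label text (target 0.0). Pairs of this shape are what a loss such as `CosineSimilarityLoss` trains on. The second step then embeds each text with the fine-tuned model and fits the logistic regression head on those embeddings.

```python
import random


def make_contrastive_pairs(texts, labels, num_iterations, seed=0):
    """Return (anchor, partner, target_similarity) triples.

    Simplified sketch, not SetFit's sampler: for every iteration and every
    sample, draw one partner with the same label (target 1.0) and one with a
    different label (target 0.0). A positive partner may be the anchor itself.
    Requires at least two distinct labels.
    """
    rng = random.Random(seed)
    # Group texts by label so partners can be drawn per class.
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    pairs = []
    for _ in range(num_iterations):
        for text, label in zip(texts, labels):
            # Texts from every other label are candidates for the negative pair.
            negatives = [t for other, group in by_label.items() if other != label for t in group]
            pairs.append((text, rng.choice(by_label[label]), 1.0))
            pairs.append((text, rng.choice(negatives), 0.0))
    return pairs


# Hypothetical placeholder texts; here label 1 stands in for missing-person notices.
texts = ["missing-1", "missing-2", "other-1", "other-2"]
labels = [1, 1, 0, 0]
pairs = make_contrastive_pairs(texts, labels, num_iterations=2)
print(len(pairs))  # 2 iterations * 4 samples * 2 pairs = 16
```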
Model Details
Model Description
Model Sources
Model Labels
| Label | Example |
|:------|:--------|
| 1 | '2) Fattiglem, fhv. Roersbetjent, Jens Hansen, af Fare, har den 15de ds. forladt Fattiggaarden. der og formodes at drive arbejdsløs omkring muligvis er han taget til Kjøbenhavn for at søge Hyre som Sømand. Saafremt han, der er 50 Aar gl., middel af Væxt, og var iført sort rundpullet Hat, blaa Trøje og Benklæder muligvis dog et Par Lærreds ovenpaa – samt Træsko, maatte antræffes, bedes Underretning derom meddelt Byog Herredsfogden i Storehedinge.' |
| 1 | '1) En Mandsperson, ca. 20 Aar gl., middelstor eller lidt mindre, lyst Polkahaar, ordentlig klædt med mørk Sækfrakke og lyse Buxer, – sigtes for Tyveriet Nr. 3. (St. 2, 291.)' |
| 1 | '2) Oplysning om, hvor Garversvend Niels Peter Schmidt eller Niels Peter Nielsen Schmidt, født 16de Febr. 1839, maatte opholde sig, bedes meddeelt Muckadell m. fl. Birkers Kontor i Spanget pr. Kværndrup. Paagjældende blev blev den 28de Januar d. A. viseret derfra til Odense, hvorfra han strax igien skal være afgaaet til Fredericia.' |
| 0 | '2) 2 Høns og en Hane, denne sidste graa med laadne Ben, den ene Høne brunspættet, den anden sort med laadne Ben, ere bortkomne siden den 24. f.M. (St. 7, 448).' |
| 0 | 'Hans Edvard Valdemar Holst (Kbhvn.), 45 Aar. Løsgængeri.' |
| 0 | 'Peter Christian Leyring (Levring), 26 Aar. Betleri.' |
Evaluation
Metrics
| Label | Accuracy | F1 | Precision | Recall |
|:------|:---------|:---|:----------|:-------|
| all | 0.9817 | 0.9385 | 0.9231 | 0.9545 |
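F1 is the harmonic mean of precision and recall, so the reported F1 can be checked against the other two columns. The helper `f1_from_precision_recall` is hypothetical. It is not scikit-learn's `f1_score`, which takes true and predicted labels rather than precision and recall.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the Metrics table above.
print(round(f1_from_precision_recall(0.9231, 0.9545), 4))  # 0.9385
```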
Training Details
Training Set Metrics
| Training set | Min | Mean | Max |
|:-------------|:----|:-----|:----|
| Word count | 3 | 20.8907 | 245 |
| Label | Training Sample Count |
|:------|:----------------------|
| 0 | 1195 |
| 1 | 205 |
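The two tables above can be cross-checked with a little arithmetic. The average word count of 20.8907 is neither a whole number nor a half, so it cannot be a median of integer word counts over 1,400 samples; it is consistent with a mean of 29,247 words over 1,400 samples. Reading it as a mean is an inference, not something the card states. The class shares also show the imbalance: a classifier that always predicts label 0 would reach about 0.854 accuracy on data with this balance (the evaluation split's balance is not stated). The function name `summarize_training_set` is hypothetical.

```python
def summarize_training_set(label_counts, mean_words):
    """Return sample total, per-label share, and the word total implied by a mean."""
    total = sum(label_counts.values())
    # Fraction of training samples carrying each label, rounded for display.
    shares = {label: round(count / total, 4) for label, count in label_counts.items()}
    # Assumes mean_words is an arithmetic mean over all samples.
    implied_words = round(mean_words * total)
    return total, shares, implied_words


total, shares, implied_words = summarize_training_set({0: 1195, 1: 205}, 20.8907)
print(total, shares, implied_words)  # 1400 {0: 0.8536, 1: 0.1464} 29247
print(round(implied_words / total, 4))  # 20.8907
```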
Training Hyperparameters
- batch_size: (16, 16)
- num_epochs: (3, 3)
- max_steps: -1
- sampling_strategy: oversampling
- num_iterations: 12
- body_learning_rate: (2e-05, 2e-05)
- head_learning_rate: 2e-05
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- l2_weight: 0.01
- seed: 87
- eval_max_steps: -1
- load_best_model_at_end: False
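Tuple values such as `batch_size: (16, 16)` give separate settings for the embedding (contrastive) phase and the classifier phase. The length of the embedding phase follows from `num_iterations`: assuming, as in SetFit's pre-1.0-compatible sampling mode, that each iteration produces one positive and one negative pair per training sample, 1,400 samples give 33,600 pairs per epoch, or 2,100 steps at batch size 16. This agrees with the training log below, where epoch 1.0 falls at step 2100 and epoch 3.0 at step 6300. The helper `contrastive_schedule` is a hypothetical illustration, not part of the SetFit API.

```python
import math


def contrastive_schedule(num_samples, num_iterations, batch_size, num_epochs):
    """Estimate sentence pairs and optimizer steps for the embedding phase.

    Assumes num_iterations yields one positive and one negative pair per
    sample per iteration.
    """
    pairs_per_epoch = 2 * num_iterations * num_samples
    steps_per_epoch = math.ceil(pairs_per_epoch / batch_size)
    return pairs_per_epoch, steps_per_epoch, steps_per_epoch * num_epochs


# 1,400 training samples (1,195 + 205), num_iterations=12, batch size 16, 3 epochs.
print(contrastive_schedule(1400, 12, 16, 3))  # (33600, 2100, 6300)
```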
Training Results
| Epoch | Step | Training Loss | Validation Loss |
|:------|:-----|:--------------|:----------------|
| 0.0005 | 1 | 0.1797 | - |
| 0.0238 | 50 | 0.2091 | - |
| 0.0476 | 100 | 0.1061 | - |
| 0.0714 | 150 | 0.0529 | - |
| 0.0952 | 200 | 0.0491 | - |
| 0.1190 | 250 | 0.0238 | - |
| 0.1429 | 300 | 0.0195 | - |
| 0.1667 | 350 | 0.013 | - |
| 0.1905 | 400 | 0.0066 | - |
| 0.2143 | 450 | 0.005 | - |
| 0.2381 | 500 | 0.0038 | - |
| 0.2619 | 550 | 0.0038 | - |
| 0.2857 | 600 | 0.005 | - |
| 0.3095 | 650 | 0.0062 | - |
| 0.3333 | 700 | 0.0024 | - |
| 0.3571 | 750 | 0.0002 | - |
| 0.3810 | 800 | 0.0003 | - |
| 0.4048 | 850 | 0.0008 | - |
| 0.4286 | 900 | 0.0001 | - |
| 0.4524 | 950 | 0.0006 | - |
| 0.4762 | 1000 | 0.0022 | - |
| 0.5 | 1050 | 0.0003 | - |
| 0.5238 | 1100 | 0.0016 | - |
| 0.5476 | 1150 | 0.0001 | - |
| 0.5714 | 1200 | 0.0 | - |
| 0.5952 | 1250 | 0.0 | - |
| 0.6190 | 1300 | 0.0 | - |
| 0.6429 | 1350 | 0.0 | - |
| 0.6667 | 1400 | 0.0 | - |
| 0.6905 | 1450 | 0.0 | - |
| 0.7143 | 1500 | 0.0 | - |
| 0.7381 | 1550 | 0.0024 | - |
| 0.7619 | 1600 | 0.0002 | - |
| 0.7857 | 1650 | 0.0001 | - |
| 0.8095 | 1700 | 0.0 | - |
| 0.8333 | 1750 | 0.0 | - |
| 0.8571 | 1800 | 0.0 | - |
| 0.8810 | 1850 | 0.0 | - |
| 0.9048 | 1900 | 0.0 | - |
| 0.9286 | 1950 | 0.0 | - |
| 0.9524 | 2000 | 0.0 | - |
| 0.9762 | 2050 | 0.0 | - |
| 1.0 | 2100 | 0.0 | - |
| 1.0238 | 2150 | 0.0 | - |
| 1.0476 | 2200 | 0.0 | - |
| 1.0714 | 2250 | 0.0 | - |
| 1.0952 | 2300 | 0.0 | - |
| 1.1190 | 2350 | 0.0 | - |
| 1.1429 | 2400 | 0.0 | - |
| 1.1667 | 2450 | 0.0 | - |
| 1.1905 | 2500 | 0.0 | - |
| 1.2143 | 2550 | 0.0 | - |
| 1.2381 | 2600 | 0.0 | - |
| 1.2619 | 2650 | 0.0 | - |
| 1.2857 | 2700 | 0.0 | - |
| 1.3095 | 2750 | 0.0 | - |
| 1.3333 | 2800 | 0.0 | - |
| 1.3571 | 2850 | 0.0 | - |
| 1.3810 | 2900 | 0.0 | - |
| 1.4048 | 2950 | 0.0 | - |
| 1.4286 | 3000 | 0.0 | - |
| 1.4524 | 3050 | 0.0 | - |
| 1.4762 | 3100 | 0.0 | - |
| 1.5 | 3150 | 0.0 | - |
| 1.5238 | 3200 | 0.0 | - |
| 1.5476 | 3250 | 0.0 | - |
| 1.5714 | 3300 | 0.0 | - |
| 1.5952 | 3350 | 0.0 | - |
| 1.6190 | 3400 | 0.0 | - |
| 1.6429 | 3450 | 0.0 | - |
| 1.6667 | 3500 | 0.0 | - |
| 1.6905 | 3550 | 0.0 | - |
| 1.7143 | 3600 | 0.0 | - |
| 1.7381 | 3650 | 0.0 | - |
| 1.7619 | 3700 | 0.0 | - |
| 1.7857 | 3750 | 0.0 | - |
| 1.8095 | 3800 | 0.0 | - |
| 1.8333 | 3850 | 0.0 | - |
| 1.8571 | 3900 | 0.0 | - |
| 1.8810 | 3950 | 0.0 | - |
| 1.9048 | 4000 | 0.0 | - |
| 1.9286 | 4050 | 0.0 | - |
| 1.9524 | 4100 | 0.0 | - |
| 1.9762 | 4150 | 0.0 | - |
| 2.0 | 4200 | 0.0 | - |
| 2.0238 | 4250 | 0.0 | - |
| 2.0476 | 4300 | 0.0 | - |
| 2.0714 | 4350 | 0.0 | - |
| 2.0952 | 4400 | 0.0 | - |
| 2.1190 | 4450 | 0.0 | - |
| 2.1429 | 4500 | 0.0 | - |
| 2.1667 | 4550 | 0.0 | - |
| 2.1905 | 4600 | 0.0 | - |
| 2.2143 | 4650 | 0.0 | - |
| 2.2381 | 4700 | 0.0 | - |
| 2.2619 | 4750 | 0.0 | - |
| 2.2857 | 4800 | 0.0 | - |
| 2.3095 | 4850 | 0.0 | - |
| 2.3333 | 4900 | 0.0 | - |
| 2.3571 | 4950 | 0.0 | - |
| 2.3810 | 5000 | 0.0 | - |
| 2.4048 | 5050 | 0.0 | - |
| 2.4286 | 5100 | 0.0 | - |
| 2.4524 | 5150 | 0.0 | - |
| 2.4762 | 5200 | 0.0 | - |
| 2.5 | 5250 | 0.0 | - |
| 2.5238 | 5300 | 0.0 | - |
| 2.5476 | 5350 | 0.0 | - |
| 2.5714 | 5400 | 0.0 | - |
| 2.5952 | 5450 | 0.0 | - |
| 2.6190 | 5500 | 0.0 | - |
| 2.6429 | 5550 | 0.0 | - |
| 2.6667 | 5600 | 0.0 | - |
| 2.6905 | 5650 | 0.0 | - |
| 2.7143 | 5700 | 0.0 | - |
| 2.7381 | 5750 | 0.0 | - |
| 2.7619 | 5800 | 0.0 | - |
| 2.7857 | 5850 | 0.0 | - |
| 2.8095 | 5900 | 0.0 | - |
| 2.8333 | 5950 | 0.0 | - |
| 2.8571 | 6000 | 0.0 | - |
| 2.8810 | 6050 | 0.0 | - |
| 2.9048 | 6100 | 0.0 | - |
| 2.9286 | 6150 | 0.0 | - |
| 2.9524 | 6200 | 0.0 | - |
| 2.9762 | 6250 | 0.0 | - |
| 3.0 | 6300 | 0.0 | - |
Framework Versions
- Python: 3.11.12
- SetFit: 1.1.3
- Sentence Transformers: 4.1.0
- Transformers: 4.51.3
- PyTorch: 2.7.0
- Datasets: 2.19.2
- Tokenizers: 0.21.1
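When reproducing the model, it can help to compare an environment against the versions listed above. The following is a minimal sketch using the standard library's `importlib.metadata`; the mapping to PyPI distribution names (for example `torch` for PyTorch) is an assumption, and local build suffixes such as `+cu126` will be reported as mismatches. The helpers `installed_version` and `find_mismatches` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework Versions", keyed by assumed PyPI distribution name.
PINNED = {
    "setfit": "1.1.3",
    "sentence-transformers": "4.1.0",
    "transformers": "4.51.3",
    "torch": "2.7.0",
    "datasets": "2.19.2",
    "tokenizers": "0.21.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (pinned, installed)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: card lists {wanted}, found {found}")
```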