PleIAs/Topical

text2text-generation

text-generation-inference

Pclanglais committed on Jul 17, 2024

Commit 74a9351 · verified · Parent: 478d995

Update README.md

Files changed (1)

README.md +9 -26

README.md CHANGED

@@ -1,34 +1,17 @@
 ---
 license: apache-2.0
 base_model: t5-small
-tags:
-- generated_from_trainer
-metrics:
-- rouge
-model-index:
-- name: t5-small-common-corpus-topic-simple-batch
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# Pleias-Topic-Detection
-**Pleias-Topic-Detection** is an encoder-decoder specialized for topic detection. Given a document, Pleias-Topic-Detection returns a main topic that can be used for downstream tasks such as annotation or embedding indexing.
-Pleias-Topic-Detection is a version of t5-small fine-tuned on a set of 70,000 documents and associated topics from Common Corpus. While t5-small has reportedly been trained only on English, the model shows an unexpected capacity for multilingual annotation. The final corpus includes a significant amount of text in French, Spanish, Italian, Dutch and German, and the model has been shown to work to some extent in all of these languages.
-Given that Pleias-Topic-Detection is a relatively lightweight model (70 million parameters), it can be used for classification at scale on a large corpus.
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 1
-- eval_batch_size: 1
-- seed: 42
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- num_epochs: 1
-- mixed_precision_training: Native AMP
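The `linear` scheduler listed in these hyperparameters decays the learning rate from its initial value (2e-05) to zero over the course of training. Below is a minimal sketch of that schedule's shape in plain Python. It is not the authors' training code. It assumes no warmup, because the card lists none and zero warmup is the Trainer default. The total step count is a hypothetical value chosen for illustration.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 2e-5) -> float:
    """Learning rate after `step` optimizer steps under linear decay, no warmup.

    Starts at base_lr (2e-05 in the card) and falls linearly to 0 at
    total_steps; steps past the end are clamped to 0.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


# Hypothetical run length, for illustration only. With train_batch_size=1 and
# num_epochs=1, the step count would equal the number of training examples,
# assuming a single device and no gradient accumulation.
TOTAL_STEPS = 1_000
for s in (0, 250, 500, 750, 1_000):
    print(f"step {s:>5}: lr = {linear_lr(s, TOTAL_STEPS):.2e}")
```

With a batch size of 1, every training document triggers its own optimizer step. Under this schedule, each document is therefore seen at a slightly lower learning rate than the one before it.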

 ---
 license: apache-2.0
 base_model: t5-small
+language:
+- en
+- fr
+- de
+- es
 ---
+**Topical** is a small language model specialized for topic extraction. Given a document, Topical returns a main topic that can be used for downstream tasks such as annotation or embedding indexing.
+Like other models from the PleIAs Bad Data Toolbox, Topical was deliberately trained on 70,000 documents extracted from Common Corpus that contain a wide range of digitization artifacts.
+Topical is a lightweight model (70 million parameters) that is especially suited to classification at scale on large corpora.
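Because the model is small, a large corpus can be streamed through it in fixed-size batches. The sketch below covers only that batching loop. `predict_topics` is a hypothetical stand-in for the actual model call. In practice it might wrap a `transformers` `text2text-generation` pipeline, loaded from the Hub id assumed here to be `PleIAs/Topical` based on the page header. The default batch size is arbitrary and is not a recommended setting.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def annotate_corpus(
    docs: Iterable[str],
    predict_topics: Callable[[List[str]], List[str]],
    batch_size: int = 32,
) -> Iterator[Tuple[str, str]]:
    """Stream (document, topic) pairs, calling the model once per batch.

    `predict_topics` is a hypothetical callable: it takes a list of documents
    and returns one topic string per document, in the same order.
    """
    for batch in batched(docs, batch_size):
        topics = predict_topics(batch)
        if len(topics) != len(batch):
            raise ValueError("predict_topics must return one topic per document")
        yield from zip(batch, topics)
```

Since `annotate_corpus` is a generator, the corpus never has to fit in memory. Documents can be read lazily from disk, and their topics can be written out as they arrive.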
+## Example