---
license: apache-2.0
datasets:
- universal-dependencies/universal_dependencies
language:
- yo
metrics:
- accuracy
- f1
- precision
- recall
---

| Feature | Description |
| --- | --- |
| **Name** | `yo_yordep` |
| **Version** | `0.1.0` |
| **spaCy** | `>=3.8.7,<3.9.0` |
| **Default Pipeline** | `tok2vec`, `tagger`, `parser`, `morphologizer` |
| **Components** | `tok2vec`, `tagger`, `parser`, `morphologizer` |
| **Vectors** | 151125 keys, 151125 unique vectors (300 dimensions) |
| **Sources** | Dataset: [UD_Yoruba-YTB](https://github.com/UniversalDependencies/UD_Yoruba-YTB); Embeddings: [FastText](https://fasttext.cc/docs/en/crawl-vectors.html) |
| **License** | `apache-2.0` |
| **Author** | Kolawole Lawal |

### Label Scheme

The model uses 154 labels across 3 components.

| Component | Labels |
| --- | --- |
| **`tagger`** | `ADJ`, `ADJ__Case=Acc\|Number=Sing\|Person=1\|PronType=Prs`, `ADJ__Case=Nom\|Number=Sing\|Person=3\|PronType=Prs`, `ADJ__NumType=Ord`, `ADJ__Typo=Yes`, `ADP`, `ADP__Case=Acc\|Number=Sing\|Person=1\|PronType=Prs`, `ADP__NumType=Card`, `ADP__Typo=Yes`, `ADV`, `ADV__Typo=Yes`, `AUX`, `AUX__Case=Nom\|Number=Plur\|Person=1\|PronType=Prs`, `AUX__Case=Nom\|Number=Sing\|Person=1\|PronType=Prs`, `AUX__Case=Nom\|Number=Sing\|Person=3\|PronType=Prs`, `AUX__Typo=Yes`, `CCONJ`, `CCONJ__Case=Acc\|Number=Sing\|Person=1\|PronType=Prs`, `CCONJ__PronType=Ind`, `CCONJ__Typo=Yes`, `DET`, `DET__Number=Plur\|PronType=Dem`, `NOUN`, `NOUN__Case=Acc\|Number=Sing\|Person=1\|PronType=Prs`, `NOUN__Case=Nom\|Number=Sing\|Person=1\|PronType=Prs`, `NOUN__Typo=Yes`, `NUM__Case=Acc\|Number=Sing\|Person=1\|PronType=Prs`, `NUM__NumType=Card`, `PART`, `PART__Typo=Yes`, `PRON`, `PRON__Case=Acc\|Number=Plur\|Person=1\|PronType=Prs`, `PRON__Case=Acc\|Number=Plur\|Person=2\|PronType=Prs`, `PRON__Case=Acc\|Number=Plur\|Person=3\|PronType=Prs`, `PRON__Case=Acc\|Number=Sing\|Person=1\|PronType=Prs`, `PRON__Case=Acc\|Number=Sing\|Person=2\|PronType=Prs`, `PRON__Case=Acc\|Number=Sing\|Person=3\|PronType=Prs`, `PRON__Case=Gen\|Number=Plur\|Person=2\|PronType=Prs`, `PRON__Case=Gen\|Number=Plur\|Person=3\|PronType=Prs`, `PRON__Case=Gen\|Number=Sing\|Person=2\|PronType=Prs`, `PRON__Case=Gen\|Number=Sing\|Person=2\|PronType=Prs\|Typo=Yes`, `PRON__Case=Gen\|Number=Sing\|Person=3\|PronType=Prs`, `PRON__Case=Nom\|Number=Plur\|Person=1\|PronType=Prs`, `PRON__Case=Nom\|Number=Plur\|Person=2\|PronType=Prs`, `PRON__Case=Nom\|Number=Plur\|Person=3\|PronType=Prs`, `PRON__Case=Nom\|Number=Sing\|Person=1\|PronType=Prs`, `PRON__Case=Nom\|Number=Sing\|Person=2\|PronType=Prs`, `PRON__Case=Nom\|Number=Sing\|Person=3\|PronType=Prs`, `PRON__PronType=Dem`, `PRON__PronType=Emp`, `PRON__PronType=Ind`, `PRON__PronType=Int`, `PRON__PronType=Int\|Typo=Yes`, `PRON__PronType=Rel`, `PRON__PronType=Rel\|Typo=Yes`, `PROPN`, `PROPN__Case=Nom\|Number=Plur\|Person=1\|PronType=Prs`, `PROPN__Typo=Yes`, `PUNCT`, `SCONJ`, `SCONJ__Typo=Yes`, `SYM`, `VERB`, `VERB__Typo=Yes`, `X` |
| **`parser`** | `ROOT`, `acl`, `advcl`, `advmod`, `amod`, `aux`, `case`, `cc`, `ccomp`, `compound`, `compound:svc`, `conj`, `cop`, `dep`, `det`, `expl`, `fixed`, `mark`, `nmod`, `nsubj`, `obj`, `obl`, `parataxis`, `punct` |
| **`morphologizer`** | `POS=ADP`, `POS=NOUN`, `POS=DET`, `POS=VERB`, `Number=Plur\|POS=DET\|PronType=Dem`, `POS=CCONJ`, `POS=PUNCT`, `Case=Nom\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `POS=ADJ`, `POS=AUX`, `POS=SCONJ`, `Case=Acc\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `POS=ADV`, `NumType=Ord\|POS=ADJ`, `POS=PRON\|PronType=Rel`, `POS=PRON`, `Case=Gen\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Nom\|Number=Sing\|POS=AUX\|Person=3\|PronType=Prs`, `NumType=Card\|POS=NUM`, `Case=Gen\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `POS=PART`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `POS=PRON\|PronType=Emp`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Nom\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `POS=PRON\|PronType=Ind`, `POS=NOUN\|Typo=Yes`, `Case=Acc\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `POS=PROPN`, `Case=Acc\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `POS=PRON\|PronType=Int`, `Case=Gen\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `POS=X`, `Case=Nom\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `POS=ADP\|Typo=Yes`, `POS=PRON\|PronType=Dem`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `POS=PROPN\|Typo=Yes`, `POS=AUX\|Typo=Yes`, `POS=ADJ\|Typo=Yes`, `Case=Gen\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs\|Typo=Yes`, `POS=CCONJ\|Typo=Yes`, `POS=ADV\|Typo=Yes`, `POS=PRON\|PronType=Rel\|Typo=Yes`, `POS=SCONJ\|Typo=Yes`, `POS=VERB\|Typo=Yes`, `POS=CCONJ\|PronType=Ind`, `POS=PRON\|PronType=Int\|Typo=Yes`, `POS=PART\|Typo=Yes`, `Case=Nom\|Number=Plur\|POS=AUX\|Person=1\|PronType=Prs`, `Case=Nom\|Number=Sing\|POS=ADJ\|Person=3\|PronType=Prs`, `NumType=Card\|POS=ADP`, `Case=Nom\|Number=Sing\|POS=NOUN\|Person=1\|PronType=Prs`, `Case=Acc\|Number=Sing\|POS=ADP\|Person=1\|PronType=Prs`, `Case=Acc\|Number=Sing\|POS=NOUN\|Person=1\|PronType=Prs`, `Case=Nom\|Number=Plur\|POS=PROPN\|Person=1\|PronType=Prs`, `Case=Nom\|Number=Sing\|POS=AUX\|Person=1\|PronType=Prs`, `POS=SYM`, `Case=Acc\|Number=Sing\|POS=NUM\|Person=1\|PronType=Prs`, `Case=Acc\|Number=Sing\|POS=ADJ\|Person=1\|PronType=Prs`, `Case=Acc\|Number=Sing\|POS=CCONJ\|Person=1\|PronType=Prs` |

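
Judging from the label scheme above, each `tagger` label appears to combine a coarse POS tag with morphological features in the form `<POS>__<Feature=Value|Feature=Value...>` (in the raw Markdown the `|` separators are escaped as `\|` so the table renders). The helper below is a hypothetical sketch for inspecting `token.tag_` strings; the layout it assumes is inferred from the listed labels, not from any documentation. For most purposes, the morphologizer's output via `token.morph` (for example `token.morph.to_dict()`) is the more direct route.

```python
from typing import Dict, Tuple


def split_tag(label: str) -> Tuple[str, Dict[str, str]]:
    """Split a tagger label into its POS part and a feature dict.

    Assumes the '<POS>__<Feat=Val|Feat=Val...>' layout seen in the label
    scheme. Labels without '__' (e.g. 'NOUN') carry no features.
    """
    pos, sep, feats = label.partition("__")
    features: Dict[str, str] = {}
    if sep and feats:
        # Features are '|'-separated 'Name=Value' pairs.
        for pair in feats.split("|"):
            name, _, value = pair.partition("=")
            features[name] = value
    return pos, features
```

For example, `split_tag("PRON__Case=Acc|Number=Plur|Person=1|PronType=Prs")` returns `("PRON", {"Case": "Acc", "Number": "Plur", "Person": "1", "PronType": "Prs"})`, while `split_tag("NOUN")` returns `("NOUN", {})`.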

### METRICS

These metrics were obtained with the `spacy evaluate` CLI.

| Type | Score |
| --- | --- |
| `TAG_ACC` | 88.51 |
| `POS_ACC` | 89.84 |
| `TAG_MICRO_P` | 0.00 |
| `TAG_MICRO_R` | 0.00 |
| `TAG_MICRO_F` | 0.00 |
| `DEP_UAS` | 70.61 |
| `DEP_LAS` | 59.17 |
| `SENTS_P` | 82.86 |
| `SENTS_R` | 91.58 |
| `SENTS_F` | 87.00 |
| `MORPH_ACC` | 96.46 |
| `TOK2VEC_LOSS` | 94585.70 |
| `TAGGER_LOSS` | 5570.00 |
| `PARSER_LOSS` | 63924.84 |
| `MORPHOLOGIZER_LOSS` | 5570.00 |

### FURTHER READING

* https://www.researchgate.net/publication/395833304_YORDEPAN_TOWARDS_YORUBA_DEPENDENCY_TREEBANK_CREATION

### NOTES

* This model was trained on the dataset referenced above, which consists of only 318 sentences; it also incorporates the pretrained FastText vectors listed under **Sources**.
* Streamlit app: https://yordepan.streamlit.app/
* Future development will add a lemmatizer.

### USAGE

* Install the package into your virtual environment with `pip install https://huggingface.co/Kola9INE/yordep/resolve/main/yo_yordep-0.1.0.tar.gz`.
* Import spaCy with `import spacy`.
* Load the model with `yor_nlp = spacy.load("yo_yordep")`.

### CAVEAT

* This model contains only the pipeline components listed under **Components** and **Default Pipeline** (`tok2vec`, `tagger`, `parser`, `morphologizer`). It has no trained `lemmatizer` component and therefore cannot lemmatize.
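
As a consistency check on the METRICS table, `SENTS_F` should be the harmonic mean of `SENTS_P` and `SENTS_R`, that is F = 2PR / (P + R). The sketch below computes that, together with simplified definitions of the two parser scores: UAS is the percentage of tokens assigned the correct head, and LAS additionally requires the correct dependency label, so LAS can never exceed UAS (70.61 vs 59.17 here). The function names and the `(head_index, label)` input format are hypothetical illustrations; spaCy's own scorer, which `spacy evaluate` uses, may differ in details such as which labels (for example punctuation) it excludes.

```python
from typing import List, Tuple


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def attachment_scores(
    gold: List[Tuple[int, str]], pred: List[Tuple[int, str]]
) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages.

    Each list holds one (head_index, dep_label) pair per token, aligned
    by position. UAS counts correct heads; LAS counts tokens whose head
    and label are both correct.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and aligned")
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    both_ok = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100 * head_ok / n, 100 * both_ok / n


# SENTS_P = 82.86 and SENTS_R = 91.58 from the table; prints 87.0,
# matching SENTS_F = 87.00.
print(round(f1(82.86, 91.58), 2))
```

With `f1(0.0, 0.0)` the function returns `0.0`, which is also consistent with the `TAG_MICRO_*` rows. For `attachment_scores`, a three-token sentence whose heads are all right but with one wrong label yields a UAS of 100 and a LAS of about 66.67.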
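
The USAGE steps can be combined into a short script. This is a minimal sketch, assuming spaCy `>=3.8.7,<3.9.0` and the `yo_yordep` package are already installed; the helper names `token_rows` and `main` and the sample sentence are illustrative and not part of the package. It prints the loaded pipeline components and, for each token, the attributes filled in by the `tagger` (`tag_`), the `morphologizer` (`pos_`, `morph`) and the `parser` (`dep_`, `head`).

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str, str, str, str]


def token_rows(doc: Iterable) -> List[Row]:
    """Return (text, pos_, tag_, morph, dep_, head text) for every token.

    Accepts a spaCy Doc or any iterable of objects exposing the same
    token attributes.
    """
    return [
        (t.text, t.pos_, t.tag_, str(t.morph), t.dep_, t.head.text)
        for t in doc
    ]


def main() -> None:
    import spacy  # imported here so token_rows() works without spaCy

    yor_nlp = spacy.load("yo_yordep")
    # Should list the components from the model card table.
    print(yor_nlp.pipe_names)

    doc = yor_nlp("Mo fẹ́ràn láti ka ìwé.")  # illustrative sentence
    for row in token_rows(doc):
        print("\t".join(row))
```

Call `main()` after installation. Because the pipeline has no lemmatizer (see CAVEAT), `lemma_` is deliberately left out of the rows.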