Yoruba
metadata
license: apache-2.0
datasets:
  - universal-dependencies/universal_dependencies
language:
  - yo
metrics:
  - accuracy
  - f1
  - precision
  - recall
| Feature | Description |
| --- | --- |
| Name | `yo_yordep` |
| Version | `0.1.0` |
| spaCy | `>=3.8.7,<3.9.0` |
| Default Pipeline | `tok2vec`, `tagger`, `parser`, `morphologizer` |
| Components | `tok2vec`, `tagger`, `parser`, `morphologizer` |
| Vectors | 151125 keys, 151125 unique vectors (300 dimensions) |
| Sources | Dataset: [UD_Yoruba-YTB](https://github.com/UniversalDependencies/UD_Yoruba-YTB); Embeddings: [FastText](https://fasttext.cc/docs/en/crawl-vectors.html) |
| License | `apache-2.0` |
| Author | Kolawole Lawal |

Label Scheme

View label scheme (154 labels for 3 components)
Component Labels
tagger ADJ, ADJ__Case=Acc|Number=Sing|Person=1|PronType=Prs, ADJ__Case=Nom|Number=Sing|Person=3|PronType=Prs, ADJ__NumType=Ord, ADJ__Typo=Yes, ADP, ADP__Case=Acc|Number=Sing|Person=1|PronType=Prs, ADP__NumType=Card, ADP__Typo=Yes, ADV, ADV__Typo=Yes, AUX, AUX__Case=Nom|Number=Plur|Person=1|PronType=Prs, AUX__Case=Nom|Number=Sing|Person=1|PronType=Prs, AUX__Case=Nom|Number=Sing|Person=3|PronType=Prs, AUX__Typo=Yes, CCONJ, CCONJ__Case=Acc|Number=Sing|Person=1|PronType=Prs, CCONJ__PronType=Ind, CCONJ__Typo=Yes, DET, DET__Number=Plur|PronType=Dem, NOUN, NOUN__Case=Acc|Number=Sing|Person=1|PronType=Prs, NOUN__Case=Nom|Number=Sing|Person=1|PronType=Prs, NOUN__Typo=Yes, NUM__Case=Acc|Number=Sing|Person=1|PronType=Prs, NUM__NumType=Card, PART, PART__Typo=Yes, PRON, PRON__Case=Acc|Number=Plur|Person=1|PronType=Prs, PRON__Case=Acc|Number=Plur|Person=2|PronType=Prs, PRON__Case=Acc|Number=Plur|Person=3|PronType=Prs, PRON__Case=Acc|Number=Sing|Person=1|PronType=Prs, PRON__Case=Acc|Number=Sing|Person=2|PronType=Prs, PRON__Case=Acc|Number=Sing|Person=3|PronType=Prs, PRON__Case=Gen|Number=Plur|Person=2|PronType=Prs, PRON__Case=Gen|Number=Plur|Person=3|PronType=Prs, PRON__Case=Gen|Number=Sing|Person=2|PronType=Prs, PRON__Case=Gen|Number=Sing|Person=2|PronType=Prs|Typo=Yes, PRON__Case=Gen|Number=Sing|Person=3|PronType=Prs, PRON__Case=Nom|Number=Plur|Person=1|PronType=Prs, PRON__Case=Nom|Number=Plur|Person=2|PronType=Prs, PRON__Case=Nom|Number=Plur|Person=3|PronType=Prs, PRON__Case=Nom|Number=Sing|Person=1|PronType=Prs, PRON__Case=Nom|Number=Sing|Person=2|PronType=Prs, PRON__Case=Nom|Number=Sing|Person=3|PronType=Prs, PRON__PronType=Dem, PRON__PronType=Emp, PRON__PronType=Ind, PRON__PronType=Int, PRON__PronType=Int|Typo=Yes, PRON__PronType=Rel, PRON__PronType=Rel|Typo=Yes, PROPN, PROPN__Case=Nom|Number=Plur|Person=1|PronType=Prs, PROPN__Typo=Yes, PUNCT, SCONJ, SCONJ__Typo=Yes, SYM, VERB, VERB__Typo=Yes, X
parser ROOT, acl, advcl, advmod, amod, aux, case, cc, ccomp, compound, compound:svc, conj, cop, dep, det, expl, fixed, mark, nmod, nsubj, obj, obl, parataxis, punct
morphologizer POS=ADP, POS=NOUN, POS=DET, POS=VERB, Number=Plur|POS=DET|PronType=Dem, POS=CCONJ, POS=PUNCT, Case=Nom|Number=Sing|POS=PRON|Person=3|PronType=Prs, POS=ADJ, POS=AUX, POS=SCONJ, Case=Acc|Number=Sing|POS=PRON|Person=3|PronType=Prs, POS=ADV, NumType=Ord|POS=ADJ, POS=PRON|PronType=Rel, POS=PRON, Case=Gen|Number=Sing|POS=PRON|Person=3|PronType=Prs, Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs, Case=Nom|Number=Sing|POS=AUX|Person=3|PronType=Prs, NumType=Card|POS=NUM, Case=Gen|Number=Plur|POS=PRON|Person=3|PronType=Prs, Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs, POS=PART, Case=Nom|Number=Plur|POS=PRON|Person=2|PronType=Prs, Case=Nom|Number=Plur|POS=PRON|Person=1|PronType=Prs, POS=PRON|PronType=Emp, Case=Acc|Number=Plur|POS=PRON|Person=1|PronType=Prs, Case=Nom|Number=Sing|POS=PRON|Person=1|PronType=Prs, Case=Gen|Number=Plur|POS=PRON|Person=2|PronType=Prs, POS=PRON|PronType=Ind, POS=NOUN|Typo=Yes, Case=Acc|Number=Sing|POS=PRON|Person=2|PronType=Prs, POS=PROPN, Case=Acc|Number=Sing|POS=PRON|Person=1|PronType=Prs, POS=PRON|PronType=Int, Case=Gen|Number=Sing|POS=PRON|Person=2|PronType=Prs, POS=X, Case=Nom|Number=Sing|POS=PRON|Person=2|PronType=Prs, POS=ADP|Typo=Yes, POS=PRON|PronType=Dem, Case=Acc|Number=Plur|POS=PRON|Person=2|PronType=Prs, POS=PROPN|Typo=Yes, POS=AUX|Typo=Yes, POS=ADJ|Typo=Yes, Case=Gen|Number=Sing|POS=PRON|Person=2|PronType=Prs|Typo=Yes, POS=CCONJ|Typo=Yes, POS=ADV|Typo=Yes, POS=PRON|PronType=Rel|Typo=Yes, POS=SCONJ|Typo=Yes, POS=VERB|Typo=Yes, POS=CCONJ|PronType=Ind, POS=PRON|PronType=Int|Typo=Yes, POS=PART|Typo=Yes, Case=Nom|Number=Plur|POS=AUX|Person=1|PronType=Prs, Case=Nom|Number=Sing|POS=ADJ|Person=3|PronType=Prs, NumType=Card|POS=ADP, Case=Nom|Number=Sing|POS=NOUN|Person=1|PronType=Prs, Case=Acc|Number=Sing|POS=ADP|Person=1|PronType=Prs, Case=Acc|Number=Sing|POS=NOUN|Person=1|PronType=Prs, Case=Nom|Number=Plur|POS=PROPN|Person=1|PronType=Prs, Case=Nom|Number=Sing|POS=AUX|Person=1|PronType=Prs, POS=SYM, 
Case=Acc|Number=Sing|POS=NUM|Person=1|PronType=Prs, Case=Acc|Number=Sing|POS=ADJ|Person=1|PronType=Prs, Case=Acc|Number=Sing|POS=CCONJ|Person=1|PronType=Prs
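
The tagger and morphologizer labels listed above encode the same information in two shapes: tagger labels put the part of speech first, followed by `__` and the features, while morphologizer labels carry the part of speech as a `POS=` entry among the features. The sketch below splits either shape into a part of speech and a feature dictionary. The function name `parse_label` is a hypothetical helper, and the format is inferred from the labels shown here rather than from a documented specification. It is not meant for the parser's dependency labels.

```python
def parse_label(label):
    """Split a tagger or morphologizer label into (pos, features).

    Two shapes appear in the label scheme above:
      tagger:        "PRON__Case=Nom|Number=Sing"    (POS, "__", features)
      morphologizer: "Case=Nom|Number=Sing|POS=PRON" (POS is one of the features)
    """
    pos = None
    feats_part = label
    # A tagger label starts with a bare POS name, so the part before "__"
    # contains no "=" sign.
    if "=" not in label.split("__", 1)[0]:
        pos, _, feats_part = label.partition("__")

    features = {}
    for pair in feats_part.split("|"):
        if not pair:
            continue  # label had no features, e.g. plain "ADJ"
        key, _, value = pair.partition("=")
        if key == "POS":
            pos = value
        else:
            features[key] = value
    return pos, features


if __name__ == "__main__":
    for example in (
        "PRON__Case=Nom|Number=Sing|Person=3|PronType=Prs",
        "Case=Nom|Number=Sing|POS=PRON|Person=3|PronType=Prs",
        "ADJ",
    ):
        print(example, "->", parse_label(example))
```

Both pronoun examples yield the same part of speech and the same feature dictionary, which makes it easy to compare tagger and morphologizer output for a token.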

METRICS

These metrics were obtained with the `spacy evaluate` CLI.

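| Type | Score |
| --- | --- |
| `TAG_ACC` | 88.51 |
| `POS_ACC` | 89.84 |
| `TAG_MICRO_P` | 0.00 |
| `TAG_MICRO_R` | 0.00 |
| `TAG_MICRO_F` | 0.00 |
| `DEP_UAS` | 70.61 |
| `DEP_LAS` | 59.17 |
| `SENTS_P` | 82.86 |
| `SENTS_R` | 91.58 |
| `SENTS_F` | 87.00 |
| `MORPH_ACC` | 96.46 |
| `TOK2VEC_LOSS` | 94585.70 |
| `TAGGER_LOSS` | 5570.00 |
| `PARSER_LOSS` | 63924.84 |
| `MORPHOLOGIZER_LOSS` | 5570.00 |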
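
The `_P`, `_R` and `_F` rows are precision, recall and F-score, where the F-score is the harmonic mean of the other two. As a quick consistency check, the sketch below recomputes `SENTS_F` from the reported `SENTS_P` and `SENTS_R`. The helper name `f_score` is illustrative and not part of the model or of spaCy's public API.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0  # avoids division by zero, as in the TAG_MICRO_* rows
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Values copied from the table above.
    print(f"SENTS_F     = {f_score(82.86, 91.58):.2f}")
    print(f"TAG_MICRO_F = {f_score(0.0, 0.0):.2f}")
```

The recomputed sentence F-score rounds to the 87.00 reported in the table.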

FURTHER READINGS:

NOTE:

  • This model was trained on the dataset referenced above, which consists of 318 sentences; the FastText embeddings listed under Sources were also incorporated.
  • https://yordepan.streamlit.app/
  • Future development will include a lemmatizer.

USAGE:

  • Install the model into your virtual environment with `pip install https://huggingface.co/Kola9INE/yordep/resolve/main/yo_yordep-0.1.0.tar.gz`.
  • Import spaCy with `import spacy`.
  • Load the model with `yor_nlp = spacy.load("yo_yordep")`.
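
The steps above can be put together in a short script. This is a minimal sketch, assuming the package has been installed with the `pip install` command above; the sample sentence and the helper names `token_rows` and `main` are illustrative and not part of the model. It prints the annotations set by the tagger, morphologizer and parser for each token.

```python
import importlib.util


def token_rows(doc):
    """Collect the annotations the pipeline components assign to each token."""
    rows = []
    for token in doc:
        rows.append(
            (
                token.text,
                token.tag_,        # fine-grained label from the tagger
                token.pos_,        # coarse part of speech from the morphologizer
                str(token.morph),  # morphological features from the morphologizer
                token.dep_,        # dependency relation from the parser
                token.head.text,   # syntactic head from the parser
            )
        )
    return rows


def main():
    import spacy  # imported here so token_rows has no hard dependency on spaCy

    yor_nlp = spacy.load("yo_yordep")
    print(yor_nlp.pipe_names)  # components actually present in the pipeline

    doc = yor_nlp("Mo fẹ́ jẹun.")  # illustrative sample sentence (assumption)
    for row in token_rows(doc):
        print(*row, sep="\t")

    # Sentence boundaries are set by the parser.
    print([sent.text for sent in doc.sents])


if __name__ == "__main__":
    if importlib.util.find_spec("yo_yordep") is None:
        print("yo_yordep is not installed; run the pip install step first.")
    else:
        main()
```

Keeping the spaCy import inside `main` lets `token_rows` be reused on any `Doc` produced elsewhere, and the script exits with a hint instead of a traceback when the model is missing.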

CAVEAT!!!

  • This model contains only the components listed under Components and Default Pipeline above. It has no trained lemmatizer component and therefore cannot lemmatize!