Add YAML in README
README.md
CHANGED
@@ -1,3 +1,35 @@
+---
+language:
+- id
+language_bcp47:
+- ind
+license: apache-2.0
+tags:
+- indobert
+- masked-lm
+- nlp
+- bahasa-indonesia
+datasets:
+- wikipedia
+- kbbi
+- news
+metrics:
+- perplexity
+model-index:
+- name: TataKata
+  results:
+  - task:
+      type: masked-language-modeling
+      name: Masked Language Modeling
+    dataset:
+      name: Indonesian Wikipedia + KBBI
+      type: text
+    metrics:
+    - name: Perplexity
+      type: perplexity
+      value: 12.4
+---
+
 # TataKata: Indonesian BERT Language Model

 **TataKata** is an Indonesian BERT model trained through continued pretraining of the original IndoBERT base architecture. The model is designed to enhance understanding of Indonesian grammar and word usage, aligning with KBBI (Kamus Besar Bahasa Indonesia) and PUEBI (Pedoman Umum Ejaan Bahasa Indonesia) standards.
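The change above puts a `---`-delimited YAML metadata block at the top of README.md. As a minimal sketch of how such a block can be separated from the Markdown body, the helper below uses only the Python standard library. The function name `split_front_matter` and the shortened sample README are illustrative assumptions, not part of this repository, and the sketch does not parse the YAML itself.

```python
def split_front_matter(readme_text: str) -> tuple[str, str]:
    """Return (front_matter, body); front_matter is '' when no block is found.

    Hypothetical helper for illustration only.
    """
    lines = readme_text.splitlines()
    # The metadata block must open on the very first line with '---'.
    if not lines or lines[0].strip() != "---":
        return "", readme_text
    # The next '---' line closes the block.
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return front, body
    # No closing delimiter: treat the whole file as body.
    return "", readme_text


# Shortened sample modelled on the README in this change (assumed, not the full file).
README = """---
language:
- id
license: apache-2.0
---

# TataKata: Indonesian BERT Language Model
"""

front, body = split_front_matter(README)
print(front)
print(body)
```

The returned `front` string could then be handed to a YAML parser of your choice. Keeping the split separate from the parsing means the sketch needs no third-party dependency.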