citylighxts committed
Commit 4b90902 · verified · 1 Parent(s): 4324aa1

Add YAML in README

Files changed (1)
  1. README.md +32 -0
README.md CHANGED
@@ -1,3 +1,35 @@
+ ---
+ language:
+ - id
+ language_bcp47:
+ - ind
+ license: apache-2.0
+ tags:
+ - indobert
+ - masked-lm
+ - nlp
+ - bahasa-indonesia
+ datasets:
+ - wikipedia
+ - kbbi
+ - news
+ metrics:
+ - perplexity
+ model-index:
+ - name: TataKata
+   results:
+   - task:
+       type: masked-language-modeling
+       name: Masked Language Modeling
+     dataset:
+       name: Indonesian Wikipedia + KBBI
+       type: text
+     metrics:
+     - name: Perplexity
+       type: perplexity
+       value: 12.4
+ ---
+
  # TataKata: Indonesian BERT Language Model
 
  **TataKata** is an Indonesian BERT model trained through continued pretraining of the original IndoBERT base architecture. The model is designed to enhance understanding of Indonesian grammar and word usage, aligning with KBBI (Kamus Besar Bahasa Indonesia) and PUEBI (Pedoman Umum Ejaan Bahasa Indonesia) standards.
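
Reviewer note: the metadata added in this commit reports a perplexity of 12.4 but does not say how it was computed. A minimal sketch, assuming the common convention that perplexity is the exponential of the mean per-token cross-entropy (in nats) over the masked positions, shows which loss value that figure would correspond to. The function names `perplexity_from_loss` and `loss_from_perplexity` are hypothetical helpers for illustration, not part of this repository.

```python
import math


def perplexity_from_loss(mean_cross_entropy: float) -> float:
    """Perplexity as the exponential of the mean per-token cross-entropy (nats)."""
    return math.exp(mean_cross_entropy)


def loss_from_perplexity(perplexity: float) -> float:
    """Inverse mapping: the mean cross-entropy (nats) implied by a perplexity."""
    if perplexity <= 0:
        raise ValueError("perplexity must be positive")
    return math.log(perplexity)


# Value taken from the model-index block in this commit.
reported_perplexity = 12.4

# Assumption: the metric was derived as exp(mean masked-token loss).
implied_loss = loss_from_perplexity(reported_perplexity)
print(f"implied mean cross-entropy: {implied_loss:.3f} nats")
```

Under that assumption, an evaluation loss of roughly 2.5 nats would be consistent with the reported value; if the author used a different definition (for example a base-2 logarithm), the mapping would differ.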