---
license: mit
tags:
- indobert
- emotion-classification
- text-classification
- indonesian
- torch
language:
- id
datasets:
- PRDECT-ID
model-index:
- name: IndoBERT Emotion Classification (5-Class)
  results:
  - task:
      type: text-classification
      name: Emotion Classification
    dataset:
      name: PRDECT-ID
      type: text
      description: >
        A dataset of Indonesian product reviews labeled with five emotion categories:
        love, happiness, anger, fear, and sadness.
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.7167
    - name: F1 Score
      type: f1
      value: 0.7125
    - name: Precision
      type: precision
      value: 0.7179
    - name: Recall
      type: recall
      value: 0.7167
---

# IndoBERT Emotion Classification (5-Class)

This model is a *fine-tuned* version of [`indobenchmark/indobert-base-p1`](https://huggingface.co/indobenchmark/indobert-base-p1) for emotion classification of Indonesian text, with five emotion labels: `love`, `happiness`, `anger`, `fear`, and `sadness`.

## 🧠 Dataset

The model was trained on the **PRDECT-ID Dataset**, a collection of Indonesian-language product reviews from the Tokopedia e-commerce platform, annotated with emotion labels by clinical psychology experts.

- 29 product categories
- Emotion annotations by a professional team
- Each entry has exactly one emotion label

## 🛠 Fine-tuning Details

- **Base model**: `indobenchmark/indobert-base-p1`
- **Training epochs**: 5 of 10 planned (training stopped early; `load_best_model_at_end=True` restores the best checkpoint)
- **Batch size**: 8
- **Learning rate**: 2e-5
- **Weight decay**: 0.05
- **Validation strategy**: evaluated once per epoch
- **Evaluation metric**: `eval_accuracy` (with `greater_is_better=True`)
- **Cross-validation**: Stratified K-Fold (`n_splits=5`)
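
The reported epoch figures (training halted after 5 of 10 planned epochs, with the best model from epoch 3) are consistent with a patience-based early-stopping rule. The plain-Python sketch below replays such a rule; it is a simplified model of how `transformers.EarlyStoppingCallback` counts evaluations without improvement, not its source code. The card does not say which callback or patience value was used, so `patience=2` and the per-epoch scores in the usage line are hypothetical.

```python
def early_stopping_epochs(scores, patience, greater_is_better=True):
    """Return (best_epoch, last_epoch) for a list of per-epoch validation scores.

    Simplified patience rule (an assumption, not the library code): the counter
    resets whenever a score strictly beats the best so far, otherwise it grows
    by one; training stops as soon as the counter reaches `patience`.
    Epochs are 1-indexed.
    """
    best_score = None
    best_epoch = 0
    counter = 0
    for epoch, score in enumerate(scores, start=1):
        improved = best_score is None or (
            score > best_score if greater_is_better else score < best_score
        )
        if improved:
            best_score, best_epoch, counter = score, epoch, 0
        else:
            counter += 1
        if counter >= patience:
            # Stop here; load_best_model_at_end would then reload best_epoch.
            return best_epoch, epoch
    # No early stop: all planned epochs were run.
    return best_epoch, len(scores)


# Hypothetical accuracies for 10 planned epochs (NOT this model's training log).
scores = [0.62, 0.68, 0.72, 0.71, 0.70, 0.73, 0.74, 0.74, 0.75, 0.75]
print(early_stopping_epochs(scores, patience=2))  # (3, 5)
```

With a patience of 2, two consecutive epochs without a new best after epoch 3 end training at epoch 5, so later epochs are never evaluated.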

### Evaluation Results (Best Model, Epoch 3)

| Metric    | Value  |
|-----------|--------|
| Accuracy  | 0.7167 |
| F1 Score  | 0.7125 |
| Precision | 0.7179 |
| Recall    | 0.7167 |
| Eval Loss | 0.7614 |
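
The card does not say how precision, recall, and F1 were averaged over the five classes. Support-weighted averaging (scikit-learn's `average="weighted"`) is one assumption consistent with the table: for single-label classification, weighted recall always equals accuracy, matching Recall = Accuracy = 0.7167, whereas micro averaging would make precision equal accuracy as well, which the table rules out. The dependency-free sketch below computes these weighted metrics; it is an illustration, not the evaluation code used for this model, and the example labels are made up.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (single-label)."""
    n = len(y_true)
    support = Counter(y_true)      # number of true examples per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label in set(y_true) | set(y_pred):
        weight = support[label] / n  # classes absent from y_true get weight 0
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / support[label] if support[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": sum(correct.values()) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration only.
y_true = ["love", "love", "anger", "fear", "sadness", "happiness"]
y_pred = ["love", "anger", "anger", "fear", "happiness", "happiness"]
print(weighted_metrics(y_true, y_pred))
```

In a `Trainer` `compute_metrics` hook, `y_pred` would be the argmax of each example's logits.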

## 🚀 How to Use

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_id = "galennolan/indobert-b-p1-indoemotion-5class"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

emotion_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# "This product makes me really happy!"; the result is a list like [{"label": ..., "score": ...}]
result = emotion_classifier("Produk ini bikin aku senang banget!")
print(result)
```