mbert: Emotion Recognition for Vietnamese Text
This model is a fine-tuned version of bert-base-multilingual-cased on the VSMEC dataset for emotion recognition in Vietnamese text.
Model Details
- Base Model: bert-base-multilingual-cased
- Description: Multilingual BERT
- Dataset: VSMEC (Vietnamese Social Media Emotion Corpus)
- Fine-tuning Framework: HuggingFace Transformers
- Task: Emotion Classification (7 classes)
Hyperparameters
- Batch size: 32
- Learning rate: 2e-5
- Epochs: 100
- Max sequence length: 256
- Weight decay: 0.01
- Warmup steps: 500
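As a rough illustration of how the learning rate and warmup steps interact, the sketch below computes the learning rate at a given step. It assumes a linear warmup followed by a linear decay, which is the default schedule in the HuggingFace Trainer; this card does not state which scheduler was actually used. The function name `linear_warmup_lr` and the total step count are hypothetical and chosen only for the example.

```python
def linear_warmup_lr(step, total_steps, peak_lr=2e-5, warmup_steps=500):
    """Learning rate at `step`, assuming linear warmup then linear decay.

    The scheduler type is an assumption; only peak_lr and warmup_steps
    come from the hyperparameters listed above.
    """
    if step < warmup_steps:
        # Ramp up linearly from 0 to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    hypothetical_total_steps = 10_000  # not taken from the model card
    for step in (0, 250, 500, 5_000, 10_000):
        lr = linear_warmup_lr(step, hypothetical_total_steps)
        print(f"step {step:>6}: lr = {lr:.2e}")
```

With this schedule the learning rate only reaches 2e-5 at step 500, so the first few hundred updates are deliberately small.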
Dataset
The model was trained on the VSMEC dataset, which contains 6,927 Vietnamese social media text samples annotated with emotion labels. The dataset uses the following seven emotion categories, each listed with its integer label ID:
- Enjoyment (0): Positive emotions, joy, happiness
- Sadness (1): Sad, disappointed, gloomy feelings
- Anger (2): Angry, frustrated, irritated
- Fear (3): Scared, anxious, worried
- Disgust (4): Disgusted, repelled
- Surprise (5): Surprised, shocked, amazed
- Other (6): Neutral or unclassified emotions
Results
The model was evaluated using the following metrics:
- Accuracy: 0.5455
- Macro-F1: 0.5064
- Macro-Precision: 0.6097
- Macro-Recall: 0.4803
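To make the macro-averaged metrics concrete, here is a minimal sketch of how they are typically computed: precision, recall, and F1 are calculated per class and then averaged with equal weight per class. This is why the macro-F1 above is not simply the harmonic mean of the macro-precision and macro-recall. The function name `macro_metrics` and the toy labels are hypothetical; the evaluation code actually used for this model is not shown in this card. Undefined ratios (a class with no predictions or no true samples) are treated as 0 here, which is an assumption.

```python
from collections import Counter


def macro_metrics(y_true, y_pred, num_classes=7):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / num_classes,
        "macro_recall": sum(recalls) / num_classes,
        "macro_f1": sum(f1s) / num_classes,  # mean of per-class F1 scores
    }


if __name__ == "__main__":
    # Toy 3-class example, not VSMEC data.
    toy_true = [0, 0, 1, 1, 2, 2]
    toy_pred = [0, 1, 1, 1, 2, 0]
    for name, value in macro_metrics(toy_true, toy_pred, num_classes=3).items():
        print(f"{name}: {value:.4f}")
```

Because every class counts equally, weak performance on a rare emotion pulls the macro scores down more than it affects accuracy.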
Usage
You can use this model for emotion recognition in Vietnamese text. Below is an example of how to use it with the HuggingFace Transformers library:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("visolex/emotion-mbert")
model = AutoModelForSequenceClassification.from_pretrained("visolex/emotion-mbert")
# Example text
text = "Tôi rất vui vì hôm nay trời đẹp!"
# Tokenize
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
# Predict (inference only, so gradient tracking is disabled)
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = outputs.logits.argmax(dim=-1).item()
# Map to emotion name
emotion_map = {
    0: "Enjoyment",
    1: "Sadness",
    2: "Anger",
    3: "Fear",
    4: "Disgust",
    5: "Surprise",
    6: "Other"
}
predicted_emotion = emotion_map[predicted_class]
print(f"Text: {text}")
print(f"Predicted emotion: {predicted_emotion}")
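The example above reports only the top class. Given the modest accuracy listed under Results, it can help to inspect the full probability distribution instead. The sketch below converts a list of seven logits into ranked (emotion, probability) pairs using a softmax. The helper name `rank_emotions` and the sample logits are hypothetical; in practice the logits could be obtained from the usage example with `outputs.logits[0].tolist()`.

```python
import math

# Label order follows the IDs listed in the Dataset section.
EMOTIONS = ["Enjoyment", "Sadness", "Anger", "Fear", "Disgust", "Surprise", "Other"]


def rank_emotions(logits):
    """Return (emotion, probability) pairs, most likely first."""
    if len(logits) != len(EMOTIONS):
        raise ValueError(f"expected {len(EMOTIONS)} logits, got {len(logits)}")
    # Subtract the maximum before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(zip(EMOTIONS, probs), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up logits for illustration, not real model output.
    sample_logits = [2.0, 0.5, 0.1, -1.0, -0.5, 0.3, 1.0]
    for emotion, prob in rank_emotions(sample_logits):
        print(f"{emotion:<10} {prob:.3f}")
```

A small gap between the top two probabilities suggests the prediction should be treated with caution.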
Citation
If you use this model, please cite:
@misc{visolex_emotion_mbert,
  title={Multilingual BERT for Vietnamese Emotion Recognition},
  author={ViSoLex Team},
  year={2024},
  url={https://huggingface.co/visolex/emotion-mbert}
}
License
This model is released under the Apache-2.0 license.
Acknowledgments
- Base model: bert-base-multilingual-cased
- Dataset: VSMEC (Vietnamese Social Media Emotion Corpus)
- ViSoLex Toolkit
Model tree for visolex/emotion-mbert
- Base model: google-bert/bert-base-multilingual-cased
Evaluation results
- Accuracy on VSMEC (self-reported): 0.545
- Macro-F1 on VSMEC (self-reported): 0.506