mbert: Emotion Recognition for Vietnamese Text

This model is a fine-tuned version of bert-base-multilingual-cased on the VSMEC dataset for emotion recognition in Vietnamese text.

Model Details

Base Model: bert-base-multilingual-cased
Description: Multilingual BERT
Dataset: VSMEC (Vietnamese Social Media Emotion Corpus)
Fine-tuning Framework: HuggingFace Transformers
Task: Emotion Classification (7 classes)

Hyperparameters

Batch size: 32
Learning rate: 2e-5
Epochs: 100
Max sequence length: 256
Weight decay: 0.01
Warmup steps: 500
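
The list above does not say which learning-rate schedule was used. As a minimal sketch, assuming linear warmup followed by linear decay (the default schedule of the Transformers Trainer), the learning rate at a given optimizer step could be computed as below. The names `lr_at_step` and `total_steps` are hypothetical, and the total step count is an illustrative value, not a figure from the actual training run.

```python
PEAK_LR = 2e-5       # "Learning rate" from the list above
WARMUP_STEPS = 500   # "Warmup steps" from the list above


def lr_at_step(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero (assumed schedule)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - WARMUP_STEPS)


# total_steps=10_000 is an illustrative value only.
for step in (0, 250, 500, 5_250, 10_000):
    print(step, lr_at_step(step, total_steps=10_000))
```

With these settings the rate climbs from zero to 2e-5 over the first 500 steps and then falls back toward zero by the final step.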

Dataset

The model was trained on the VSMEC dataset, which contains 6,927 Vietnamese social media text samples annotated with emotion labels. The dataset includes the following emotion categories:

Enjoyment (0): Positive emotions, joy, happiness
Sadness (1): Sad, disappointed, gloomy feelings
Anger (2): Angry, frustrated, irritated
Fear (3): Scared, anxious, worried
Disgust (4): Disgusted, repelled
Surprise (5): Surprised, shocked, amazed
Other (6): Neutral or unclassified emotions

Results

Self-reported evaluation results on VSMEC:

Accuracy: 0.5455
Macro-F1: 0.5064
Macro-Precision: 0.6097
Macro-Recall: 0.4803
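
Macro-averaged scores are usually computed per class and then averaged without weighting, so every emotion counts equally regardless of how often it occurs. The reported Macro-F1 (0.5064) is lower than the harmonic mean of the macro precision and recall above (about 0.537), which is consistent with that per-class definition. The sketch below assumes this definition; `macro_scores` is a hypothetical helper, not the evaluation script used for this model, and the labels in the example are toy data.

```python
from typing import Dict, List

NUM_CLASSES = 7  # label ids 0..6, matching the emotion list in the Dataset section


def macro_scores(y_true: List[int], y_pred: List[int],
                 num_classes: int = NUM_CLASSES) -> Dict[str, float]:
    """Accuracy plus unweighted averages of per-class precision, recall and F1."""
    precisions, recalls, f1s = [], [], []
    pairs = list(zip(y_true, y_pred))
    for c in range(num_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Undefined ratios (class never predicted or never present) count as 0.
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "macro_precision": sum(precisions) / num_classes,
        "macro_recall": sum(recalls) / num_classes,
        "macro_f1": sum(f1s) / num_classes,
    }


# Toy labels for three classes; not VSMEC data.
scores = macro_scores([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
print(scores)
```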

Usage

You can use this model for emotion recognition in Vietnamese text. Below is an example of how to use it with the HuggingFace Transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("visolex/emotion-mbert")
model = AutoModelForSequenceClassification.from_pretrained("visolex/emotion-mbert")

# Example text
text = "Tôi rất vui vì hôm nay trời đẹp!"

# Tokenize
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Predict (gradients are not needed for inference)
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = outputs.logits.argmax(dim=-1).item()

# Map to emotion name
emotion_map = {
    0: "Enjoyment",
    1: "Sadness",
    2: "Anger",
    3: "Fear",
    4: "Disgust",
    5: "Surprise",
    6: "Other"
}

predicted_emotion = emotion_map[predicted_class]
print(f"Text: {text}")
print(f"Predicted emotion: {predicted_emotion}")
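
The example above prints only the top class. When a confidence score or a ranking of all seven emotions is wanted, the logits can be converted to probabilities with a softmax. The sketch below does this in plain Python on a made-up logits vector; `rank_emotions` is a hypothetical helper, and in practice the input would come from the model output, for example `outputs.logits[0].tolist()`.

```python
import math
from typing import List, Tuple

# Index i is the emotion for label id i, as in emotion_map above.
EMOTIONS = ["Enjoyment", "Sadness", "Anger", "Fear", "Disgust", "Surprise", "Other"]


def rank_emotions(logits: List[float]) -> List[Tuple[str, float]]:
    """Softmax the logits and return (emotion, probability) pairs, most likely first."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(zip(EMOTIONS, probs), key=lambda pair: pair[1], reverse=True)


# Made-up logits for illustration; not real model output.
example_logits = [2.1, -0.3, 0.4, -1.2, -0.8, 0.9, 0.1]
ranked = rank_emotions(example_logits)
for emotion, prob in ranked:
    print(f"{emotion}: {prob:.3f}")
```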

Citation

If you use this model, please cite:

@misc{visolex_emotion_mbert,
  title={Multilingual BERT for Vietnamese Emotion Recognition},
  author={ViSoLex Team},
  year={2024},
  url={https://huggingface.co/visolex/emotion-mbert}
}

License

This model is released under the Apache-2.0 license.

Acknowledgments

Base model: bert-base-multilingual-cased
Dataset: VSMEC (Vietnamese Social Media Emotion Corpus)
ViSoLex Toolkit
