XLM-RoBERTa Large Fine-tuned for Spanish Sentiment Analysis (TASS)

Model Description

This model is a fine-tuned version of FacebookAI/xlm-roberta-large for sentiment analysis on Spanish Twitter data (TASS dataset).

Training Details

  • Base Model: xlm-roberta-large
  • Task: Multi-class Sentiment Classification (Negative/Neutral/Positive)
  • Dataset: TASS (Taller de Análisis de Sentimientos en la SEPLN, the SEPLN workshop on sentiment analysis of Spanish tweets)
  • Training Samples: 4,636
  • Validation Samples: 1,159
  • Test Samples: 1,449
  • Batch Size: 16
  • Epochs: 10
  • Learning Rate: 2e-05
  • Weight Decay: 0.01
  • Max Sequence Length: 128
  • Class Balancing: Weighted Cross-Entropy Loss
  • Early Stopping: Enabled (patience=3)
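
These hyperparameters would map onto Hugging Face's Trainer roughly as follows. This is a minimal sketch, not the published training script; model, train_dataset, and val_dataset are placeholders for a 3-label AutoModelForSequenceClassification and the tokenized TASS splits.

from transformers import TrainingArguments, Trainer, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="xlm-roberta-large-tass-sentiment-bs16",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=10,
    learning_rate=2e-5,
    weight_decay=0.01,
    eval_strategy="epoch",             # "evaluation_strategy" on older transformers versions
    save_strategy="epoch",
    load_best_model_at_end=True,       # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
)

trainer = Trainer(
    model=model,                       # AutoModelForSequenceClassification, num_labels=3
    args=training_args,
    train_dataset=train_dataset,       # tokenized TASS splits (placeholders)
    eval_dataset=val_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()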

Performance (Test Set)

Metric     Score
F1 Score   0.4098
Accuracy   0.5079
Precision  0.5283
Recall     0.5079
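
Recall equals accuracy in this table, which is exactly what weighted averaging produces. Assuming the scores are weighted averages (as with scikit-learn's average="weighted"), they would be computed like this; y_true and y_pred stand in for the test-set gold labels and model predictions, which are not shown here:

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# y_true / y_pred: test-set gold labels and predicted class ids (placeholders)
accuracy  = accuracy_score(y_true, y_pred)
f1        = f1_score(y_true, y_pred, average="weighted")
precision = precision_score(y_true, y_pred, average="weighted")
recall    = recall_score(y_true, y_pred, average="weighted")  # equals accuracy by construction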

Training History (Validation Set)

Validation metrics per epoch. Training stopped after epoch 5, once validation loss (best at epoch 2) had not improved for 3 consecutive epochs, triggering early stopping:

Epoch  Loss    Accuracy  F1 Score  Precision  Recall
1      1.1272  0.3819    0.3235    0.4629     0.3819
2      1.0278  0.4931    0.3947    0.3591     0.4931
3      1.0981  0.3991    0.2382    0.3610     0.3991
4      1.0494  0.4440    0.3561    0.3265     0.4440
5      1.0650  0.4405    0.3174    0.3696     0.4405

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("tu-usuario/xlm-roberta-large-tass-sentiment-bs16")
model = AutoModelForSequenceClassification.from_pretrained("tu-usuario/xlm-roberta-large-tass-sentiment-bs16")

# Example usage
text = "Me encanta este producto, es excelente"  # "I love this product, it's excellent"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)  # match the training max sequence length

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

labels = {0: "Negative", 1: "Neutral", 2: "Positive"}
print(f"Sentiment: {labels[predicted_class]}")
print(f"Confidence: {predictions[0][predicted_class].item():.4f}")

As a Pipeline

from transformers import pipeline

# Use as a pipeline
classifier = pipeline('sentiment-analysis', model='tu-usuario/xlm-roberta-large-tass-sentiment-bs16')

result = classifier("Me encanta este producto, es excelente")
print(result)
# Example output: [{'label': 'LABEL_2', 'score': 0.95}]
# LABEL_0 = Negative, LABEL_1 = Neutral, LABEL_2 = Positive

Labels

  • 0 (LABEL_0): Negative sentiment
  • 1 (LABEL_1): Neutral sentiment
  • 2 (LABEL_2): Positive sentiment
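
The generic LABEL_* names in the pipeline output suggest the checkpoint's config does not ship with named labels. If so, readable names can be attached at load time; this is a sketch, not part of the published checkpoint:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_id = "tu-usuario/xlm-roberta-large-tass-sentiment-bs16"
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Attach readable names so the pipeline reports them instead of LABEL_0/1/2
model.config.id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}
model.config.label2id = {v: k for k, v in model.config.id2label.items()}

classifier = pipeline("sentiment-analysis", model=model,
                      tokenizer=AutoTokenizer.from_pretrained(model_id))
print(classifier("Me encanta este producto, es excelente"))
# e.g. [{'label': 'Positive', 'score': ...}]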

Training Configuration

The model was trained with a class-weighted cross-entropy loss to handle class imbalance; a sketch of one possible weighting scheme follows the distribution below.

Distribution in training set (estimated):

  • Negative samples: 1854 (40%)
  • Neutral samples: 1391 (30%)
  • Positive samples: 1391 (30%)
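
One common way to realize this weighting is inverse-frequency class weights passed to PyTorch's CrossEntropyLoss. The exact training code is not included here, so this is a sketch under that assumption:

import torch
import torch.nn as nn

# Inverse-frequency weights from the estimated distribution above
counts  = torch.tensor([1854.0, 1391.0, 1391.0])  # Negative, Neutral, Positive
weights = counts.sum() / (len(counts) * counts)   # ≈ [0.83, 1.11, 1.11]
loss_fn = nn.CrossEntropyLoss(weight=weights)

# Inside a custom Trainer's compute_loss, this would replace the default loss:
# loss = loss_fn(logits.view(-1, 3), labels.view(-1))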

Limitations and Bias

  • This model is specifically trained on Spanish Twitter data
  • Performance may vary on other Spanish text domains
  • Sentiment is limited to three coarse categories (negative, neutral, positive); mixed or fine-grained sentiment is not captured
  • May reflect biases present in the TASS Twitter dataset

Citation

If you use this model, please cite:

@misc{xlm-roberta-tass-sentiment,
  author = {Your Name},
  title = {XLM-RoBERTa Large Fine-tuned for Spanish Sentiment Analysis},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/tu-usuario/xlm-roberta-large-tass-sentiment-bs16}}
}