XLM-RoBERTa Large Fine-tuned for Spanish Sentiment Analysis (TASS)

Model Description

This model is a fine-tuned version of FacebookAI/xlm-roberta-large for sentiment analysis on Spanish Twitter data (TASS dataset).

Training Details

Base Model: xlm-roberta-large
Task: Multi-class Sentiment Classification (Negative/Neutral/Positive)
Dataset: TASS (Taller de Análisis Semántico en la SEPLN; Spanish Twitter corpus)
Training Samples: 4,636
Validation Samples: 1,159
Test Samples: 1,449
Batch Size: 16
Epochs: 10 (maximum)
Learning Rate: 2e-05
Weight Decay: 0.01
Max Sequence Length: 128
Class Balancing: Weighted Cross-Entropy Loss
Early Stopping: Enabled (patience=3)

Performance (Test Set)

Metric	Score
F1 Score	0.4098
Accuracy	0.5079
Precision	0.5283
Recall	0.5079
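
The table does not state how the per-class scores were averaged. Recall is identical to accuracy (0.5079), which is what support-weighted averaging always produces, so the figures are consistent with weighted averages. The following is a minimal sketch of that averaging in plain Python. The labels are made up for illustration (they are not TASS data), and `weighted_scores` is a hypothetical helper, not the author's evaluation code.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall and F1, plus accuracy.

    Hypothetical helper for illustration only.
    """
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n  # each class contributes in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels (0 = negative, 1 = neutral, 2 = positive), invented for the example.
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 0, 1, 2, 2, 2, 0]
scores = weighted_scores(y_true, y_pred)
print(scores)
```

Weighted recall reduces to (correct predictions) / (all examples), so it matches accuracy for any input, while weighted F1 can fall well below both when some classes are rarely predicted correctly.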

Training History (Validation Set)

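Validation metrics per epoch. Only five epochs were logged: epoch 2 had the best validation loss, accuracy, and F1, which is consistent with early stopping (patience=3) ending the run after epoch 5.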

Epoch	Loss	Accuracy	F1 Score	Precision	Recall
1	1.1272	0.3819	0.3235	0.4629	0.3819
2	1.0278	0.4931	0.3947	0.3591	0.4931
3	1.0981	0.3991	0.2382	0.3610	0.3991
4	1.0494	0.4440	0.3561	0.3265	0.4440
5	1.0650	0.4405	0.3174	0.3696	0.4405
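
As a sketch of how patience=3 interacts with these numbers, the F1 column above can be replayed through a generic patience rule. The monitored metric is an assumption (the card does not say which one was used), and `early_stop_epoch` is a hypothetical helper rather than the training code.

```python
def early_stop_epoch(scores, patience=3):
    """Return (best_epoch, stop_epoch) for a higher-is-better metric.

    Epochs are 1-indexed. stop_epoch is None if patience never runs out.
    """
    best_score = float("-inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, None

# Validation F1 per epoch, copied from the table above.
val_f1 = [0.3235, 0.3947, 0.2382, 0.3561, 0.3174]
best_epoch, stop_epoch = early_stop_epoch(val_f1, patience=3)
print(best_epoch, stop_epoch)  # 2 5
```

Under this rule the best checkpoint is the one from epoch 2, and the run ends after epoch 5 without reaching the 10-epoch limit.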

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("tu-usuario/xlm-roberta-large-tass-sentiment-bs16")
model = AutoModelForSequenceClassification.from_pretrained("tu-usuario/xlm-roberta-large-tass-sentiment-bs16")

# Usage example (max_length matches the 128-token training length)
text = "Me encanta este producto, es excelente"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

labels = {0: "Negative", 1: "Neutral", 2: "Positive"}
print(f"Sentiment: {labels[predicted_class]}")
print(f"Confidence: {predictions[0][predicted_class].item():.4f}")

As a Pipeline

from transformers import pipeline

# Load the model as a pipeline
classifier = pipeline('sentiment-analysis', model='tu-usuario/xlm-roberta-large-tass-sentiment-bs16')

result = classifier("Me encanta este producto, es excelente")
print(result)
# Output format (the score shown is illustrative): [{'label': 'LABEL_2', 'score': 0.95}]
# LABEL_0 = Negative, LABEL_1 = Neutral, LABEL_2 = Positive

Labels

0 (LABEL_0): Negative sentiment
1 (LABEL_1): Neutral sentiment
2 (LABEL_2): Positive sentiment
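
Because the pipeline returns generic LABEL_n names, downstream code usually has to translate them. The following is a minimal sketch of that translation using the mapping listed above. `readable` is a hypothetical helper, and the sample result is a made-up value in the pipeline's output format, not a real prediction.

```python
# Mapping taken from the Labels list above.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}

def readable(results):
    """Replace 'LABEL_n' with its sentiment name in pipeline-style results."""
    converted = []
    for item in results:
        class_id = int(item["label"].rsplit("_", 1)[1])  # 'LABEL_2' -> 2
        converted.append({"label": ID2LABEL[class_id], "score": item["score"]})
    return converted

# Made-up result for illustration; real scores come from the model.
sample = [{"label": "LABEL_2", "score": 0.61}]
print(readable(sample))
```

If the checkpoint's config had `id2label` set when it was saved, the pipeline would return these names directly and the helper would be unnecessary.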

Training Configuration

The model was trained with weighted loss to handle class imbalance.

Distribution in training set (estimated):

Negative samples: ~1854 (~40%)
Neutral samples: ~1391 (~30%)
Positive samples: ~1391 (~30%)
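
The card does not state how the loss weights were derived. One common choice is inverse-frequency ("balanced") weighting, w_c = N / (K * n_c), where N is the number of training samples, K the number of classes, and n_c the count for class c. The following is a sketch under that assumption, using the estimated counts above and a plain-Python weighted cross-entropy for a single example. The function names are hypothetical.

```python
import math

# Estimated class counts from the distribution above.
counts = {"negative": 1854, "neutral": 1391, "positive": 1391}

def balanced_weights(class_counts):
    """Inverse-frequency weights: total / (num_classes * count)."""
    total = sum(class_counts.values())
    k = len(class_counts)
    return {name: total / (k * n) for name, n in class_counts.items()}

def weighted_cross_entropy(logits, target_index, weights):
    """Weighted cross-entropy for one example: -w[target] * log_softmax(logits)[target]."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    log_prob = logits[target_index] - log_sum_exp
    return -weights[target_index] * log_prob

weights = balanced_weights(counts)
print({name: round(w, 3) for name, w in weights.items()})

# Same logits, different true class: an error on a rarer class costs more.
weight_list = [weights["negative"], weights["neutral"], weights["positive"]]
uniform_logits = [0.0, 0.0, 0.0]
loss_negative = weighted_cross_entropy(uniform_logits, 0, weight_list)
loss_neutral = weighted_cross_entropy(uniform_logits, 1, weight_list)
print(loss_negative, loss_neutral)
```

In PyTorch, the same weights could be passed as the `weight` argument of `torch.nn.CrossEntropyLoss`; whether the original training did so is not stated.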

Limitations and Bias

This model is trained specifically on Spanish Twitter data.
Performance may vary on other Spanish text domains.
The model classifies sentiment into only three categories (negative, neutral, positive).
It may reflect biases present in the TASS Twitter dataset.

Citation

If you use this model, please cite:

@misc{xlm-roberta-tass-sentiment,
  author = {Your Name},
  title = {XLM-RoBERTa Large Fine-tuned for Spanish Sentiment Analysis},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/tu-usuario/xlm-roberta-large-tass-sentiment-bs16}}
}

