	---
	language: es
	tags:
	- sentiment-analysis
	- spanish
	- xlm-roberta
	- tass
	- twitter
	datasets:
	- TASS
	metrics:
	- f1
	- accuracy
	- precision
	- recall
	model-index:
	- name: xlm-roberta-large-tass-sentiment-bs16
	  results:
	  - task:
	      type: text-classification
	      name: Sentiment Analysis
	    dataset:
	      name: TASS (Spanish Twitter)
	      type: tass
	    metrics:
	    - type: f1
	      value: 0.4098
	      name: F1 Score
	    - type: accuracy
	      value: 0.5079
	      name: Accuracy
	    - type: precision
	      value: 0.5283
	      name: Precision
	    - type: recall
	      value: 0.5079
	      name: Recall
	---

	# XLM-RoBERTa Large Fine-tuned for Spanish Sentiment Analysis (TASS)

	## Model Description

	This model is a fine-tuned version of [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large)
	for sentiment analysis on Spanish Twitter data (TASS dataset).

	## Training Details

	- Base Model: xlm-roberta-large
	- Task: Multi-class Sentiment Classification (Negative/Neutral/Positive)
	- Dataset: TASS (Taller de Análisis Semántico en la SEPLN, a Spanish-language Twitter sentiment corpus)
	- Training Samples: 4,636
	- Validation Samples: 1,159
	- Test Samples: 1,449
	- Batch Size: 16
	- Epochs: 10
	- Learning Rate: 2e-05
	- Weight Decay: 0.01
	- Max Sequence Length: 128
	- Class Balancing: Weighted Cross-Entropy Loss
	- Early Stopping: Enabled (patience=3)

	## Performance (Test Set)

	| Metric | Score |
	|--------|-------|
	| F1 Score | 0.4098 |
	| Accuracy | 0.5079 |
	| Precision | 0.5283 |
	| Recall | 0.5079 |
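
Recall matches accuracy exactly in the table above (0.5079). That is what support-weighted averaging produces for single-label classification, so the precision, recall, and F1 values are probably weighted averages. The card does not state the averaging mode, so treat this as an assumption. A minimal pure-Python sketch, using made-up labels rather than TASS data, shows the identity:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall, averaged with each class weighted by its support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        score += (n / total) * (hits / n)
    return score


# Toy labels invented for illustration only (0=negative, 1=neutral, 2=positive).
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 0, 0, 2, 2, 0]

acc = accuracy(y_true, y_pred)
rec = weighted_recall(y_true, y_pred)
print(f"accuracy={acc:.4f} weighted_recall={rec:.4f}")
```

The two values agree because the support weights cancel the per-class denominators, leaving total correct predictions divided by total samples.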

	## Training History (Validation Set)

	Metrics per epoch during training:

	| Epoch | Loss | Accuracy | F1 Score | Precision | Recall |
	|-------|------|----------|----------|-----------|--------|
	| 1 | 1.1272 | 0.3819 | 0.3235 | 0.4629 | 0.3819 |
	| 2 | 1.0278 | 0.4931 | 0.3947 | 0.3591 | 0.4931 |
	| 3 | 1.0981 | 0.3991 | 0.2382 | 0.3610 | 0.3991 |
	| 4 | 1.0494 | 0.4440 | 0.3561 | 0.3265 | 0.4440 |
	| 5 | 1.0650 | 0.4405 | 0.3174 | 0.3696 | 0.4405 |
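
The history ends at epoch 5 even though 10 epochs were configured, which is consistent with early stopping at patience=3. The sketch below replays the validation losses from the table through a simple patience counter. It assumes validation loss was the monitored metric (the card does not say which one was used), and `early_stop_epoch` is a hypothetical helper, not the actual training code.

```python
def early_stop_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once `patience` consecutive evaluations fail to
    improve on the best validation loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    bad_evals = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, bad_evals = loss, epoch, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return epoch, best_epoch
    # Patience was never exhausted: training ran through every epoch.
    return len(val_losses), best_epoch


# Validation losses copied from the table above.
val_losses = [1.1272, 1.0278, 1.0981, 1.0494, 1.0650]
stop_epoch, best_epoch = early_stop_epoch(val_losses, patience=3)
print(f"best epoch: {best_epoch}, stopped after epoch: {stop_epoch}")
```

Under this assumption the best checkpoint is epoch 2, and three non-improving evaluations (epochs 3 to 5) trigger the stop.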


	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	# Load the model and tokenizer ("tu-usuario" is a placeholder for the Hub username)
	tokenizer = AutoTokenizer.from_pretrained("tu-usuario/xlm-roberta-large-tass-sentiment-bs16")
	model = AutoModelForSequenceClassification.from_pretrained("tu-usuario/xlm-roberta-large-tass-sentiment-bs16")

	# Usage example; max_length=128 matches the training setting
	text = "Me encanta este producto, es excelente"
	inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)

	# Run inference without tracking gradients
	with torch.no_grad():
	    outputs = model(**inputs)
	    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
	    predicted_class = torch.argmax(predictions, dim=-1).item()

	labels = {0: "Negative", 1: "Neutral", 2: "Positive"}
	print(f"Sentiment: {labels[predicted_class]}")
	print(f"Confidence: {predictions[0][predicted_class].item():.4f}")
	```

	### As a Pipeline

	```python
	from transformers import pipeline

	# Use the model through a pipeline
	classifier = pipeline('sentiment-analysis', model='tu-usuario/xlm-roberta-large-tass-sentiment-bs16')

	result = classifier("Me encanta este producto, es excelente")
	print(result)
	# Example output format (score is illustrative): [{'label': 'LABEL_2', 'score': 0.95}]
	# LABEL_0 = Negative, LABEL_1 = Neutral, LABEL_2 = Positive
	```

	## Labels

	- `0` (LABEL_0): Negative sentiment
	- `1` (LABEL_1): Neutral sentiment
	- `2` (LABEL_2): Positive sentiment
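
Because the pipeline returns generic `LABEL_n` names, downstream code usually needs a small mapping step. A minimal sketch, assuming the output format shown in the pipeline example; `ID2LABEL` and `readable` are hypothetical names, and the sample result is hand-written rather than real model output:

```python
# Mapping taken from the label list above.
ID2LABEL = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}


def readable(results):
    """Replace LABEL_n names in pipeline-style results with sentiment names."""
    return [
        {"label": ID2LABEL[r["label"]], "score": r["score"]}
        for r in results
    ]


# Hand-written stand-in for pipeline output, not a real prediction.
sample = [{"label": "LABEL_2", "score": 0.95}]
converted = readable(sample)
print(converted)
```

Setting `id2label` in the model config before uploading would likely make the pipeline emit these names directly, which removes the need for a mapping step.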

	## Training Configuration

	The model was trained with a weighted cross-entropy loss to compensate for class imbalance.

	Distribution in training set (estimated):
	- Negative samples: ~1854 (~40%)
	- Neutral samples: ~1391 (~30%)
	- Positive samples: ~1391 (~30%)
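
The card does not say how the loss weights were derived. One common choice is inverse-frequency ("balanced") weighting, where each class weight is N / (K * n_c) for N samples, K classes, and n_c samples in class c. The sketch below applies that formula to the estimated counts above. This is an assumption for illustration, not the confirmed training setup, and `balanced_weights` is a hypothetical helper.

```python
def balanced_weights(counts):
    """Inverse-frequency weights: total / (num_classes * class_count)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


# Estimated training-set counts from the list above.
counts = {"negative": 1854, "neutral": 1391, "positive": 1391}
weights = balanced_weights(counts)
for label, w in weights.items():
    print(f"{label}: {w:.4f}")
```

The majority class (negative) gets a weight below 1 and the two smaller classes get weights above 1. Weights of this kind could be passed as the `weight` tensor of `torch.nn.CrossEntropyLoss`.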

	## Limitations and Bias

	- The model is trained only on Spanish Twitter data, so performance may differ on other Spanish text domains.
	- Test-set scores are modest (accuracy 0.5079, F1 0.4098), so individual predictions should be treated with caution.
	- Sentiment is classified into three categories only (negative, neutral, positive).
	- The model may reflect biases present in the TASS Twitter dataset.

	## Citation

	If you use this model, please cite:

	```
	@misc{xlm-roberta-tass-sentiment,
	  author = {Your Name},
	  title = {XLM-RoBERTa Large Fine-tuned for Spanish Sentiment Analysis},
	  year = {2024},
	  publisher = {Hugging Face},
	  howpublished = {\url{https://huggingface.co/tu-usuario/xlm-roberta-large-tass-sentiment-bs16}}
	}
	```