---
datasets:
- stanfordnlp/imdb
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- facebook/bart-base
- google-bert/bert-base-uncased
- EleutherAI/gpt-neo-2.7B
pipeline_tag: text-classification
license: apache-2.0
---

# 📝 Model Card: ensemble-majority-voting-imdb

## 🔍 Introduction

The `wakaflocka17/ensemble-majority-voting-imdb` model is a majority-voting ensemble of three sentiment classifiers fine-tuned on the IMDb dataset (`bert-imdb-finetuned`, `bart-imdb-finetuned`, `gptneo-imdb-finetuned`). Each model votes on the sentiment label of the input text, and the ensemble returns the label with the most votes, improving accuracy over the individual models.

## 📊 Evaluation Metrics

| Metric    | Value   |
|-----------|---------|
| Accuracy  | 0.93296 |
| Precision | 0.9559  |
| Recall    | 0.9078  |
| F1-score  | 0.9312  |

## ⚙️ Training Parameters

| Parameter          | Values                                           |
|--------------------|--------------------------------------------------|
| Models in ensemble | `bert_base_uncased`, `bart_base`, `gpt_neo_2_7b` |
| Repo for ensemble  | `models/ensemble_majority_voting`                |
| Batch size (eval)  | 64                                               |

## 🚀 Example of use in Colab

#### Installing dependencies

```bash
!pip install --upgrade transformers huggingface_hub
```

#### (Optional) Authentication for private models

```python
from huggingface_hub import login

login(token="hf_yourhftoken")  # replace with your own Hugging Face access token
```

#### Defining the models in the ensemble

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline
from collections import Counter

# List of fine-tuned model repo IDs
model_ids = [
    "wakaflocka17/bert-imdb-finetuned",
    "wakaflocka17/bart-imdb-finetuned",
    "wakaflocka17/gptneo-imdb-finetuned"
]
```

#### Loading the pipelines

```python
pipelines = []
for repo_id in model_ids:
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForSequenceClassification.from_pretrained(repo_id)
    # Map class indices to human-readable labels
    model.config.id2label = {0: 'NEGATIVE', 1: 'POSITIVE'}
    # By default the pipeline returns only the single best label per input
    # (the deprecated return_all_scores=False flag is no longer needed)
    pipelines.append(TextClassificationPipeline(model=model, tokenizer=tokenizer))
```

#### Ensemble prediction function

```python
def ensemble_predict(text):
    votes = []
    # Collect each model's vote along with its name
    for model_id, pipe in zip(model_ids, pipelines):
        label = pipe(text)[0]['label']
        votes.append({
            "model": model_id,  # or model_id.split("/")[-1] for just the short name
            "label": label
        })
    # Determine the majority label (with three voters and two labels, a tie is impossible)
    majority_label = Counter([v["label"] for v in votes]).most_common(1)[0][0]
    return {
        "ensemble_label": majority_label,
        "individual_votes": votes
    }
```

#### Inference on a text example

```python
testo = "This movie was absolutely fantastic: wonderful performances and a gripping story!"
result = ensemble_predict(testo)
print(result)
# Example output:
# {
#   'ensemble_label': 'POSITIVE',
#   'individual_votes': [
#     {'model': 'wakaflocka17/bert-imdb-finetuned', 'label': 'POSITIVE'},
#     {'model': 'wakaflocka17/bart-imdb-finetuned', 'label': 'NEGATIVE'},
#     {'model': 'wakaflocka17/gptneo-imdb-finetuned', 'label': 'POSITIVE'}
#   ]
# }
```

A batched variant of `ensemble_predict`, for scoring many reviews at once, is sketched at the end of this card.

## 📖 How to cite

If you use this model in your work, you can cite it as:

```bibtex
@misc{Sentiment-Project,
  author       = {Francesco Congiu},
  title        = {Sentiment Analysis with Pretrained, Fine-tuned and Ensemble Transformer Models},
  howpublished = {\url{https://github.com/wakaflocka17/DLA_LLMSANALYSIS}},
  year         = {2025}
}
```

## 🔗 Reference Repository

> All the file structure and script examples can be found at:
> https://github.com/wakaflocka17/DLA_LLMSANALYSIS/tree/main
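
## 🧪 Optional: batched ensemble inference

The per-text loop in `ensemble_predict` above calls each pipeline once per review, which becomes slow for large inputs. Below is a minimal sketch of a batched variant, assuming the same `model_ids` and `pipelines` objects defined earlier; the helper name `ensemble_predict_batch` and the default `batch_size=64` (chosen to match the card's eval batch size) are illustrative assumptions, not part of the original repository.

```python
def ensemble_predict_batch(texts, batch_size=64):
    """Majority-vote over a list of texts (illustrative sketch, not from the original repo)."""
    all_labels = []
    for pipe in pipelines:
        # GPT-style tokenizers often lack a pad token, and padding is required for batching
        if pipe.tokenizer.pad_token is None:
            pipe.tokenizer.pad_token = pipe.tokenizer.eos_token
            pipe.model.config.pad_token_id = pipe.tokenizer.eos_token_id
        # Score the whole list in batches; truncation guards against over-long reviews
        preds = pipe(texts, batch_size=batch_size, truncation=True)
        all_labels.append([p["label"] for p in preds])
    results = []
    for j, text in enumerate(texts):
        # Gather the j-th vote from each model and take the most common label
        votes = [labels[j] for labels in all_labels]
        majority = Counter(votes).most_common(1)[0][0]
        results.append({"text": text, "ensemble_label": majority, "votes": votes})
    return results
```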
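
For example, `ensemble_predict_batch(["A masterpiece.", "Dull and overlong."])` returns one dict per review, each carrying the majority label and the three individual votes in model order. On a Colab GPU runtime, pass `device=0` when constructing each `TextClassificationPipeline` before batching; otherwise GPT-Neo 2.7B in particular will be slow on CPU.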