vzani/portuguese-fake-news-classifier-mlp-tfidf-faketrue-br

@@ -70,33 +70,29 @@ model-index:
 ## Model Overview
 This repository contains **MLP classifiers trained on TF-IDF features** for **fake news detection in Portuguese**.
-Models are trained and evaluated on corpora derived from Brazilian Portuguese datasets **[Fake.br](https://github.com/roneysco/Fake.br-Corpus)** and **[FakeTrue.Br](https://github.com/jpchav98/FakeTrue.Br/)**.
 - **Architecture**: Multi-Layer Perceptron (scikit-learn)
-- **Features**: TF-IDF over unigrams/bigrams
 - **Task**: Binary text classification (Fake vs. True)
 - **Language**: Portuguese (`pt`)
 - **Framework**: scikit-learn
 ---
 ## Available Variants
-- **mlp-tfidf-combined**
-  Trained on the aligned combined corpus.
-- **mlp-tfidf-fake-br**
-  Trained on **Fake.br**.
-- **mlp-tfidf-faketrue-br**
-  Trained on **FakeTrue.Br**.
-  Includes aligned splits and the original CSV when available.
-Each variant ships with:
-- `final_model.joblib`
-- `confusion_matrix.png`
-- `final_classification_report.parquet`
-- `final_predictions.parquet`
 ---
@@ -130,29 +126,19 @@ These files provide per-class performance and prediction logs for reproducibilit
 ---
-## Corpus
-The corpora used for training and evaluation are provided in the `corpus/` folder.
-- **Combined (root folder)**: `corpus_train_df.parquet`, `corpus_test_df.parquet`, `corpus_df.parquet`, `corpus_alinhado_df.parquet`.
-- **Fake.br**: `corpus_train_df.parquet`, `corpus_test_df.parquet`, `corpus_df.parquet`, `corpus_alinhado_df.parquet`.
-- **FakeTrue.Br**: `corpus_train_df.parquet`, `corpus_test_df.parquet`, `corpus_df.parquet`, `corpus_alinhado_df.parquet` and `FakeTrueBr_corpus.csv`.
----
 ## How to Use
-This model is a **Keras** model stored as `final_bilstm_model.keras`.
 ```python
 import joblib
 from huggingface_hub import hf_hub_download
-repo_id = "vzani/portuguese-fake-news-classifier-mlp-tfidf-combined"  # or fake-br / faketrue-br
 filename = "final_model.joblib"
 model_path = hf_hub_download(repo_id=repo_id, filename=filename)
-clf = joblib.load(model_path)  # Pipeline or bare estimator
 def predict(text: str) -> tuple[bool, float]:
@@ -175,10 +161,30 @@ The expected output is a Tuple where the first entry represents the classificati
 (False, 1.0)
 ```
 ## License
-[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 ## Citation
-Coming soon.

 ## Model Overview
 This repository contains **MLP classifiers trained on TF-IDF features** for **fake news detection in Portuguese**.
+The model is trained and evaluated on a corpus derived from the Brazilian Portuguese dataset **[FakeTrue.Br](https://github.com/jpchav98/FakeTrue.Br/)**.
 - **Architecture**: Multi-Layer Perceptron (scikit-learn)
+- **Features**: TF-IDF over unigrams/bigrams/trigrams
 - **Task**: Binary text classification (Fake vs. True)
 - **Language**: Portuguese (`pt`)
 - **Framework**: scikit-learn
+- **Training source code**: https://github.com/viniciuszani/portuguese-fake-new-classifiers
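
To make the feature and architecture bullets above concrete, here is a minimal sketch of a TF-IDF (unigrams to trigrams) plus MLP pipeline in scikit-learn. It is an illustration under stated assumptions, not the author's training code: the toy texts, the labels (1 = true, 0 = fake), and every hyperparameter except the n-gram range are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; the label convention (1 = true, 0 = fake) is an assumption.
texts = [
    "governo anuncia novo programa de saúde",
    "cientistas confirmam que a terra é plana",
    "banco central mantém taxa de juros",
    "vacina transforma pessoas em jacarés",
]
labels = [1, 0, 1, 0]

# ngram_range=(1, 3) extracts unigrams, bigrams, and trigrams, as the card describes.
# The hidden layer size, max_iter, and random_state are illustrative guesses.
pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3)),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
)
pipeline.fit(texts, labels)

# The learned vocabulary contains multi-word n-grams, not just single tokens.
vocabulary = pipeline.named_steps["tfidfvectorizer"].vocabulary_
proba = pipeline.predict_proba(["governo anuncia novo programa"])
print(proba.shape)
```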
 ---
 ## Available Variants
+- [**mlp-tfidf-combined**](https://huggingface.co/vzani/portuguese-fake-news-classifier-mlp-tfidf-combined)
+  Trained on the [combined dataset](https://huggingface.co/datasets/vzani/corpus-combined) built from Fake.br and FakeTrue.Br.
+- [**mlp-tfidf-fake-br**](https://huggingface.co/vzani/portuguese-fake-news-classifier-mlp-tfidf-fake-br)
+  Trained on the [Fake.br dataset](https://huggingface.co/datasets/vzani/corpus-fake-br).
+- [**mlp-tfidf-faketrue-br**](https://huggingface.co/vzani/portuguese-fake-news-classifier-mlp-tfidf-faketrue-br)
+  Trained on the [FakeTrue.Br dataset](https://huggingface.co/datasets/vzani/corpus-faketrue-br).
+Each variant has its own confusion matrix, classification report, and predictions stored as artifacts.
 ---
 ---
 ## How to Use
+This model is stored as `final_model.joblib`.
 ```python
 import joblib
 from huggingface_hub import hf_hub_download
+repo_id = "vzani/portuguese-fake-news-classifier-mlp-tfidf-faketrue-br"  # or fake-br / combined
 filename = "final_model.joblib"
 model_path = hf_hub_download(repo_id=repo_id, filename=filename)
+clf = joblib.load(model_path)
 def predict(text: str) -> tuple[bool, float]:
 (False, 1.0)
 ```
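The body of `predict` is elided in this diff. As a rough sketch only, a function returning the same `(classification, probability)` tuple could look like the one below. The helper name `predict_with_confidence`, the `true_label=1` convention, and the `StubClassifier` stand-in are all hypothetical; the only thing assumed about the loaded `clf` is that it exposes `predict_proba` and `classes_`, as scikit-learn classifiers and pipelines do.

```python
def predict_with_confidence(clf, text: str, true_label=1) -> tuple[bool, float]:
    """Return (is_true, probability of the predicted class) for one text.

    Assumption: `true_label` is the class value that marks a true news item.
    """
    probs = list(clf.predict_proba([text])[0])
    best = max(range(len(probs)), key=probs.__getitem__)  # index of the top class
    label = list(clf.classes_)[best]
    return (bool(label == true_label), float(probs[best]))


class StubClassifier:
    """Hypothetical stand-in for the loaded model so the sketch runs offline."""

    classes_ = [0, 1]

    def predict_proba(self, texts):
        # Always fully confident in class 0 (treated here as "fake").
        return [[1.0, 0.0] for _ in texts]


result = predict_with_confidence(StubClassifier(), "Exemplo de notícia")
print(result)  # (False, 1.0)
```

With the real model, passing the `clf` loaded via `joblib.load` in place of `StubClassifier()` should work the same way, provided the label convention matches the training data.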
+## Source code
+You can find the source code that produced this model in the repository below:
+- https://github.com/viniciuszani/portuguese-fake-new-classifiers
+The repository covers every step: data collection, evaluation, hyperparameter tuning, final model tuning, and publishing to Hugging Face.
+If you use it, please remember to credit the author and/or cite the work.
 ## License
+- Base model BERTimbau: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+- Fine-tuned models and corpora: Released under the same license for academic and research use.
 ## Citation
+```bibtex
+@misc{zani2025portuguesefakenews,
+  author       = {ZANI, Vinícius Augusto Tagliatti},
+  title        = {Avaliação comparativa de técnicas de processamento de linguagem natural para a detecção de notícias falsas em Português},
+  year         = {2025},
+  pages        = {61},
+  address      = {São Carlos},
+  school       = {Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo},
+  type         = {Trabalho de Conclusão de Curso (MBA em Inteligência Artificial e Big Data)},
+  note         = {Orientador: Prof. Dr. Ivandre Paraboni}
+}
+```