---
language: "pt"
tags:
- fill-mask
license: mit
---

# Health History BERTimbau-pt-ft

HealthHistoryBERTimbau-pt-ft was fine-tuned from the pre-trained model [HealthHistoryBERTimbau-pt](https://huggingface.co/efbaro/HealthHistoryBERTimbau-pt) on patient data from health insurers, organized as historical sentences. The initial training objective was to predict hospitalizations; however, because these models may be useful for other tasks, we make them available to the scientific community. This model was trained on Portuguese health insurance data. Other training approaches are covered by the related models listed below.

### Other pre-trained models

* [HealthHistoryRoBERTa-en](https://huggingface.co/efbaro/HealthHistoryRoBERTa-en)
* [HealthHistoryRoBERTa-pt](https://huggingface.co/efbaro/HealthHistoryRoBERTa-pt)
* [HealthHistoryBERT-en](https://huggingface.co/efbaro/HealthHistoryBERT-en)
* [HealthHistoryBioBERT-en](https://huggingface.co/efbaro/HealthHistoryBioBERT-en)
* [HealthHistoryBio_ClinicalBERT-en](https://huggingface.co/efbaro/HealthHistoryBio_ClinicalBERT-en)
* [HealthHistoryBERTimbau-pt](https://huggingface.co/efbaro/HealthHistoryBERTimbau-pt)

### Other models fine-tuned to predict hospitalizations

* [HealthHistoryOpenLLaMA3Bv2-en-ft](https://huggingface.co/efbaro/HealthHistoryOpenLLaMA3Bv2-en-ft)
* [HealthHistoryOpenLLaMA7Bv2-en-ft](https://huggingface.co/efbaro/HealthHistoryOpenLLaMA7Bv2-en-ft)
* [HealthHistoryOpenLLaMA13B-en-ft](https://huggingface.co/efbaro/HealthHistoryOpenLLaMA13B-en-ft)
* [HealthHistoryOpenCabrita3B-pt-ft](https://huggingface.co/efbaro/HealthHistoryOpenCabrita3B-pt-ft)
* [HealthHistoryRoBERTa-en-ft](https://huggingface.co/efbaro/HealthHistoryRoBERTa-en-ft)
* [HealthHistoryRoBERTa-pt-ft](https://huggingface.co/efbaro/HealthHistoryRoBERTa-pt-ft)
* [HealthHistoryBERTimbau-pt-ft](https://huggingface.co/efbaro/HealthHistoryBERTimbau-pt-ft)
* [HealthHistoryBERT-en-ft](https://huggingface.co/efbaro/HealthHistoryBERT-en-ft)
* [HealthHistoryBioBERT-en-ft](https://huggingface.co/efbaro/HealthHistoryBioBERT-en-ft)
* [HealthHistoryBio_ClinicalBERT-en-ft](https://huggingface.co/efbaro/HealthHistoryBio_ClinicalBERT-en-ft)

## Fine-tuning Data

The model was fine-tuned on 83,715 historical sentences from health insurance patients, generated using the approach described in the paper [Predicting Hospitalization from Health Insurance Data](https://ieeexplore.ieee.org/document/9945601).

## Model Fine-tuning

### Fine-tuning Procedure

The model was fine-tuned on an NVIDIA RTX A5000 24 GB GPU in the laboratories of the [Department of Informatics at UFPR (Federal University of Paraná)](https://web.inf.ufpr.br/dinf/).

### Fine-tuning Hyperparameters

We used a batch size of 16, a maximum sequence length of 512 tokens, 4 gradient accumulation steps, 2 epochs, and a learning rate of 10⁻⁴ to fine-tune this model (a configuration sketch is shown below).

### Fine-tuning Time

Fine-tuning took 5 hours and 26 minutes per epoch.

### Time to Predict

* First 500 sentences of `data_test_seed_pt_12.csv`, prediction only: 2.44 seconds
* First 500 sentences of `data_test_seed_pt_12.csv`, prediction plus tokenization: 6.35 seconds

Predictions were made with the maximum sequence length allowed by the model (a timing sketch is shown below).
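The prediction timings above can be reproduced with a short script. Below is a minimal sketch, assuming the test sentences are already available as a Python list; the loading of `data_test_seed_pt_12.csv` is omitted because its column layout is not documented in this card, and the example sentence is a placeholder.

```python
import time

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("efbaro/HealthHistoryBERTimbau-pt-ft")
model = AutoModel.from_pretrained("efbaro/HealthHistoryBERTimbau-pt-ft")
model.eval()

# Placeholder for the first 500 historical sentences of the test split.
sentences = ["paciente com consulta ambulatorial"] * 500

start = time.perf_counter()
# Tokenize at the maximum sequence length allowed by the model.
inputs = tokenizer(
    sentences,
    max_length=512,
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)
tokenized = time.perf_counter()

with torch.no_grad():
    model(**inputs)
end = time.perf_counter()

print(f"Prediction only: {end - tokenized:.2f} s")
print(f"Prediction + tokenization: {end - start:.2f} s")
```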
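For reference, the fine-tuning hyperparameters listed above map directly onto a Hugging Face `TrainingArguments` configuration. This is a sketch only: the original training script is not part of this card, and the output directory is a placeholder.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="healthhistory-bertimbau-pt-ft",  # placeholder path
    per_device_train_batch_size=16,  # batch size of 16
    gradient_accumulation_steps=4,   # accumulation steps of 4
    num_train_epochs=2,              # 2 epochs
    learning_rate=1e-4,              # learning rate of 10^-4
)

# The 512-token maximum sequence length is enforced at tokenization time,
# e.g. tokenizer(..., max_length=512, truncation=True, padding="max_length").
```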
## How to use the model

Load the model via the `transformers` library (an end-to-end inference example is provided at the end of this card):

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("efbaro/HealthHistoryBERTimbau-pt-ft")
model = AutoModel.from_pretrained("efbaro/HealthHistoryBERTimbau-pt-ft")
```

## More Information

Refer to the original paper: [Predicting Hospitalization with LLMs from Health Insurance Data](https://link.springer.com/article/10.1007/s11517-024-03251-4)

Refer to another article related to this research: [Predicting Hospitalization from Health Insurance Data](https://ieeexplore.ieee.org/document/9945601)

## Questions?

Email:

- Everton F. Baro: efbaro@inf.ufpr.br, everton.barros@ifpr.edu.br
- Luiz S. Oliveira: luiz.oliveira@ufpr.br
- Alceu de Souza Britto Junior: alceu.junior@pucpr.br
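## Inference Example

A minimal sketch that complements the loading snippet in "How to use the model": it runs the model on a single illustrative sentence and extracts contextual embeddings, assuming the checkpoint is used as a plain encoder (as loaded with `AutoModel` above).

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("efbaro/HealthHistoryBERTimbau-pt-ft")
model = AutoModel.from_pretrained("efbaro/HealthHistoryBERTimbau-pt-ft")
model.eval()

# Illustrative input; real inputs are historical sentences built from
# health insurance records, as described in this card.
sentence = "paciente realizou consulta e exame laboratorial"

inputs = tokenizer(sentence, max_length=512, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual token embeddings with shape (batch, tokens, hidden_size).
embeddings = outputs.last_hidden_state
print(embeddings.shape)
```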