Qwen2.5-0.5B fine-tuned for improved proficiency in Portuguese and stronger general reasoning.

Also available via Ollama: https://ollama.com/cnmoro/Qwen2.5-0.5B-Portuguese-v2
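For a local Ollama setup, a minimal sketch using the official `ollama` Python client is shown below. This assumes the client is installed (`pip install ollama`) and the model has already been pulled; the model tag is assumed from the Ollama page above and should be verified there.

```python
# Minimal sketch using the `ollama` Python client (assumed installed via `pip install ollama`).
# Assumes the model was pulled first, e.g. `ollama pull cnmoro/Qwen2.5-0.5B-Portuguese-v2`;
# the tag is taken from the Ollama page above.
import ollama

response = ollama.chat(
    model="cnmoro/Qwen2.5-0.5B-Portuguese-v2",
    messages=[
        # "Write a brief introduction to LLMs (Large Language Models) and their applications."
        {"role": "user", "content": "Escreva uma breve introdução sobre LLMs (Large Language Models) e suas aplicações."}
    ],
)
print(response["message"]["content"])
```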
Usage with 🤗 transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "cnmoro/Qwen2.5-0.5B-Portuguese-v2"

# Load the model and tokenizer, letting transformers pick the dtype and device.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# "Write a brief introduction to LLMs (Large Language Models) and their applications."
prompt = "Escreva uma breve introdução sobre LLMs (Large Language Models) e suas aplicações."
messages = [
    {"role": "user", "content": prompt}
]

# Render the chat turns into the model's prompt format, appending the assistant header.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Strip the prompt tokens so only the newly generated completion remains.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
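The snippet above uses the default decoding settings; for more varied output, sampling parameters can be passed directly to `generate`. A minimal sketch, reusing `model` and `model_inputs` from above; the specific values are illustrative assumptions, not tuned recommendations for this model:

```python
# Illustrative sampling settings (values are assumptions, not tuned for this model).
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,   # sample from the distribution instead of greedy decoding
    temperature=0.7,  # soften the next-token distribution
    top_p=0.9,        # nucleus sampling: keep the top 90% probability mass
)
```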
## Overall Results

| Task | Metric | Value | StdErr |
|---|---|---|---|
| ASSIN2 RTE | F1 Macro | 0.4486 | 0.0067 |
| ASSIN2 RTE | Accuracy | 0.5560 | 0.0071 |
| ASSIN2 STS | Pearson | 0.4091 | 0.0104 |
| ASSIN2 STS | MSE | 5.6395 | N/A |
| BluEX | Accuracy | 0.2503 | 0.0094 |
| ENEM Challenge | Accuracy | 0.3128 | 0.0071 |
| FAQUAD NLI | F1 Macro | 0.4611 | 0.0094 |
| FAQUAD NLI | Accuracy | 0.7877 | 0.0113 |
| HateBR Offensive (Binary) | F1 Macro | 0.3439 | 0.0049 |
| HateBR Offensive (Binary) | Accuracy | 0.4857 | 0.0095 |
| OAB Exams | Accuracy | 0.3062 | 0.0057 |
| Portuguese Hate Speech (Binary) | F1 Macro | 0.4119 | 0.0038 |
| Portuguese Hate Speech (Binary) | Accuracy | 0.7004 | 0.0111 |
| TweetSentBR | F1 Macro | 0.5055 | 0.0078 |
| TweetSentBR | Accuracy | 0.5697 | 0.0078 |
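Where F1 Macro sits well below Accuracy (e.g. HateBR: 0.3439 vs 0.4857), the gap usually reflects class imbalance, since macro-F1 weights each class equally regardless of frequency. A toy sketch of the two metrics with scikit-learn, using fabricated labels rather than any benchmark data:

```python
# Toy illustration of why macro-F1 can fall far below accuracy on imbalanced labels.
# The label vectors are fabricated examples, not drawn from the benchmarks above.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # imbalanced: 80% class 0
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # predicts only the majority class

print(accuracy_score(y_true, y_pred))             # 0.8  -- looks strong
print(f1_score(y_true, y_pred, average="macro"))  # ~0.44 -- exposes the missed minority class
```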
## Detailed Results by Task

### ASSIN2 RTE

| Metric | Value | StdErr |
|---|---|---|
| F1 Macro | 0.4486 | 0.0067 |
| Accuracy | 0.5560 | 0.0071 |
### ASSIN2 STS

| Metric | Value | StdErr |
|---|---|---|
| Pearson | 0.4091 | 0.0104 |
| MSE | 5.6395 | N/A |
### BluEX

| Exam ID | Metric | Value | StdErr |
|---|---|---|---|
| All | Accuracy | 0.2503 | 0.0094 |
| USP_2018 | Accuracy | 0.2037 | 0.0315 |
| UNICAMP_2018 | Accuracy | 0.1852 | 0.0306 |
| UNICAMP_2021_1 | Accuracy | 0.0870 | 0.0240 |
| USP_2020 | Accuracy | 0.2143 | 0.0317 |
| USP_2023 | Accuracy | 0.2045 | 0.0350 |
| UNICAMP_2019 | Accuracy | 0.2600 | 0.0358 |
| USP_2019 | Accuracy | 0.1500 | 0.0326 |
| UNICAMP_2020 | Accuracy | 0.2182 | 0.0321 |
| UNICAMP_2021_2 | Accuracy | 0.2941 | 0.0367 |
| UNICAMP_2023 | Accuracy | 0.4186 | 0.0433 |
| UNICAMP_2024 | Accuracy | 0.3111 | 0.0398 |
| USP_2024 | Accuracy | 0.2683 | 0.0398 |
| USP_2021 | Accuracy | 0.3269 | 0.0375 |
| UNICAMP_2022 | Accuracy | 0.3590 | 0.0444 |
| USP_2022 | Accuracy | 0.2857 | 0.0370 |
### ENEM Challenge

| Exam ID | Metric | Value | StdErr |
|---|---|---|---|
| All | Accuracy | 0.3128 | 0.0071 |
| 2017 | Accuracy | 0.2845 | 0.0241 |
| 2016 | Accuracy | 0.2479 | 0.0226 |
| 2016_2 | Accuracy | 0.2846 | 0.0235 |
| 2022 | Accuracy | 0.3534 | 0.0240 |
| 2012 | Accuracy | 0.3362 | 0.0253 |
| 2011 | Accuracy | 0.3333 | 0.0251 |
| 2010 | Accuracy | 0.3846 | 0.0260 |
| 2014 | Accuracy | 0.3211 | 0.0259 |
| 2009 | Accuracy | 0.2696 | 0.0239 |
| 2015 | Accuracy | 0.2521 | 0.0229 |
| 2023 | Accuracy | 0.3481 | 0.0236 |
| 2013 | Accuracy | 0.3333 | 0.0261 |
### FAQUAD NLI

| Metric | Value | StdErr |
|---|---|---|
| F1 Macro | 0.4611 | 0.0094 |
| Accuracy | 0.7877 | 0.0113 |
### HateBR Offensive (Binary)

| Metric | Value | StdErr |
|---|---|---|
| F1 Macro | 0.3439 | 0.0049 |
| Accuracy | 0.4857 | 0.0095 |
### OAB Exams

| Exam ID | Metric | Value | StdErr |
|---|---|---|---|
| All | Accuracy | 0.3062 | 0.0057 |
| 2011-05 | Accuracy | 0.3375 | 0.0304 |
| 2012-06a | Accuracy | 0.2625 | 0.0285 |
| 2010-02 | Accuracy | 0.3700 | 0.0279 |
| 2017-22 | Accuracy | 0.3500 | 0.0309 |
| 2016-20 | Accuracy | 0.3125 | 0.0300 |
| 2011-03 | Accuracy | 0.2626 | 0.0255 |
| 2015-17 | Accuracy | 0.3205 | 0.0304 |
| 2017-23 | Accuracy | 0.2875 | 0.0292 |
| 2018-25 | Accuracy | 0.3625 | 0.0311 |
| 2016-19 | Accuracy | 0.2436 | 0.0281 |
| 2017-24 | Accuracy | 0.1625 | 0.0238 |
| 2015-16 | Accuracy | 0.3125 | 0.0300 |
| 2011-04 | Accuracy | 0.3250 | 0.0301 |
| 2012-07 | Accuracy | 0.3500 | 0.0307 |
| 2012-06 | Accuracy | 0.1875 | 0.0253 |
| 2012-09 | Accuracy | 0.2468 | 0.0284 |
| 2013-12 | Accuracy | 0.3625 | 0.0311 |
| 2013-11 | Accuracy | 0.3000 | 0.0295 |
| 2010-01 | Accuracy | 0.3412 | 0.0296 |
| 2015-18 | Accuracy | 0.2875 | 0.0292 |
| 2014-13 | Accuracy | 0.3500 | 0.0308 |
| 2013-10 | Accuracy | 0.3125 | 0.0300 |
| 2016-20a | Accuracy | 0.2500 | 0.0279 |
| 2014-14 | Accuracy | 0.3125 | 0.0301 |
| 2012-08 | Accuracy | 0.3000 | 0.0296 |
| 2016-21 | Accuracy | 0.3375 | 0.0304 |
| 2014-15 | Accuracy | 0.4103 | 0.0321 |
### Portuguese Hate Speech (Binary)

| Metric | Value | StdErr |
|---|---|---|
| F1 Macro | 0.4119 | 0.0038 |
| Accuracy | 0.7004 | 0.0111 |
### TweetSentBR

| Metric | Value | StdErr |
|---|---|---|
| F1 Macro | 0.5055 | 0.0078 |
| Accuracy | 0.5697 | 0.0078 |
## Open Portuguese LLM Leaderboard Evaluation Results

Detailed results can be found here and on the 🚀 Open Portuguese LLM Leaderboard.

| Metric | Value |
|---|---|
| Average | 45.81 |
| ENEM Challenge (No Images) | 36.81 |
| BLUEX (No Images) | 26.84 |
| OAB Exams | 30.62 |
| Assin2 RTE | 87.91 |
| Assin2 STS | 59.01 |
| FaQuAD NLI | 43.97 |
| HateBR Binary | 33.62 |
| PT Hate Speech Binary | 41.23 |
| tweetSentBR | 52.33 |
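As a quick sanity check, the reported Average is the unweighted mean of the nine task scores above:

```python
# Sanity check: the leaderboard "Average" is the plain mean of the nine task scores.
scores = [36.81, 26.84, 30.62, 87.91, 59.01, 43.97, 33.62, 41.23, 52.33]
print(round(sum(scores) / len(scores), 2))  # 45.81
```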