# Llama-PLLuM-8B-instruct-ArtexIT-reasoning
Built with Llama
This repository contains a GRPO fine‑tune of [CYFRAGOVPL/Llama-PLLuM-8B-instruct] trained on the GSM8K dataset (MIT license).
We publish both Hugging Face (safetensors) and GGUF artifacts (Q8_0 and Q5_K_M quantizations) for use with llama.cpp.
## What is this?
- Base: Meta Llama 3.1 → PLLuM 8B Instruct (Polish) → GRPO fine‑tune (math / word problems).
- Context: ~131k tokens (based on the GGUF header).
- Message format: Llama `[INST] ... [/INST]`, plus explicit reasoning / answer tags (see below).
- Default chat template: the tokenizer includes a default system instruction that enforces the two‑block format.
## Prompt format
The model expects Llama chat formatting and supports explicit tags:
- Reasoning: `<think> ... </think>`
- Final answer: `<answer> ... </answer>`
### Example

The user prompt is Polish for "Solve: 12 * 13 = ?".

```
[INST] Rozwiąż: 12 * 13 = ? [/INST]
<think>12*13 = 156.</think>
<answer>156</answer>
```
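For downstream use, the two blocks can be separated with a simple regular expression. The helper below is an illustrative sketch, not part of this repository.

```python
import re


def extract_blocks(text: str):
    """Split a model completion into (reasoning, answer) strings.

    Returns None for a part whose tag pair is missing.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )


completion = "<think>12*13 = 156.</think>\n<answer>156</answer>"
reasoning, final = extract_blocks(completion)
```

With the example completion above, `reasoning` is `"12*13 = 156."` and `final` is `"156"`.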
## Quickstart
### Transformers (PyTorch)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning"
tok = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Podaj 3 miasta w Polsce."}],  # "Name 3 cities in Poland."
    add_generation_prompt=True,
    tokenize=False,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=False))
```
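The GGUF quantizations can also be run locally via llama.cpp. Below is a sketch using the `llama-cpp-python` bindings; the GGUF filename is an assumption, so check the repository's file list for the exact name.

```python
# Sketch only: requires `pip install llama-cpp-python` and a downloaded GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-PLLuM-8B-instruct-ArtexIT-reasoning.Q5_K_M.gguf",  # assumed filename
    n_ctx=8192,  # raise toward the ~131k limit if you have the memory
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Rozwiąż: 12 * 13 = ?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```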
## Training (brief)
- Method: GRPO (policy‑gradient reinforcement learning with multiple reward functions).
- Data: `openai/gsm8k` (MIT license).
- Goal: consistent two‑block outputs (reasoning + final answer) using the training tags.
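The format goal can be illustrated with a reward of the kind GRPO optimizes. The function below is a sketch, not the actual training code: it scores 1.0 when a completion is exactly one `<think>` block followed by one `<answer>` block, and 0.0 otherwise.

```python
import re

# Matches a whole completion consisting of one <think> block then one <answer> block.
TWO_BLOCK = re.compile(r"^\s*<think>.*?</think>\s*<answer>.*?</answer>\s*$", re.DOTALL)


def format_reward(completion: str) -> float:
    """Illustrative GRPO-style format reward (not the repository's exact code)."""
    return 1.0 if TWO_BLOCK.match(completion) else 0.0


good = "<think>12*13 = 156.</think>\n<answer>156</answer>"
bad = "The answer is 156."
```

In real GRPO training this would be one of several reward functions, alongside e.g. an answer-correctness check against the GSM8K label.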
## License & Attribution
This repository contains derivatives of Llama 3.1 and PLLuM:
- The Llama 3.1 Community License applies. When redistributing, you must:
  - include a copy of the license and prominently display “Built with Llama”,
  - include “Llama” at the beginning of any distributed model’s name if it was created, trained, or fine‑tuned using Llama materials,
  - keep a NOTICE file with the following line:
    `Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.`
  - comply with the Acceptable Use Policy (AUP).
- PLLuM: please cite the PLLuM work (see Citation below).
- Data: GSM8K is MIT‑licensed; include dataset attribution.
This repo includes:
- `LICENSE` — full text of the Llama 3.1 Community License
- `USE_POLICY.md` — pointer to the official Acceptable Use Policy
- `NOTICE` — required Llama attribution line
If your (or your affiliates’) products or services exceeded 700 million monthly active users as of the Llama 3.1 release date, you must obtain a separate license from Meta before exercising the rights granted by the Llama 3.1 license.
## Citation
If you use PLLuM in research or deployments, please cite:
```bibtex
@unpublished{pllum2025,
  title  = {PLLuM: A Family of Polish Large Language Models},
  author = {PLLuM Consortium},
  year   = {2025}
}
```