# Fine-Tuned Llama-3-8B for Logic Puzzle Generation & Solving
This repository contains a fine-tuned version of unsloth/llama-3-8b-instruct, specialized for generating and solving complex logic puzzles in a strict JSON format.
This model was developed as part of the Synthetic Data AI Agent System for the AMD × PyTorch × Unsloth Hackathon. It acts as the "Student" in a Teacher-Student architecture, where it was trained on a high-quality synthetic dataset generated by a larger "Teacher" model (`unsloth/Llama-3.3-70B-Instruct`).
The model is optimized for high-throughput inference using the Unsloth library.
## Model Details
- Base Model: `unsloth/llama-3-8b-instruct`
- Fine-tuning Dataset: A custom synthetic dataset of 1,038 logic puzzles (`Thunderbird2410/amd-hack-qa`)
- Training Hardware: 1x AMD Instinct™ MI300X GPU (192 GB HBM3)
- Intended Use: This model is designed to be a fast and reliable AI agent for generating and answering multiple-choice questions on topics like Seating Arrangements and Blood Relations.
## How to Use
You can use this model directly with the unsloth library for optimal performance on AMD GPUs (ROCm).
No CUDA configuration is required — PyTorch automatically detects the ROCm device.
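As a quick sanity check (not part of the original card): on ROCm builds of PyTorch, AMD GPUs are exposed through the familiar `torch.cuda` API, so the usual device queries just work:

```python
import torch

# On ROCm builds of PyTorch, AMD GPUs surface through the torch.cuda namespace
print(torch.cuda.is_available())      # True on a working ROCm install
print(torch.cuda.get_device_name(0))  # e.g. "AMD Instinct MI300X"
```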
```python
import torch
from unsloth import FastLanguageModel

# Load the fine-tuned model
model_name = "Thunderbird2410/Llama-3-8B-Puzzles-Unsloth"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = 4096,
    dtype = torch.bfloat16,
    device_map = "auto",  # Automatically uses ROCm on AMD GPUs
)
FastLanguageModel.for_inference(model)  # Enable Unsloth's fast inference path

# Left-padding and a pad token are needed for batched generation
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({"pad_token": tokenizer.eos_token})
    model.resize_token_embeddings(len(tokenizer))

prompt = """
Generate a hard MCQ-based question as well as its 4 choices and answer on the topic "Number Series".
Return your response as a valid JSON object with this exact structure:
{
    "topic": "Number Series",
    "question": "Your question here?",
    "choices": [
        "A) First option",
        "B) Second option",
        "C) Third option",
        "D) Fourth option"
    ],
    "answer": "A",
    "explanation": "Brief explanation for why the correct answer is right."
}
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens = 512,
    temperature = 0.1,
    top_p = 0.9,
    do_sample = True,
)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(response)
```
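Note that `batch_decode` returns the prompt followed by the completion. To work with the generated puzzle programmatically, you can decode only the new tokens and parse the JSON. A minimal sketch, continuing from the variables above (the brace-slicing fallback is an illustrative heuristic, not part of the original card):

```python
import json

# Decode only the newly generated tokens, skipping the echoed prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
completion = tokenizer.decode(generated, skip_special_tokens=True)

# The model is trained to emit a single JSON object; parse it directly,
# falling back to slicing between the outermost braces if extra text leaks in
try:
    puzzle = json.loads(completion)
except json.JSONDecodeError:
    puzzle = json.loads(completion[completion.find("{") : completion.rfind("}") + 1])

print(puzzle["question"])
print(puzzle["answer"])
```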
## Training Procedure
The model was fine-tuned for a single epoch on a dataset of 1,038 high-quality examples. The key hyperparameters were:
- LoRA Rank (`r`): 16
- LoRA Alpha: 32
- Learning Rate: `1e-4`
- Effective Batch Size: 1024 (64 per-device × 16 gradient accumulation)
This aggressive configuration was made possible by the large VRAM of the AMD MI300X GPU.
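The full training script is not part of this card, but a minimal sketch of an equivalent Unsloth + TRL setup using the hyperparameters above might look like the following. The `target_modules` list, the `dataset_text_field` name, and other `SFTTrainer` details are illustrative assumptions (and the `SFTTrainer` API varies across TRL versions):

```python
import torch
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in bfloat16 on the MI300X
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-instruct",
    max_seq_length = 4096,
    dtype = torch.bfloat16,
)

# Attach LoRA adapters with the rank/alpha reported above
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 32,
    lora_dropout = 0.0,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],  # assumed
)

dataset = load_dataset("Thunderbird2410/amd-hack-qa", split = "train")

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",  # assumed field name
    max_seq_length = 4096,
    args = TrainingArguments(
        per_device_train_batch_size = 64,
        gradient_accumulation_steps = 16,  # 64 * 16 = 1024 effective
        learning_rate = 1e-4,
        num_train_epochs = 1,
        bf16 = True,
        logging_steps = 1,
        output_dir = "outputs",
    ),
)
trainer.train()
```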
## Citation
If you use this model, please cite the original project repository and base model.
```bibtex
@misc{thunderbird2410_llama3_puzzles_amd_2025,
  author       = {Thunderbird2410},
  title        = {Synthetic Data AI Agent System for the AMD × PyTorch × Unsloth Hackathon},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Thunderbird2410/Llama-3-8B-Puzzles-Unsloth}},
}
```