Fine-Tuned Llama-3-8B for Logic Puzzle Generation & Solving

This repository contains a fine-tuned version of unsloth/llama-3-8b-instruct, specialized for generating and solving complex logic puzzles in a strict JSON format.

This model was developed as part of the Synthetic Data AI Agent System for the AMD × PyTorch × Unsloth Hackathon. It acts as the "Student" in a Teacher-Student architecture and was trained on a high-quality synthetic dataset generated by a larger "Teacher" model (unsloth/Llama-3.3-70B-Instruct).

The model is optimized for high-throughput inference using the Unsloth library.

Model Details

  • Base Model: unsloth/llama-3-8b-instruct
  • Fine-tuning Dataset: Thunderbird2410/amd-hack-qa, a custom synthetic dataset of 1,038 logic puzzles.
  • Training Hardware: 1x AMD Instinct™ MI300X GPU (192 GB HBM3)
  • Intended Use: This model is designed to be a fast and reliable AI agent for generating and answering multiple-choice questions on topics like Seating Arrangements and Blood Relations.

How to Use

You can use this model directly with the unsloth library for optimal performance on AMD GPUs (ROCm).
No CUDA configuration is required — PyTorch automatically detects the ROCm device.

import torch
from unsloth import FastLanguageModel

# Load the fine-tuned model
model_name = "Thunderbird2410/Llama-3-8B-Puzzles-Unsloth"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = 4096,
    dtype = torch.bfloat16,
    device_map = "auto",  # Automatically uses ROCm on AMD GPUs
)
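
# Switch Unsloth into its optimized inference mode for faster generation.
FastLanguageModel.for_inference(model)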

tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    # The EOS token already exists in the vocabulary, so reusing it as the
    # pad token requires no embedding resize.
    tokenizer.pad_token = tokenizer.eos_token

prompt = """
Generate a hard MCQ-based question as well as its 4 choices and answer on the topic "Number Series".
Return your response as a valid JSON object with this exact structure:
{
    "topic": "Number Series",
    "question": "Your question here?",
    "choices": [
        "A) First option",
        "B) Second option",
        "C) Third option",
        "D) Fourth option"
    ],
    "answer": "A",
    "explanation": "Brief explanation for why the correct answer is right."
}
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens = 512,
    temperature = 0.1,
    top_p = 0.9,
    do_sample = True,
)

response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(response)
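
The decoded response contains the prompt followed by the completion. To work with only the generated JSON, you can decode just the new tokens and parse them. A minimal sketch (json.loads raises an error if the model drifts from strict JSON):

import json

# Keep only the tokens generated after the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
completion = tokenizer.decode(new_tokens, skip_special_tokens=True)

puzzle = json.loads(completion)
print(puzzle["question"], "->", puzzle["answer"])

Since the base model is instruction-tuned, wrapping the prompt in the chat template before generation tends to improve adherence to the requested JSON structure, e.g.:

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt = True,
    return_tensors = "pt",
).to(model.device)

and then pass the result to model.generate as input_ids.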

Training Procedure

The model was fine-tuned for a single epoch on a dataset of 1,038 high-quality examples. The key hyperparameters were:

  • LoRA Rank (r): 16
  • LoRA Alpha: 32
  • Learning Rate: 1e-4
  • Effective Batch Size: 1024 (per-device batch size 64 × 16 gradient-accumulation steps)

This unusually large effective batch size was made possible by the 192 GB of HBM3 on the AMD Instinct MI300X GPU.
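
For reference, here is a minimal sketch of how these hyperparameters map onto an Unsloth + TRL SFTTrainer setup. The LoRA target modules, the dataset text field, and any trainer arguments beyond those listed above are assumptions, not the exact training script:

import torch
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-instruct",
    max_seq_length = 4096,
    dtype = torch.bfloat16,
)

# Attach LoRA adapters with the rank/alpha reported above.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 32,
    lora_dropout = 0,
    bias = "none",
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],  # assumed
)

dataset = load_dataset("Thunderbird2410/amd-hack-qa", split="train")

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",  # assumed column name
    max_seq_length = 4096,
    args = TrainingArguments(
        per_device_train_batch_size = 64,
        gradient_accumulation_steps = 16,  # 64 * 16 = 1024 effective
        learning_rate = 1e-4,
        num_train_epochs = 1,
        bf16 = True,
        output_dir = "outputs",
    ),
)
trainer.train()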

Citation

If you use this model, please cite the original project repository and base model.

@misc{thunderbird2410_llama3_puzzles_amd_2025,
  author = {Thunderbird2410},
  title = {Synthetic Data AI Agent System for the AMD × PyTorch × Unsloth Hackathon},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Thunderbird2410/Llama-3-8B-Puzzles-Unsloth}},
}