SPIRAL Qwen3-8B Multi-Agent Model
This model was trained using the SPIRAL (Self-Play Iterative Reinforcement learning for Adaptation and Learning) framework.
Model Details
- Base Model: Qwen/Qwen3-8B-Base
- Training Framework: SPIRAL
- Checkpoint: step_00256
- Model Size: 8B parameters
- Training Date: 2025-09-11
 
Training Configuration
The model was trained with self-play on multiple environments:
- KuhnPoker-v1
- TicTacToe-v0
- SimpleNegotiation-v1
 
Training Parameters
{
  "learning_rate": "1e-6",
  "train_batch_size": 128,
  "num_ppo_epochs": 2,
  "temperature": 1.0,
  "max_model_len": 16384,
  "environments": [
    "KuhnPoker-v1",
    "TicTacToe-v0",
    "SimpleNegotiation-v1"
  ],
  "base_model": "Qwen/Qwen3-8B-Base",
  "framework": "SPIRAL"
}
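The parameters above can be read programmatically. The following is a minimal sketch, assuming the JSON is available as a string; the names `RAW_CONFIG` and `load_training_config` are hypothetical and not part of SPIRAL. Note that `learning_rate` is stored as a string, so it needs an explicit conversion before numeric use.

```python
import json

# Training parameters copied verbatim from the block above.
RAW_CONFIG = """
{
  "learning_rate": "1e-6",
  "train_batch_size": 128,
  "num_ppo_epochs": 2,
  "temperature": 1.0,
  "max_model_len": 16384,
  "environments": [
    "KuhnPoker-v1",
    "TicTacToe-v0",
    "SimpleNegotiation-v1"
  ],
  "base_model": "Qwen/Qwen3-8B-Base",
  "framework": "SPIRAL"
}
"""


def load_training_config(raw: str) -> dict:
    """Parse the JSON config (hypothetical helper, not a SPIRAL API)."""
    config = json.loads(raw)
    # learning_rate is serialized as the string "1e-6"; convert it to a float.
    config["learning_rate"] = float(config["learning_rate"])
    return config


config = load_training_config(RAW_CONFIG)
print(config["learning_rate"], config["environments"])
```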
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("the-acorn-ai/spiral-qwen3-4b-simple-negotiation-step00256")
model = AutoModelForCausalLM.from_pretrained(
    "the-acorn-ai/spiral-qwen3-4b-simple-negotiation-step00256",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
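# Generate text; move the inputs to the same device as the model
inputs = tokenizer("Your prompt here", return_tensors="pt").to(model.device)
# max_new_tokens limits only the continuation, whereas max_length also counts the prompt
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)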
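For a decoder-only model, `generate` returns the prompt tokens followed by the continuation, so `response` above includes the prompt text. A minimal sketch of keeping only the newly generated part is shown below; `new_token_ids` is a hypothetical helper, not part of `transformers`, and it works on any sliceable sequence, including a 1-D tensor.

```python
def new_token_ids(output_ids, prompt_length):
    """Return only the tokens generated after the prompt (hypothetical helper)."""
    # generate() output for a decoder-only model starts with the prompt tokens,
    # so slicing from prompt_length keeps just the continuation.
    return output_ids[prompt_length:]


# Toy illustration with plain integers standing in for token ids:
# the first three ids play the role of the prompt.
example_output = [101, 102, 103, 7, 8, 9]
continuation = new_token_ids(example_output, prompt_length=3)
print(continuation)

# With the objects from the usage snippet, the call would look like:
#   prompt_length = inputs["input_ids"].shape[1]
#   completion = tokenizer.decode(
#       new_token_ids(outputs[0], prompt_length), skip_special_tokens=True
#   )
```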
License
This model is licensed under the Apache License 2.0.