---
language: en
tags:
- 2048-game
- reinforcement-learning
- qwen
- game-playing
- rl
- grpo
license: apache-2.0
datasets:
- custom
---

# agent-2048-game-qwen-7b-8k-ds

This model is a specialized game-playing AI trained to master the 2048 puzzle game using reinforcement learning. Built on the Qwen-7B architecture, it demonstrates strategic planning and spatial reasoning capabilities.

## Model Description

- **Base Model:** Qwen-7B-Instruct
- **Training Approach:** Group Relative Policy Optimization (GRPO)
- **Training Dataset:** 8,000 carefully curated game states
- **Hardware:** Single RTX 4090 (24 GB)
- **Training Time:** ~10 hours
- **Framework:** Implemented with the trl library, accelerated by Unsloth

### Training Configuration

- **Learning Rate:** 4e-5 (selected after extensive testing)
- **LoRA Rank:** 16
- **Max Sequence Length:** 1000 tokens
- **Batch Size:** 1 (with 4 gradient accumulation steps)
- **Optimizer:** paged_adamw_8bit

## Intended Use

This model is designed to play the 2048 game by:

1. Analyzing the current board state
2. Planning strategic moves
3. Maximizing score and achieving high-value tiles
4. Maintaining efficient board organization

## Training Data

The training data was generated through a multi-stage pipeline:

- Simulated gameplay to produce realistic board states
- A custom difficulty scoring system
- 5-level difficulty classification
- Balanced sampling across difficulty levels
- Parallel processing for efficient generation

## Training Approach

### Reward System

The model was trained using multiple reward components:

1. **Density Reward:** Encourages efficient tile merging and space utilization
2. **Highest Tile Reward:** Incentivizes creation of high-value tiles
3. **Survival Reward:** Promotes moves that maintain game continuity
4. **Format Compliance:** Ensures proper response structure

### Optimization

- Utilized Unsloth for 2x faster fine-tuning
- 4-bit quantization for efficient training
- Efficient LoRA adaptation

## Performance and Limitations

### Strengths

- Strong strategic planning capabilities
- Efficient tile merging and space management
- Consistent high-score achievement
- Structured decision-making process

### Limitations

- Performance may vary with random seeds
- Success is not guaranteed due to the game's inherent randomness
- The model requires specific input formatting

## Example Usage

```python
# Format your 4x4 game board as a string; "." marks an empty cell
board_state = """
2 | 4 | 8 | 16
. | . | 2 | 4
. | . | . | 2
. | . | . | .
"""

# The model will output one of: up, down, left, right
```

## Citation

```bibtex
@misc{dalal2024agent2048blog,
  author = {Dalal, Hrishbh},
  title = {Agent 2048: Forging Strategic Gameplay in an AI Through Data, Rewards, and RL},
  year = {2024},
  month = {March},
  url = {https://yourwebsite.com/blog/ai-agent-plays-2048},
  note = {[Blog post] Accessed: March 30, 2024}
}
```

## Author

Hrishbh Dalal

## Acknowledgments

Special thanks to the research community on Twitter/X for valuable feedback on data generation strategies and training approaches.

## License

This model is released under the Apache 2.0 license.
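The board-string format shown under Example Usage can be produced, and the model's reply validated, with small helpers. Below is a minimal sketch in plain Python; the helper names and prompt wording are illustrative assumptions, not part of the model's released interface:

```python
# Illustrative helpers for this model's input/output conventions.
# Names (format_board, parse_move) are hypothetical, not a released API.

VALID_MOVES = {"up", "down", "left", "right"}

def format_board(board):
    """Render a grid of ints (0 = empty) in the 'a | b | c | d' format shown above."""
    return "\n".join(
        " | ".join(str(cell) if cell else "." for cell in row)
        for row in board
    )

def parse_move(model_output):
    """Extract the first valid move keyword from the model's raw text output."""
    for token in model_output.lower().split():
        word = token.strip(".,!:;")
        if word in VALID_MOVES:
            return word
    raise ValueError(f"No valid move found in: {model_output!r}")

board = [
    [2, 4, 8, 16],
    [0, 0, 2, 4],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]
prompt = f"Current 2048 board:\n{format_board(board)}\nChoose one move: up, down, left, or right."
```

The prompt string can then be sent to the model through any standard chat-completion interface; `parse_move` guards against extra text around the move keyword.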