---
language: en
tags:
- 2048-game
- reinforcement-learning
- qwen
- game-playing
- rl
- grpo
license: apache-2.0
datasets:
- custom
---

# agent-2048-game-qwen-7b-8k-ds

This model is a specialized game-playing AI trained to master the 2048 puzzle game using reinforcement learning. Built on the Qwen-7B architecture, it demonstrates strategic planning and spatial reasoning capabilities.

## Model Description

- **Base Model:** Qwen-7B-Instruct
- **Training Approach:** Group Relative Policy Optimization (GRPO)
- **Training Dataset:** 8,000 carefully curated game states
- **Hardware:** Single RTX 4090 (24 GB)
- **Training Time:** ~10 hours
- **Framework:** Implemented with the trl library, accelerated by Unsloth

### Training Configuration

- **Learning Rate:** 4e-5 (selected after extensive testing)
- **LoRA Rank:** 16
- **Max Sequence Length:** 1000 tokens
- **Batch Size:** 1 (with 4 gradient accumulation steps)
- **Optimizer:** paged_adamw_8bit

## Intended Use

This model is designed to play the 2048 game by:

1. Analyzing the current board state
2. Planning strategic moves
3. Maximizing score and achieving high-value tiles
4. Maintaining efficient board organization

## Training Data

The training data was generated through a multi-stage pipeline:

- Simulated gameplay to produce realistic board states
- A custom difficulty scoring system
- 5-level difficulty classification
- Balanced sampling across difficulty levels
- Parallel processing for efficient generation

## Training Approach

### Reward System

The model was trained using multiple reward components:

1. **Density Reward:** Encourages efficient tile merging and space utilization
2. **Highest Tile Reward:** Incentivizes creation of high-value tiles
3. **Survival Reward:** Promotes moves that maintain game continuity
4. **Format Compliance:** Ensures proper response structure

### Optimization

- Utilized Unsloth for 2x faster fine-tuning
- 4-bit quantization for efficient training
- Efficient LoRA adaptation

## Performance and Limitations

### Strengths

- Strong strategic planning capabilities
- Efficient tile merging and space management
- Consistent high-score achievement
- Structured decision-making process

### Limitations

- Performance may vary with random seeds
- Success is not guaranteed due to the game's inherent randomness
- The model requires specific input formatting

## Example Usage

```python
# Format your 4x4 game board as a string; "." marks an empty cell
board_state = """
2 | 4 | 8 | 16
. | . | 2 | 4
. | . | . | 2
. | . | . | .
"""

# The model will output one of: up, down, left, right
```

## Citation

```bibtex
@misc{dalal2024agent2048blog,
  author = {Dalal, Hrishbh},
  title = {Agent 2048: Forging Strategic Gameplay in an AI Through Data, Rewards, and RL},
  year = {2024},
  month = {March},
  url = {https://yourwebsite.com/blog/ai-agent-plays-2048},
  note = {[Blog post] Accessed: March 30, 2024}
}
```

## Author

Hrishbh Dalal

## Acknowledgments

Special thanks to the research community on Twitter/X for valuable feedback on data generation strategies and training approaches.

## License

This model is released under the Apache 2.0 license.
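The board-string format shown under Example Usage can be produced, and the model's reply validated, with small helpers. Below is a minimal sketch in plain Python; the helper names and prompt wording are illustrative assumptions, not part of the model's released interface:

```python
# Illustrative helpers for this model's input/output conventions.
# Names (format_board, parse_move) are hypothetical, not a released API.

VALID_MOVES = {"up", "down", "left", "right"}

def format_board(board):
    """Render a grid of ints (0 = empty) in the 'a | b | c | d' format shown above."""
    return "\n".join(
        " | ".join(str(cell) if cell else "." for cell in row)
        for row in board
    )

def parse_move(model_output):
    """Extract the first valid move keyword from the model's raw text output."""
    for token in model_output.lower().split():
        word = token.strip(".,!:;")
        if word in VALID_MOVES:
            return word
    raise ValueError(f"No valid move found in: {model_output!r}")

board = [
    [2, 4, 8, 16],
    [0, 0, 2, 4],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]
prompt = f"Current 2048 board:\n{format_board(board)}\nChoose one move: up, down, left, or right."
```

The prompt string can then be sent to the model through any standard chat-completion interface; `parse_move` guards against extra text around the move keyword.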