DeepSWE-Preview-FP8
This is an FP8 quantized version of the agentica-org/DeepSWE-Preview model. All credit for the original model goes to the Agentica team.
Description
DeepSWE-Preview-FP8 is an FP8 quantized version of the DeepSWE-Preview model, an open-source coding agent trained exclusively with reinforcement learning (RL) to excel at software engineering tasks. Built on top of Qwen3-32B with thinking mode enabled, the model demonstrates strong reasoning capabilities in navigating complex codebases and handling multiple files. This quantized version maintains the capabilities of the original while offering reduced memory requirements and faster inference.
The original model achieves 59.0% on SWE-Bench-Verified, making it #1 in the open-weights category.
Architecture
- Base Model: Qwen/Qwen3-32B
- Quantization Format: FP8 (8-bit floating point)
- Original Training Method: Pure Reinforcement Learning (no Supervised Fine-Tuning)
- Parameters: 32.8 Billion (original), quantized to FP8 precision
- Tensor Type: FP8 (quantized from F32)
- Context Length: Supports up to 65,536 tokens
Key Technical Details
- Trained with only 200 steps of RL, showing significant performance gains
- Pass@1 performance of 42.2% (averaged over 16 runs) for the original model
- Enhanced GRPO algorithm incorporating innovations from DAPO, Dr. GRPO, and LOOP/RLOO (a simplified illustration of the group-relative advantage idea follows this list)
- Uses R2E-Gym environment for training and evaluation
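The exact RL recipe is described in the DeepSWE-Preview report; the sketch below only illustrates the core GRPO-style idea of a group-relative advantage, where each rollout's reward is normalized against the other rollouts for the same problem. It is a generic illustration, not the Agentica training code, and the reward values are invented.

```python
import numpy as np

def grpo_style_advantages(rewards, eps=1e-6):
    """Group-relative advantages: compare each rollout's reward to the
    mean (and std) of all rollouts for the same problem. Variants such
    as Dr. GRPO drop the std normalization; this is a generic sketch,
    not DeepSWE's exact formulation."""
    rewards = np.asarray(rewards, dtype=np.float32)
    baseline = rewards.mean()
    return (rewards - baseline) / (rewards.std() + eps)

# Hypothetical rewards for 8 rollouts of one SWE task (1 = tests pass).
print(grpo_style_advantages([1, 0, 0, 1, 0, 0, 0, 1]))
```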
Quantization Benefits
- Reduced Memory Footprint: Lower VRAM requirements compared to the original model (see the rough weight-size estimate after this list)
- Faster Inference: Improved inference speed due to FP8 precision
- Maintained Performance: Preserves most of the original model's capabilities
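As a rough back-of-the-envelope check of weight memory only (ignoring KV cache, activations, and runtime overhead), 32.8B parameters at different precisions work out to roughly:

```python
PARAMS = 32.8e9  # parameter count from the Architecture section

for name, bytes_per_param in [("F32", 4), ("BF16/FP16", 2), ("FP8", 1)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{name}: ~{gib:.0f} GiB of weights")
# Prints roughly: F32 ~122 GiB, BF16/FP16 ~61 GiB, FP8 ~31 GiB
```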
Intended Use Cases
- Software Engineering Tasks: Designed primarily for coding-related activities
- Codebase Navigation: Excels at understanding and modifying complex codebases
- Multi-file Editing: Capable of viewing and editing multiple files in a project
- Automated Testing: Can execute bash commands and run test suites
- Research Foundation: Serves as a base model for developing future coding agents
Training Data
- 4.5K problems from a subset of R2E-Gym
- Filtered to avoid data contamination (e.g., removed problems from sympy repository)
- Each problem maps to an individual Docker image (a hypothetical launch sketch follows this list)
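Concretely, training or evaluating on one problem means running the agent and its test suite inside that problem's container. The snippet below is only a hypothetical illustration using the `docker` Python SDK; the image name and test command are placeholders, not actual R2E-Gym identifiers.

```python
import docker  # pip install docker

client = docker.from_env()

# Placeholder image and command; real R2E-Gym images and test
# entrypoints are problem-specific and documented in R2E-Gym itself.
IMAGE = "example/r2e-gym-problem:latest"
logs = client.containers.run(
    IMAGE,
    command="python -m pytest -x",  # run the problem's test suite
    remove=True,                    # clean up the container afterwards
)
print(logs.decode())
```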
Deployment Recommendations
- Temperature: 1.0
- Max Tokens: 32K-64K
- Serving Options: vLLM (recommended), Hugging Face TGI, SGLang, TensorRT-LLM
- Special Tools: Works with R2EGym's system prompt and tools (file_editor.py, execution_bash.py, search.py, finish.py); a minimal serving sketch follows this list
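A minimal offline-inference sketch with vLLM using the sampling settings above; the repository id is a placeholder, since this card does not state the exact FP8 checkpoint name, and settings such as `max_model_len` and `tensor_parallel_size` should be adjusted to the available GPU memory:

```python
from vllm import LLM, SamplingParams

# Placeholder repo id; substitute the actual FP8 checkpoint path/name.
MODEL_ID = "path/to/DeepSWE-Preview-FP8"

llm = LLM(
    model=MODEL_ID,
    max_model_len=65536,      # matches the supported context length
    tensor_parallel_size=2,   # adjust to the number of available GPUs
)

params = SamplingParams(temperature=1.0, max_tokens=32768)
outputs = llm.generate(["Fix the failing test in utils/date_parse.py ..."], params)
print(outputs[0].outputs[0].text)
```

For the full agent workflow with the R2EGym tools listed above, the model would typically be served as an OpenAI-compatible endpoint (for example via `vllm serve`) and driven by the R2E-Gym harness rather than called directly as shown here.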
Performance
While the FP8 quantization reduces memory requirements, the model should maintain comparable performance to the original on most software engineering tasks.
License
The model is released under the MIT License, emphasizing open and accessible AI development.
Relationship to Original Model
This model is a quantized version of agentica-org/DeepSWE-Preview. It maintains the core capabilities of the original while offering reduced memory requirements and faster inference through FP8 quantization.