---
base_model: Qwen/Qwen3-4B-Thinking-2507
tags:
- ellora
- lora
- code-execution
- execution-tracing
- world-model
- cwm
- grpo
- thinking
- code-understanding
- peft
- qwen
library_name: peft
license: apache-2.0
pipeline_tag: text-generation
inference: true
model_type: qwen3
datasets:
- codelion/execution-world-model-dataset
---
# 🌍 Execution-Aware World Model LoRA

This LoRA adapter adds execution-awareness capabilities to [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507). Inspired by Meta's CWM (Code World Model) research, it enables the model to predict and understand program execution step by step.
## 🎯 Key Features

- **Step-by-Step Execution Prediction**: Predicts variable states at each line
- **Dynamic World Model**: Understands how code behaves at runtime
- **Execution Tracing**: Generates detailed execution traces with variable states
- **Debugging Support**: Can identify and explain execution behavior
- **GRPO-Trained**: Uses preference learning with real execution feedback
## 📊 Performance Metrics

- **Base Model**: [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)
- **Training Method**: GRPO (Group Relative Policy Optimization) with real execution traces
- **LoRA Rank**: 64
- **LoRA Alpha**: 128 (see the configuration sketch below)
- **Training Samples**: 298
- **Evaluation Samples**: 323
- **Execution Prediction Accuracy**: 20.0%
- **Mean State Accuracy**: 33.3%
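For reference, the rank and alpha above correspond to a PEFT configuration along these lines. This is a minimal sketch: `target_modules` and `lora_dropout` are assumptions (typical choices for Qwen-family models), since the card does not state them.

```python
from peft import LoraConfig

# Sketch of the LoRA setup implied by the metrics above.
lora_config = LoraConfig(
    r=64,               # LoRA rank (from the card)
    lora_alpha=128,     # LoRA alpha (from the card)
    lora_dropout=0.05,  # assumed; not stated in the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
```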
## 🔧 Usage

````python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Thinking-2507",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Thinking-2507")

# Load the execution world model LoRA
model = PeftModel.from_pretrained(model, "codelion/Qwen3-4B-execution-world-model-lora")

# Ask the model to predict an execution trace
prompt = """Analyze this code and predict its execution trace:
```python
x = 10
y = x * 2
z = x + y
```
Show variable states at each line."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,   # enable sampling so the low temperature takes effect
    temperature=0.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
````
## 📋 Example Output

```
<execution_trace>
Line 1: State: {x=10}
Line 2: State: {x=10, y=20}
Line 3: State: {x=10, y=20, z=30}
</execution_trace>
```
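To consume this output programmatically, a small parser along these lines works. It is a sketch that assumes the model emits exactly the `<execution_trace>` format shown above; the function name is illustrative, not part of this repo.

```python
import re

def parse_execution_trace(text: str) -> dict[int, dict[str, str]]:
    """Extract per-line variable states from an <execution_trace> block."""
    block = re.search(r"<execution_trace>(.*?)</execution_trace>", text, re.DOTALL)
    if block is None:
        return {}
    states: dict[int, dict[str, str]] = {}
    for line_no, state in re.findall(r"Line (\d+): State: \{(.*?)\}", block.group(1)):
        # Turn "x=10, y=20" into {"x": "10", "y": "20"}
        states[int(line_no)] = dict(
            pair.split("=", 1) for pair in state.split(", ") if "=" in pair
        )
    return states
```

On the trace above, `parse_execution_trace(output)[3]` returns `{"x": "10", "y": "20", "z": "30"}`.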
## 🧪 Training Details

- **Method**: GRPO (Group Relative Policy Optimization)
- **Data**: Self-generated code with real execution traces
- **Epochs**: 3
- **Reward**: Gradual scoring (0.0-1.0) based on execution accuracy (see the sketch below)
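To illustrate the gradual-scoring idea, a reward of roughly this shape compares a predicted trace against the ground-truth trace. This is a sketch of the concept, not the actual training code; the real reward may weight lines or variables differently.

```python
def execution_reward(predicted: dict[int, dict[str, str]],
                     actual: dict[int, dict[str, str]]) -> float:
    """Fraction of (line, variable) predictions matching the real trace.

    Returns a score in [0.0, 1.0]: full credit only when every variable
    at every line is predicted correctly, partial credit otherwise.
    """
    total = 0
    correct = 0
    for line_no, true_state in actual.items():
        pred_state = predicted.get(line_no, {})
        for var, value in true_state.items():
            total += 1
            correct += int(pred_state.get(var) == value)
    return correct / total if total else 0.0
```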
## 📚 Dataset

[codelion/execution-world-model-dataset](https://huggingface.co/datasets/codelion/execution-world-model-dataset)

- Python code (3-20 lines)
- Real execution traces captured via `sys.settrace()` (sketched below)
- Ground-truth variable states
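For context, `sys.settrace()` can record variable states line by line roughly as follows. This is a minimal sketch of how such ground-truth traces can be collected, not necessarily the dataset's exact tooling.

```python
import sys

def trace_execution(code: str) -> list[dict]:
    """Run `code` and snapshot its local variables as each line executes.

    Note: the "line" event fires *before* a line runs, so each snapshot
    reflects the state left by the preceding lines; the final state is
    captured on the "return" event.
    """
    states = []

    def tracer(frame, event, arg):
        if frame.f_code.co_filename != "<trace>":
            return tracer
        if event in ("line", "return"):
            snapshot = {k: v for k, v in frame.f_locals.items()
                        if not k.startswith("__")}
            states.append({"line": frame.f_lineno, "vars": snapshot})
        return tracer

    sys.settrace(tracer)
    try:
        exec(compile(code, "<trace>", "exec"), {})
    finally:
        sys.settrace(None)
    return states

# trace_execution("x = 10\ny = x * 2\nz = x + y") yields snapshots matching
# the example trace shown earlier.
```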
## 🏷️ Related

- **Dataset**: [codelion/execution-world-model-dataset](https://huggingface.co/datasets/codelion/execution-world-model-dataset)
- **Base Model**: [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)
- **Project**: [Ellora Recipes](https://github.com/codelion/ellora)

Part of the [Ellora](https://github.com/codelion/ellora) project: standardized recipes for enhancing LLM capabilities.