|
|
--- |
|
|
base_model: |
|
|
- willcb/Qwen3-14B |
|
|
license: apache-2.0 |
|
|
datasets: |
|
|
- Danau5tin/terminal-tasks |
|
|
tags: |
|
|
- agent |
|
|
- code |
|
|
- multi-agent |
|
|
--- |
|
|
|
|
|
# Orca-Agent-v0.1 |
|
|
|
|
|
 |
|
|
|
|
|
In-depth details on the training, including the training code, are **all open-sourced [here](https://github.com/Danau5tin/Orca-Agent-RL)**.
|
|
|
|
|
## Description |
|
|
Orca-Agent-v0.1 is an orchestration agent that acts as the brain of the operation. It receives the user's task but never touches code directly. Instead, it:
|
|
|
|
|
- Analyses the task and breaks it into focused subtasks |
|
|
- Dispatches explorer agents to understand the system |
|
|
- Delegates implementation work to coder agents with precise instructions |
|
|
- Verifies all changes through additional explorer agents |
|
|
- Maintains the context store with all discovered knowledge |
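
Conceptually, the workflow above amounts to a simple loop. The snippet below is a heavily simplified, hypothetical sketch; the function names are illustrative placeholders, not the actual API of the orchestration code linked at the bottom of this card.

```python
# Hypothetical sketch of the orchestration loop described above.
# Names are illustrative placeholders, not the repo's actual API.

def run_subagent(agent_type: str, instruction: str, context: str) -> str:
    """Placeholder for dispatching an explorer or coder subagent."""
    return f"[{agent_type}] finished: {instruction}"


def orchestrate(task: str) -> list[str]:
    context_store: list[str] = []   # all discovered knowledge
    subtasks = [task]               # real subtask planning is model-driven

    for subtask in subtasks:
        # Explore first: understand the system before touching it.
        context_store.append(run_subagent("explorer", subtask, "\n".join(context_store)))
        # Delegate the actual change to a coder agent with precise instructions.
        context_store.append(run_subagent("coder", subtask, "\n".join(context_store)))
        # Verify the change with another explorer pass.
        context_store.append(run_subagent("explorer", f"verify: {subtask}", "\n".join(context_store)))

    return context_store
```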
|
|
|
|
|
## Performance |
|
|
| Orchestrator | Subagent | Terminal Bench | |
|
|
|--------------|----------|--------| |
|
|
| Qwen3-Coder-480B | Qwen3-Coder-480B | 19.7% | |
|
|
| Orca-Agent-v0.1-14B | Qwen3-Coder-30B | 18.25% | |
|
|
| Qwen3-14B | Qwen3-Coder-30B | 7.0% | |
|
|
|
|
|
This places Orca-Agent-v0.1 (14B) paired with Qwen3-Coder-Flash (30B MoE) within striking distance of [Qwen3-Coder-480B](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct) running the same architecture, which placed #26 on Terminal-Bench when it was published recently as part of [my other project](https://github.com/Danau5tin/multi-agent-coding-system).
|
|
|
|
|
## Example model output |
|
|
```xml |
|
|
<think> |
|
|
Okay, now with task_003 identifying a backup, the next step would be to attempt recovery using this identified file. |
|
|
</think> |
|
|
|
|
|
<task_create> |
|
|
agent_type: 'coder' |
|
|
title: 'Attempt recovery using the identified backup file' |
|
|
description: | |
|
|
Execute the following steps to recover using the backup identified in task_003: |
|
|
1. Move the identified backup file (/tmp/terraform_work/.terraform.tfstate.tmp) to the target location (/infrastructure/recovered_state.json). |
|
|
2. Verify the operation by checking: |
|
|
- The existence of the new file at the target location |
|
|
- The size matches the original backup file |
|
|
- The file permissions are set appropriately (rw-r--r--) |
|
|
max_turns: 10 |
|
|
context_bootstrap: |
|
|
- path: '/tmp/terraform_work/.terraform.tfstate.tmp' |
|
|
reason: 'The backup file identified in task_003' |
|
|
context_refs: |
|
|
- 'task_003' |
|
|
</task_create> |
|
|
``` |
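
The orchestrator communicates entirely through tagged blocks like the one above. As a rough illustration, a runtime could extract the `<task_create>` payload with something like the snippet below; this is a hypothetical sketch, and the real parsing logic in the linked repo may differ.

```python
import re


def extract_task_create(model_output: str) -> dict[str, str]:
    """Pull the first <task_create> block out of a model reply and read a
    few scalar fields. Illustrative only; the actual parser may differ."""
    match = re.search(r"<task_create>(.*?)</task_create>", model_output, re.DOTALL)
    if not match:
        return {}
    body = match.group(1)

    fields: dict[str, str] = {}
    for key in ("agent_type", "title", "max_turns"):
        field_match = re.search(rf"^{key}:\s*(.+)$", body, re.MULTILINE)
        if field_match:
            fields[key] = field_match.group(1).strip().strip("'")
    return fields
```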
|
|
|
|
|
## Model training overview
|
|
- Full fine-tune of Qwen3-14B
|
|
- 32x H100s |
|
|
- 16x for training |
|
|
- 8x inference for Orca-Agent |
|
|
- 8x inference for subagent (Qwen3-Coder-30B-A3B) |
|
|
- Trained with GRPO + curriculum learning (see the sketch below)
|
|
- Batch size 256, 64 rollouts per task |
|
|
- More details [here](https://github.com/Danau5tin/Orca-Agent-RL) |
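
For context on the GRPO setup: each task's 64 rollouts form one group, and every rollout is scored against its group's statistics rather than a learned value function. A minimal sketch of that group-relative advantage computation (illustrative only; the full training loop is in the repo linked above):

```python
import numpy as np


def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Group-relative advantages: normalise each rollout's reward by the
    mean and std of its own group (here, the 64 rollouts of one task)."""
    return (group_rewards - group_rewards.mean()) / (group_rewards.std() + eps)


# Example: 64 rollouts of one task, rewarded 1.0 for passing the task's checks.
rewards = np.random.binomial(1, 0.3, size=64).astype(float)
advantages = grpo_advantages(rewards)  # positive for above-average rollouts
```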
|
|
|
|
|
## Serving the model
|
|
|
|
|
**vLLM** |
|
|
```bash |
|
|
vllm serve Danau5tin/Orca-Agent-v0.1 |
|
|
``` |
|
|
|
|
|
**SGLang** |
|
|
```bash |
|
|
python -m sglang.launch_server \ |
|
|
--model-path Danau5tin/Orca-Agent-v0.1 |
|
|
``` |
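
Both commands expose an OpenAI-compatible endpoint, so the model can be queried with the standard `openai` Python client. A minimal example, assuming vLLM's default port 8000 (SGLang defaults to 30000) and an illustrative prompt:

```python
from openai import OpenAI

# Point the client at the local OpenAI-compatible server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Danau5tin/Orca-Agent-v0.1",
    messages=[{"role": "user", "content": "Recover the Terraform state from its backup file."}],
)
print(response.choices[0].message.content)
```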
|
|
|
|
|
The agent's orchestration code can be found [here](https://github.com/Danau5tin/multi-agent-coding-system). |