Add metadata, link to Github (#2)

Commit: b854fcab0af7c33f06e16f0625e9a4be7e375290
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED

````diff
@@ -1,6 +1,9 @@
 ---
-
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
+
 <h1 align="center">
 <em>AReaL</em>: Ant Reasoning Reinforcement Learning for LLMs
 </h1>
@@ -9,6 +12,7 @@
 | <a href="https://arxiv.org/pdf/2505.24298"><b>Paper</b></a> | <a href="https://inclusionai.github.io/AReaL/"><b>Documentation</b></a> | <a href="https://deepwiki.com/inclusionAI/AReaL"><b>Ask DeepWiki</b></a> | <a href="https://huggingface.co/collections/inclusionAI/areal-boba-2-683f0e819ccb7bb2e1b2f2d5"><b>🤗 Models & Data</b></a> |
 </p>
 
+Code: https://github.com/inclusionAI/AReaL
 
 AReaL (Ant Reasoning RL) is an open-source **fully asynchronous reinforcement learning training system** for large reasoning models developed at **the RL Lab, Ant Research**. Built upon the open-source project [RealHF](https://github.com/openpsi-project/ReaLHF), we are fully committed to open-source by providing training details, data, and infrastructure required to reproduce results along with the model itself. AReaL aims to help everyone build their own AI agents easily and affordably. Our team loves milk tea because it's delicious, customizable, and affordable. We hope you enjoy our project just like how you enjoy real-world milk tea (cheers).
 
@@ -37,7 +41,7 @@ In our AReaL-boba² (A-ReaL-double-boba) release, we highlight the top 3 most im
 
 + Experimental support for **multi-turn** agentic RL training. Check our [complete example](https://inclusionai.github.io/AReaL/customization/agent.html).
 
-For the complete system design and more training details, please check [our v0.3 blog](/blog/AReaL_v0_3.md) and our [research paper](
+For the complete system design and more training details, please check [our v0.3 blog](/blog/AReaL_v0_3.md) and our [research paper](https://arxiv.org/pdf/2505.24298).
 
 ### Overview of Asynchronous RL Training
 
@@ -92,7 +96,7 @@ We highlight the [tutorials](https://inclusionai.github.io/AReaL/customization/d
 + [Streaming generation and reward computation](https://inclusionai.github.io/AReaL/developer/rollout/rollout_worker.html)
 + [Interruptible rollout](https://inclusionai.github.io/AReaL/developer/rollout/gserver.html)
 + [Data staleness control with the rollout controller](https://inclusionai.github.io/AReaL/developer/rollout/gserver.html)
-+ [The adoption of decoupled PPO loss](https://inclusionai.github.io/AReaL/customization/algorithm.html)
++ [The adoption of decoupled PPO loss](https://inclusionai.github.io/AReaL/customization/algorithm.html#the-decoupled-ppo-loss)
 
 ### RL Training for Multi-turn Agent
 
@@ -100,12 +104,8 @@ AReaL-boba² allows you to independently customize the [dataset](https://inclusi
 
 In particular, we show a simple example to develop a multi-turn math agent for RL training. Please see the learning curve below and reference the [step-by-step guide](https://inclusionai.github.io/AReaL/customization/agent.html) if you want to implement your own agentic RL project.
 
-**Multi-turn Agent Learning Curve**
-
 ## Getting Started
 
-### Quick Start
-
 Train Qwen3 1.7B locally:
 
 ```bash
@@ -214,4 +214,4 @@ We also appreciate all the pioneering works from the community, particularly the
 primaryClass={cs.LG},
 url={https://arxiv.org/abs/2505.24298},
 }
-```
+```
````
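Background on the "decoupled PPO loss" linked in the feature list: it separates the behavior policy π_behav (the possibly stale policy that generated a rollout asynchronously) from a proximal policy π_prox that serves as the clipping anchor, with an importance weight correcting between the two. A sketch of the objective as we read the AReaL paper (notation ours, not authoritative):

```latex
\mathcal{L}(\theta) =
\mathbb{E}_{a \sim \pi_{\text{behav}}}\!\left[
  \frac{\pi_{\text{prox}}(a \mid s)}{\pi_{\text{behav}}(a \mid s)}
  \min\!\Big( r_\theta(a)\,\hat{A},\;
              \operatorname{clip}\big(r_\theta(a),\, 1-\varepsilon,\, 1+\varepsilon\big)\,\hat{A} \Big)
\right],
\qquad
r_\theta(a) = \frac{\pi_\theta(a \mid s)}{\pi_{\text{prox}}(a \mid s)}.
```

When π_prox = π_behav the importance weight is 1 and this reduces to the standard PPO clipped surrogate, which is why the same loss can also consume fresh on-policy data.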