xssstory and nielsr (HF Staff) committed on
Commit d6e1e96 · verified · 1 Parent(s): ba54c34

Add metadata, link to Github (#2)

- Add metadata, link to Github (b854fcab0af7c33f06e16f0625e9a4be7e375290)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +8 -8
README.md CHANGED

@@ -1,6 +1,9 @@
 ---
-{}
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
+
 <h1 align="center">
 <em>AReaL</em>: Ant Reasoning Reinforcement Learning for LLMs
 </h1>
@@ -9,6 +12,7 @@
 | <a href="https://arxiv.org/pdf/2505.24298"><b>Paper</b></a> | <a href="https://inclusionai.github.io/AReaL/"><b>Documentation</b></a> | <a href="https://deepwiki.com/inclusionAI/AReaL"><b>Ask DeepWiki</b></a> | <a href="https://huggingface.co/collections/inclusionAI/areal-boba-2-683f0e819ccb7bb2e1b2f2d5"><b>🤗 Models & Data</b></a> |
 </p>
 
+Code: https://github.com/inclusionAI/AReaL
 
 AReaL (Ant Reasoning RL) is an open-source **fully asynchronous reinforcement learning training system** for large reasoning models developed at **the RL Lab, Ant Research**. Built upon the open-source project [RealHF](https://github.com/openpsi-project/ReaLHF), we are fully committed to open-source by providing training details, data, and infrastructure required to reproduce results along with the model itself. AReaL aims to help everyone build their own AI agents easily and affordably. Our team loves milk tea because it's delicious, customizable, and affordable. We hope you enjoy our project just like how you enjoy real-world milk tea (cheers).
 
@@ -37,7 +41,7 @@ In our AReaL-boba² (A-ReaL-double-boba) release, we highlight the top 3 most im
 
 + Experimental support for **multi-turn** agentic RL training. Check our [complete example](https://inclusionai.github.io/AReaL/customization/agent.html).
 
-For the complete system design and more training details, please check [our v0.3 blog](/blog/AReaL_v0_3.md) and our [research paper](about:blank) for a more comprehensive presentation of our system design.
+For the complete system design and more training details, please check [our v0.3 blog](/blog/AReaL_v0_3.md) and our [research paper](https://arxiv.org/pdf/2505.24298).
 
 ### Overview of Asynchronous RL Training
 
@@ -92,7 +96,7 @@ We highlight the [tutorials](https://inclusionai.github.io/AReaL/customization/d
 + [Streaming generation and reward computation](https://inclusionai.github.io/AReaL/developer/rollout/rollout_worker.html)
 + [Interruptible rollout](https://inclusionai.github.io/AReaL/developer/rollout/gserver.html)
 + [Data staleness control with the rollout controller](https://inclusionai.github.io/AReaL/developer/rollout/gserver.html)
-+ [The adoption of decoupled PPO loss](https://inclusionai.github.io/AReaL/customization/algorithm.html)
++ [The adoption of decoupled PPO loss](https://inclusionai.github.io/AReaL/customization/algorithm.html#the-decoupled-ppo-loss)
 
 ### RL Training for Multi-turn Agent
 
@@ -100,12 +104,8 @@ AReaL-boba² allows you to independently customize the [dataset](https://inclusi
 
 In particular, we show a simple example to develop a multi-turn math agent for RL training. Please see the learning curve below and reference the [step-by-step guide](https://inclusionai.github.io/AReaL/customization/agent.html) if you want to implement your own agentic RL project.
 
-**Multi-turn Agent Learning Curve**
-
 ## Getting Started
 
-### Quick Start
-
 Train Qwen3 1.7B locally:
 
 ```bash
@@ -214,4 +214,4 @@ We also appreciate all the pioneering works from the community, particularly the
   primaryClass={cs.LG},
   url={https://arxiv.org/abs/2505.24298},
 }
-```
+```
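The substance of this commit is the YAML front matter (`license`, `library_name`, `pipeline_tag`) replacing the empty `{}` block at the top of README.md; the Hub reads this block to populate model card metadata. As a minimal, stdlib-only sketch of what that block contains once parsed (the `parse_front_matter` helper is hypothetical, for illustration — real Hub tooling uses a full YAML parser):

```python
def parse_front_matter(text: str) -> dict:
    """Extract flat `key: value` pairs from a `---`-delimited front matter block.

    Hypothetical helper for illustration only; it handles exactly the kind of
    flat mapping this commit adds, not general YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter ends the block
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta


readme = """---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

<h1 align="center">AReaL</h1>
"""

print(parse_front_matter(readme))
# {'license': 'apache-2.0', 'library_name': 'transformers', 'pipeline_tag': 'text-generation'}
```

The three keys mirror exactly what the diff above adds between the `---` delimiters.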