Update README.md
README.md CHANGED
@@ -1,8 +1,6 @@
-
 ---
-
 license: llama3
-
+pipeline_tag: text-generation
 ---
 
 
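The added `pipeline_tag: text-generation` metadata marks the repository as a text-generation model on the Hub. As a rough illustration (not part of this commit), the model could then be loaded through the corresponding `transformers` pipeline; the repository id below is a placeholder, not the actual repo name:

```python
# Minimal usage sketch for a model whose card declares pipeline_tag: text-generation.
# "<org>/<model-repo>" is a placeholder; substitute the actual repository id.
from transformers import pipeline

generator = pipeline("text-generation", model="<org>/<model-repo>")
out = generator("Long-context language models are useful because", max_new_tokens=64)
print(out[0]["generated_text"])
```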
@@ -232,4 +230,4 @@ We conduct supervised fine-tuning (SFT) on our base long-context model. In our p
 | Scheduling | 5% warmup, cosine decay till 10% peak learning rate |
 | Total #tokens | 1B |
 
-- Synthetic data: we also experiment with several strategies to generate long, synthetic chat data, but they have not yet helped to improve upon our UltraChat-fine-tuned chat models. The synthetic data strategies we tried include (1) using a paragraph of a long book/repo to generate question-answer pairs; (2) using hierarchical methods to summarize a long book; (3) turning the previous synthetic long QA data into a RAG format.
+- Synthetic data: we also experiment with several strategies to generate long, synthetic chat data, but they have not yet helped to improve upon our UltraChat-fine-tuned chat models. The synthetic data strategies we tried include (1) using a paragraph of a long book/repo to generate question-answer pairs; (2) using hierarchical methods to summarize a long book; (3) turning the previous synthetic long QA data into a RAG format.
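The scheduling row in the hunk above ("5% warmup, cosine decay till 10% peak learning rate") can be read as: warm up over the first 5% of steps, then decay the learning rate along a cosine curve whose floor is 10% of the peak value. A minimal sketch of such a schedule, assuming linear warmup (the authors' exact implementation may differ):

```python
import math

def lr_at_step(step, total_steps, peak_lr, warmup_frac=0.05, final_frac=0.10):
    """Linear warmup over the first `warmup_frac` of steps, then cosine decay
    from `peak_lr` down to `final_frac * peak_lr` by the last step."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    # Progress through the decay phase, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    floor = final_frac * peak_lr
    return floor + (peak_lr - floor) * cosine
```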
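For concreteness, here is a sketch of strategy (1) from the synthetic-data bullet above: sample a paragraph from a long book or repository, have an instruction-tuned model write a question-answer pair about it, and pair that QA turn with the full document so the resulting chat example requires long-context reading. Everything below (the prompt wording, the `generate_fn` helper, and the output chat format) is hypothetical and not the authors' actual pipeline:

```python
import random

def make_long_qa_example(document: str, generate_fn, chunk_chars: int = 2000):
    # Sample one paragraph-sized chunk from the long document.
    start = random.randrange(0, max(1, len(document) - chunk_chars))
    passage = document[start:start + chunk_chars]
    # Ask an instruction-tuned model for a QA pair grounded in that passage.
    # `generate_fn` stands in for whatever LLM call is available and is
    # expected to return raw generated text in the requested "Q:/A:" format.
    prompt = (
        "Read the passage below and write one question about it, then the "
        "answer, in the format 'Q: ...\\nA: ...'.\n\nPassage:\n" + passage
    )
    raw = generate_fn(prompt)
    question, _, answer = raw.partition("\nA:")
    question = question.removeprefix("Q:").strip()
    answer = answer.strip()
    # Pair the full long document with the generated turn so the chat example
    # exercises long-context retrieval rather than short-passage QA.
    return {
        "messages": [
            {"role": "user", "content": document + "\n\n" + question},
            {"role": "assistant", "content": answer},
        ]
    }
```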