Update README.md
README.md
CHANGED
@@ -12,7 +12,7 @@ pinned: false
 We are a group of researchers working together to collect and curate openly licensed and public domain data for training large language models.
 So far, we have released:
 
-- [The Common Pile v0.1](https://huggingface.co/collections/common-pile/common-pile-v01-
+- [The Common Pile v0.1](https://huggingface.co/collections/common-pile/common-pile-v01-68307d37df48e36f02717f21), an 8 TB dataset of text from over 30 diverse sources
 - Our paper: [The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text](https://huggingface.co/papers/2506.05209)
 - [Comma v0.1-1T](https://huggingface.co/common-pile/comma-v0.1-1t) and [Comma v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t), 7B parameter LLMs trained on text from the Common Pile v0.1
 - The [training dataset](https://huggingface.co/datasets/common-pile/comma_v0.1_training_dataset) used to train the Comma v0.1 models