Commit 035c3dd
Parent(s): 8d6c21c
update readme

README.md CHANGED
---
language:
- ja
- en
license: mit
---

# Sarashina2.2-3B

This repository provides large language models trained by [SB Intuitions](https://www.sbintuitions.co.jp/).

## How to use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, set_seed

# Load the model in bfloat16 and place it automatically on the available devices.
model = AutoModelForCausalLM.from_pretrained("sbintuitions/sarashina2.2-3b", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2.2-3b")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
set_seed(123)  # fix the seed so the sampled outputs are reproducible

# Sample three continuations of a Japanese prompt
# ("Good morning, today's weather is").
text = generator(
    "おはようございます、今日の天気は",
    max_length=30,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    num_return_sequences=3,
)

for t in text:
    print(t)
```
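
Each element of `text` is a dict whose `generated_text` field contains the prompt followed by one sampled continuation, so the loop above prints three candidate completions.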

## Model Description

We constructed the Sarashina2.2-3B model, which consists of about 3 billion parameters (embeddings and the LM head are excluded when counting parameters), using a three-phase training process.
First, we trained the model on 10 trillion tokens of Japanese, English, and code data extracted from web corpora.
Next, we trained the model using synthetic data to improve its performance on math and coding tasks.
Finally, we trained the model with a small amount of data to enhance its performance on various application tasks.
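
The parameter-count convention above can be checked directly. The snippet below is a minimal sketch, assuming the standard `transformers` accessors `get_input_embeddings()` and `get_output_embeddings()`; it is illustrative only, not the method used to produce the official figure.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "sbintuitions/sarashina2.2-3b", torch_dtype=torch.bfloat16
)

# Collect the embedding and LM-head tensors so they can be excluded.
# Deduplicating by id() also handles the case where the two are weight-tied.
excluded = {
    id(p)
    for module in (model.get_input_embeddings(), model.get_output_embeddings())
    if module is not None
    for p in module.parameters()
}

total = sum(p.numel() for p in model.parameters())
core = sum(p.numel() for p in model.parameters() if id(p) not in excluded)
print(f"total: {total / 1e9:.2f}B, excluding embeddings/LM head: {core / 1e9:.2f}B")
```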

The following table shows the model's performance on Japanese tasks.
For reference, we also present the performance of our previous LLMs.
As shown in the table, our Sarashina2.2-3B outperforms Sarashina2-7B on Japanese QA tasks such as NIILC and JMMLU.
In addition, Sarashina2.2-3B outperforms Sarashina2-70B on Japanese math and coding tasks such as MGSM-ja and JHumanEval.

#### Evaluation on Japanese tasks

| Model | NIILC | JMMLU | MGSM-ja | JHumanEval |
|------------------|------------|------------|-----------|------------|
| [Sarashina2-7B](https://huggingface.co/sbintuitions/sarashina2-7b) | 62.2 | 42.5 | 7.2 | 12.8 |
| [Sarashina2-70B](https://huggingface.co/sbintuitions/sarashina2-70b) | **66.1** | **62.7** | 56.4 | 22.0 |
| **[Sarashina2.2-0.5B](https://huggingface.co/sbintuitions/sarashina2.2-0.5b)** | 34.6 | 28.8 | 21.2 | 15.2 |
| **[Sarashina2.2-1B](https://huggingface.co/sbintuitions/sarashina2.2-1b)** | 47.2 | 38.4 | 38.8 | 21.3 |
| **[Sarashina2.2-3B](https://huggingface.co/sbintuitions/sarashina2.2-3b)** | 62.2 | 52.7 | **63.6** | **39.6** |

## Ethical Considerations and Limitations

This repository contains the pre-trained model, which has not yet been tuned to follow instructions.
Therefore, this model may generate meaningless sequences, inaccurate statements, or biased/objectionable outputs.
As post-trained versions of Sarashina2.2, we have published [Sarashina2.2-0.5B-instruct-v0.1](https://huggingface.co/sbintuitions/sarashina2.2-0.5b-instruct-v0.1), [Sarashina2.2-1B-instruct-v0.1](https://huggingface.co/sbintuitions/sarashina2.2-1b-instruct-v0.1), and [Sarashina2.2-3B-instruct-v0.1](https://huggingface.co/sbintuitions/sarashina2.2-3b-instruct-v0.1).
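
For instruction-following use, load one of the instruct models instead of this base model. The snippet below is a minimal sketch, assuming the instruct variant ships a chat template usable through `tokenizer.apply_chat_template`; consult the instruct model card for the recommended generation settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical usage of the instruct variant linked above.
model_name = "sbintuitions/sarashina2.2-3b-instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

# Build a prompt from chat messages using the model's own chat template.
messages = [{"role": "user", "content": "日本の首都はどこですか?"}]  # "What is the capital of Japan?"
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=100)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```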

## License

[MIT License](https://huggingface.co/sbintuitions/sarashina2.2-3b/blob/main/LICENSE)