---
language:
- zh
- en
tags:
- translation
license: cc-by-4.0
datasets:
- quickmt/quickmt-train.zh-en
model-index:
- name: quickmt-zh-en
  results:
  - task:
      name: Translation zho-eng
      type: translation
      args: zho-eng
    dataset:
      name: flores101-devtest
      type: flores_101
      args: zho_Hans eng_Latn devtest
    metrics:
    - name: BLEU
      type: bleu
      value: 28.58
    - name: CHRF
      type: chrf
      value: 57.46
---


# `quickmt-zh-en` Neural Machine Translation Model 

# Usage

## Install `quickmt`

```bash
git clone https://github.com/quickmt/quickmt.git
pip install ./quickmt/
```

## Download model

```bash
quickmt-model-download quickmt/quickmt-zh-en ./quickmt-zh-en
```
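
If you prefer, the same files can be fetched with the `huggingface_hub` library instead of the `quickmt` CLI (a minimal sketch, assuming the standard Hub repo layout):

```python
from huggingface_hub import snapshot_download

# Download all files from the model repo into a local directory
snapshot_download(repo_id="quickmt/quickmt-zh-en", local_dir="./quickmt-zh-en")
```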

## Use model

Inference with `quickmt`:

```python
from quickmt import Translator

# Auto-detects GPU, set to "cpu" to force CPU inference
t = Translator("./quickmt-zh-en/", device="auto")

# Translate - increase beam_size (e.g. to 5) for higher quality at the cost of speed
t(["他补充道:“我们现在有 4 个月大没有糖尿病的老鼠,但它们曾经得过该病。”"], beam_size=1)

# Get alternative translations by sampling
# You can pass any cTranslate2 `translate_batch` arguments
t(["他补充道:“我们现在有 4 个月大没有糖尿病的老鼠,但它们曾经得过该病。”"], sampling_temperature=1.2, beam_size=1, sampling_topk=50, sampling_topp=0.9)
```

The model is in `ctranslate2` format and the tokenizers are `sentencepiece`, so you can use the model files directly if you want. It should be fairly easy to get them working with, e.g., [LibreTranslate](https://libretranslate.com/), which also uses `ctranslate2` and `sentencepiece`.
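
For illustration, here is a minimal sketch of direct inference with `ctranslate2` and `sentencepiece`. The tokenizer file names (`src.spm.model`, `tgt.spm.model`) and the model directory layout are assumptions based on the training configuration below; adjust the paths to match the downloaded files.

```python
import ctranslate2
import sentencepiece as spm

# Paths are assumptions; check the downloaded directory for the actual file names
translator = ctranslate2.Translator("./quickmt-zh-en/", device="cpu")
src_sp = spm.SentencePieceProcessor(model_file="./quickmt-zh-en/src.spm.model")
tgt_sp = spm.SentencePieceProcessor(model_file="./quickmt-zh-en/tgt.spm.model")

text = "他补充道:“我们现在有 4 个月大没有糖尿病的老鼠,但它们曾经得过该病。”"

# Tokenize with the source model, translate, then detokenize with the target model
tokens = src_sp.encode(text, out_type=str)
results = translator.translate_batch([tokens], beam_size=5)
print(tgt_sp.decode(results[0].hypotheses[0]))
```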

# Model Information

* Trained using [`eole`](https://github.com/eole-nlp/eole)
    - Training took about one day on a single RTX 4090 on [vast.ai](https://cloud.vast.ai)
* Exported for fast inference to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format
* Training data: https://huggingface.co/datasets/quickmt/quickmt-train.zh-en/tree/main

## Metrics

BLEU and CHRF2 are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the Flores200 `devtest` test set ("zho_Hans"->"eng_Latn").
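
As a sketch, scores like these can be reproduced with `sacrebleu`'s Python API (the file names here are hypothetical; any line-aligned hypothesis/reference files work):

```python
from sacrebleu.metrics import BLEU, CHRF

# Hypothetical files: one sentence per line, system output and reference
with open("hyp.eng") as f:
    hyps = [line.strip() for line in f]
with open("ref.eng") as f:
    refs = [line.strip() for line in f]

print(BLEU().corpus_score(hyps, [refs]))   # BLEU
print(CHRF().corpus_score(hyps, [refs]))   # chrF with beta=2, i.e. chrF2
```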

"Time" is the time to translate the following input with a single CPU core:

> 2019冠状病毒病(英語:Coronavirus disease 2019,缩写:COVID-19[17][18]),是一種由嚴重急性呼吸系統綜合症冠狀病毒2型(縮寫:SARS-CoV-2)引發的傳染病,导致了一场持续的疫情,成为人類歷史上致死人數最多的流行病之一。

| Model                            | BLEU  | chrF2 | Time (s) |
| -------------------------------- | ----- | ----- | -------- |
| quickmt/quickmt-zh-en            | 28.58 | 57.46 |  0.670   |
| Helsinki-NLP/opus-mt-zh-en       | 23.35 | 53.60 |  0.838   |
| facebook/m2m100_418M             | 18.96 | 50.06 | 11.5     |
| facebook/nllb-200-distilled-600M | 26.22 | 55.17 | 13.2     |
| facebook/nllb-200-distilled-1.3B | 28.54 | 57.34 | 23.6     |
| facebook/m2m100_1.2B             | 24.68 | 54.68 | 25.7     |
| google/madlad400-3b-mt           | 28.74 | 58.01 | ???      |

`quickmt-zh-en` is the fastest model in this comparison and is second only to the much larger `google/madlad400-3b-mt` on BLEU and chrF2.

Helsinki-NLP/opus-mt-zh-en is one of the most downloaded machine translation models on Hugging Face, and `quickmt-zh-en` is considerably more accurate *and* slightly faster.


## Training Configuration

```yaml
### Vocab
src_vocab_size: 20000
tgt_vocab_size: 20000
share_vocab: False

data:
    corpus_1:
        path_src: hf://quickmt/quickmt-train-zh-en/zh
        path_tgt: hf://quickmt/quickmt-train-zh-en/en
        path_sco: hf://quickmt/quickmt-train-zh-en/sco
    valid:
        path_src: zh-en/dev.zho
        path_tgt: zh-en/dev.eng

transforms: [sentencepiece, filtertoolong]
transforms_configs:
  sentencepiece:
    src_subword_model: "zh-en/src.spm.model"
    tgt_subword_model: "zh-en/tgt.spm.model"
  filtertoolong:
    src_seq_length: 512
    tgt_seq_length: 512

training:
    # Run configuration
    model_path: quickmt-zh-en
    keep_checkpoint: 4
    save_checkpoint_steps: 1000
    train_steps: 104000
    valid_steps: 1000
    
    # Train on a single GPU
    world_size: 1
    gpu_ranks: [0]

    # Batching
    batch_type: "tokens"
    batch_size: 13312
    valid_batch_size: 13312
    batch_size_multiple: 8
    accum_count: [4]
    accum_steps: [0]

    # Optimizer & Compute
    compute_dtype: "bfloat16"
    optim: "pagedadamw8bit"
    learning_rate: 1.0
    warmup_steps: 10000
    decay_method: "noam"
    adam_beta2: 0.998

    # Data loading
    bucket_size: 262144
    num_workers: 4
    prefetch_factor: 100

    # Hyperparams
    dropout_steps: [0]
    dropout: [0.1]
    attention_dropout: [0.1]
    max_grad_norm: 0
    label_smoothing: 0.1
    average_decay: 0.0001
    param_init_method: xavier_uniform
    normalization: "tokens"

model:
    architecture: "transformer"
    layer_norm: standard
    share_embeddings: false
    share_decoder_embeddings: true
    add_ffnbias: true
    mlp_activation_fn: gated-silu
    add_estimator: false
    add_qkvbias: false
    norm_eps: 1e-6
    hidden_size: 1024
    encoder:
        layers: 8
    decoder:
        layers: 2
    heads: 16
    transformer_ff: 4096
    embeddings:
        word_vec_size: 1024
        position_encoding_type: "SinusoidalInterleaved"
```