---
language:
- zh
- en
tags:
- translation
license: cc-by-4.0
datasets:
- quickmt/quickmt-train.zh-en
model-index:
- name: quickmt-zh-en
results:
- task:
name: Translation zho-eng
type: translation
args: zho-eng
dataset:
name: flores101-devtest
type: flores_101
args: zho_Hans eng_Latn devtest
metrics:
- name: BLEU
type: bleu
value: 28.58
- name: CHRF
type: chrf
value: 57.46
---
# `quickmt-zh-en` Neural Machine Translation Model
# Usage
## Install `quickmt`
```bash
git clone https://github.com/quickmt/quickmt.git
pip install ./quickmt/
```
## Download model
```bash
quickmt-model-download quickmt/quickmt-zh-en ./quickmt-zh-en
```
## Use model
Inference with `quickmt`:
```python
from quickmt import Translator
# Auto-detects GPU, set to "cpu" to force CPU inference
t = Translator("./quickmt-zh-en/", device="auto")
# Translate - set beam size to 5 for higher quality (but slower speed)
t(["他补充道:“我们现在有 4 个月大没有糖尿病的老鼠,但它们曾经得过该病。”"], beam_size=1)
# Get alternative translations by sampling
# You can pass any cTranslate2 `translate_batch` arguments
t(["他补充道:“我们现在有 4 个月大没有糖尿病的老鼠,但它们曾经得过该病。”"], sampling_temperature=1.2, beam_size=1, sampling_topk=50, sampling_topp=0.9)
```
The model is in `ctranslate2` format and the tokenizers are `sentencepiece` models, so you can use the model files directly if you want; a minimal example is sketched below. It would be fairly easy to get them working with e.g. [LibreTranslate](https://libretranslate.com/), which also uses `ctranslate2` and `sentencepiece`.
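For example, here is a rough sketch of calling CTranslate2 and SentencePiece directly instead of going through `quickmt`. The tokenizer file names (`src.spm.model`, `tgt.spm.model`) are assumptions based on the training configuration below; check the downloaded directory for the actual names.
```python
import ctranslate2
import sentencepiece as spm

# Load the CTranslate2 model and the source/target SentencePiece tokenizers.
# File names are assumptions; inspect ./quickmt-zh-en/ for the actual layout.
translator = ctranslate2.Translator("./quickmt-zh-en/", device="cpu")
src_sp = spm.SentencePieceProcessor(model_file="./quickmt-zh-en/src.spm.model")
tgt_sp = spm.SentencePieceProcessor(model_file="./quickmt-zh-en/tgt.spm.model")

def translate(texts, beam_size=5):
    # Tokenize into subword pieces, translate, then detokenize the best hypothesis.
    tokens = [src_sp.encode(t, out_type=str) for t in texts]
    results = translator.translate_batch(tokens, beam_size=beam_size)
    return [tgt_sp.decode(r.hypotheses[0]) for r in results]

print(translate(["他补充道:“我们现在有 4 个月大没有糖尿病的老鼠,但它们曾经得过该病。”"]))
```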
# Model Information
* Trained using [`eole`](https://github.com/eole-nlp/eole)
- It took about 1 day on a single RTX 4090 on [vast.ai](https://cloud.vast.ai)
* Exported to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format for fast inference
* Training data: https://huggingface.co/datasets/quickmt/quickmt-train.zh-en/tree/main
## Metrics
BLEU and chrF2 are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the Flores200 `devtest` test set ("zho_Hans"->"eng_Latn").
"Time" is the time to translate the following input with a single CPU core:
> 2019冠状病毒病(英語:Coronavirus disease 2019,缩写:COVID-19[17][18]),是一種由嚴重急性呼吸系統綜合症冠狀病毒2型(縮寫:SARS-CoV-2)引發的傳染病,导致了一场持续的疫情,成为人類歷史上致死人數最多的流行病之一。
| Model | bleu | chrf2 | Time (s) |
| -------------------------------- | ----- | ----- | ---- |
| quickmt/quickmt-zh-en | 28.58 | 57.46 | 0.670 |
| Helsinki-NLP/opus-mt-zh-en | 23.35 | 53.60 | 0.838 |
| facebook/m2m100_418M | 18.96 | 50.06 | 11.5 |
| facebook/nllb-200-distilled-600M | 26.22 | 55.17 | 13.2 |
| facebook/nllb-200-distilled-1.3B | 28.54 | 57.34 | 23.6 |
| facebook/m2m100_1.2B | 24.68 | 54.68 | 25.7 |
| google/madlad400-3b-mt | 28.74 | 58.01 | ??? |
`quickmt-zh-en` is the fastest model in this comparison and delivers relatively high quality.
Helsinki-NLP/opus-mt-zh-en is one of the most downloaded machine translation models on Hugging Face; `quickmt-zh-en` is considerably more accurate *and* a bit faster.
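As a rough sketch of how scores like the ones above can be reproduced with `sacrebleu` (the file paths are placeholders; `hyps.en` would hold the model's translations of the Flores `devtest` source side, one sentence per line):
```python
import sacrebleu

# Placeholder paths: hypotheses and references, one sentence per line, aligned.
with open("hyps.en", encoding="utf-8") as f:
    hyps = [line.strip() for line in f]
with open("flores-devtest.eng_Latn", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

# corpus_bleu / corpus_chrf expect a list of reference streams (here just one).
print(sacrebleu.corpus_bleu(hyps, [refs]).score)  # BLEU
print(sacrebleu.corpus_chrf(hyps, [refs]).score)  # chrF2 (beta=2 is the default)
```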
## Training Configuration
```yaml
### Vocab
src_vocab_size: 20000
tgt_vocab_size: 20000
share_vocab: False
data:
corpus_1:
path_src: hf://quickmt/quickmt-train-zh-en/zh
path_tgt: hf://quickmt/quickmt-train-zh-en/en
path_sco: hf://quickmt/quickmt-train-zh-en/sco
valid:
path_src: zh-en/dev.zho
path_tgt: zh-en/dev.eng
transforms: [sentencepiece, filtertoolong]
transforms_configs:
sentencepiece:
src_subword_model: "zh-en/src.spm.model"
tgt_subword_model: "zh-en/tgt.spm.model"
filtertoolong:
src_seq_length: 512
tgt_seq_length: 512
training:
# Run configuration
model_path: quickmt-zh-en
keep_checkpoint: 4
save_checkpoint_steps: 1000
train_steps: 104000
valid_steps: 1000
# Train on a single GPU
world_size: 1
gpu_ranks: [0]
# Batching
batch_type: "tokens"
batch_size: 13312
valid_batch_size: 13312
batch_size_multiple: 8
accum_count: [4]
accum_steps: [0]
# Optimizer & Compute
compute_dtype: "bfloat16"
optim: "pagedadamw8bit"
learning_rate: 1.0
warmup_steps: 10000
decay_method: "noam"
adam_beta2: 0.998
# Data loading
bucket_size: 262144
num_workers: 4
prefetch_factor: 100
# Hyperparams
dropout_steps: [0]
dropout: [0.1]
attention_dropout: [0.1]
max_grad_norm: 0
label_smoothing: 0.1
average_decay: 0.0001
param_init_method: xavier_uniform
normalization: "tokens"
model:
architecture: "transformer"
layer_norm: standard
share_embeddings: false
share_decoder_embeddings: true
add_ffnbias: true
mlp_activation_fn: gated-silu
add_estimator: false
add_qkvbias: false
norm_eps: 1e-6
hidden_size: 1024
encoder:
layers: 8
decoder:
layers: 2
heads: 16
transformer_ff: 4096
embeddings:
word_vec_size: 1024
position_encoding_type: "SinusoidalInterleaved"
```