sakana-yu committed
Commit 8a3b38e · 0 Parent(s)
.gitattributes ADDED
@@ -0,0 +1,43 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bin.* filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zstandard filter=lfs diff=lfs merge=lfs -text
+ *.tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.db* filter=lfs diff=lfs merge=lfs -text
+ *.ark* filter=lfs diff=lfs merge=lfs -text
+ **/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
+ **/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
+ **/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
+ pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
+ resources/.ipynb_checkpoints/ocr_general-checkpoint.png filter=lfs diff=lfs merge=lfs -text
+ resources/.ipynb_checkpoints/OFA_logo_tp_path-checkpoint.svg filter=lfs diff=lfs merge=lfs -text
+ resources/.ipynb_checkpoints/image_ocr_recognition-checkpoint.jpg filter=lfs diff=lfs merge=lfs -text
+ resources/.ipynb_checkpoints/ocr_general_demo-checkpoint.png filter=lfs diff=lfs merge=lfs -text
+ resources/.ipynb_checkpoints/ocr_scene-checkpoint.png filter=lfs diff=lfs merge=lfs -text
+ resources/OFA_logo_tp_path.svg filter=lfs diff=lfs merge=lfs -text
+ resources/image_ocr_recognition.jpg filter=lfs diff=lfs merge=lfs -text
+ resources/ocr_general.png filter=lfs diff=lfs merge=lfs -text
+ resources/ocr_general_demo.png filter=lfs diff=lfs merge=lfs -text
+ resources/ocr_scene.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,209 @@
+ ---
+ backbone:
+ - OFA
+ domain:
+ - multi-modal
+ frameworks:
+ - pytorch
+ license: Apache License 2.0
+ metrics:
+ - accuracy
+ tags:
+ - Alibaba
+ - ICML2022
+ - arxiv:2202.03052
+ tasks:
+ - ocr-recognition
+
+ datasets:
+   evaluation:
+   - modelscope/ocr_fudanvi_zh
+   train:
+   - modelscope/ocr_fudanvi_zh
+ finetune-support: True
+ integrating: False
+ widgets:
+ - task: ofa-ocr-recognition
+   inputs:
+   - name: image
+     title: Image
+     type: image
+     validator:
+       max_resolution: 5000*5000
+       max_size: 10M
+   examples:
+   - name: 1
+     title: Example 1
+     inputs:
+     - data: https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ocr/ocr_general_demo.png
+       name: image
+   inferencespec:
+     cpu: 4
+     gpu: 1
+     gpu_memory: 16000
+     memory: 43000
+   integrating: True
+ ---
+ # OFA Text Recognition (OCR)
+ ## News
+ - January 2023:
+   - Improved the finetuning workflow: parameter updates, custom data, distributed training via scripts, and more are now supported; see the finetune example below.
+ - November 2022:
+   - Released ModelScope 1.0. Please use version 1.0.2 or later for the features below.
+   - Added finetuning support, plus a new [OFA Tutorial](https://www.modelscope.cn/docs/OFA%20Tutorial); see Section 1.4 for finetuning.
+
+
+ ## What is text recognition?
+ Text recognition: given an image containing text, the model recognizes the text in the image and outputs the corresponding string. Give it a try!
+
+
+ ## Quick start
+ Getting OFA running takes just the six lines of code below. If that is still not convenient enough, click the `Notebook` button in the upper-right corner: we provide a GPU-equipped environment where you can paste the code into a notebook and start playing with OFA right away!
+
+ <p align="center">
+     <img src="resources/ocr_general_demo.png" alt="ocr" width="200" />
+
+ ```python
+ from modelscope.pipelines import pipeline
+ from modelscope.utils.constant import Tasks
+ from modelscope.outputs import OutputKeys
+
+ # ModelScope Library >= 1.2.0
+ ocr_recognize = pipeline(Tasks.ocr_recognition, model='damo/ofa_ocr-recognition_general_base_zh', model_revision='v1.0.2')
+ result = ocr_recognize('https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ocr/ocr_general_demo.png')
+ print(result[OutputKeys.TEXT])
+ ```
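+ The pipeline also accepts local file paths in place of URLs. Below is a minimal sketch using the demo images shipped in this repository's `resources/` directory (run it from the repository root):
+
+ ```python
+ from modelscope.pipelines import pipeline
+ from modelscope.utils.constant import Tasks
+ from modelscope.outputs import OutputKeys
+
+ ocr_recognize = pipeline(Tasks.ocr_recognition,
+                          model='damo/ofa_ocr-recognition_general_base_zh',
+                          model_revision='v1.0.2')
+
+ # Local paths work the same way as URLs; these demo images ship with this repo.
+ for img in ['resources/ocr_general_demo.png', 'resources/image_ocr_recognition.jpg']:
+     result = ocr_recognize(img)
+     print(img, '->', result[OutputKeys.TEXT])
+ ```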
+ <br>
+
+ ## What is OFA?
+ OFA (One-For-All) is a unified multimodal pretrained model that uses a simple sequence-to-sequence learning framework to unify modalities (cross-modal, vision, language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation). For details, see our ICML 2022 paper, [OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework](https://arxiv.org/abs/2202.03052), and our official GitHub repository, [https://github.com/OFA-Sys/OFA](https://github.com/OFA-Sys/OFA).
+
+ <p align="center">
+     <br>
+     <img src="resources/OFA_logo_tp_path.svg" width="150" />
+     <br>
+ </p>
+ <br>
+
+ <p align="center">
+         <a href="https://github.com/OFA-Sys/OFA">Github</a>&nbsp | &nbsp<a href="https://arxiv.org/abs/2202.03052">Paper </a>&nbsp | &nbspBlog
+ </p>
+
+ <p align="center">
+     <br>
+     <video src="https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/resources/modelscope_web/demo.mp4" loop="loop" autoplay="autoplay" muted width="100%"></video>
+     <br>
+ </p>
+
+
+ ## Why choose OFA for text recognition?
+ OFA was evaluated on public text recognition benchmarks (including RCTW, ReCTS, LSVT, ArT, and CTW) and achieves state-of-the-art accuracy, as shown below:
+ <p align="left">
+ <table border="1" width="100%">
+     <tr align="center">
+         <td>Model</td><td>Scene</td><td>Web</td><td>Document</td><td>Handwriting</td><td>Avg</td>
+     </tr>
+     <tr align="center">
+         <td>SAR</td><td>62.5</td><td>54.3</td><td>93.8</td><td>31.4</td><td>67.3</td>
+     </tr>
+     <tr align="center">
+         <td>TransOCR</td><td>63.3</td><td>62.3</td><td>96.9</td><td>53.4</td><td>72.8</td>
+     </tr>
+     <tr align="center">
+         <td>MaskOCR-base</td><td>73.9</td><td>74.8</td><td>99.3</td><td>63.7</td><td>80.8</td>
+     </tr>
+     <tr align="center">
+         <td>OFA-OCR</td><td>82.9</td><td>81.7</td><td>99.1</td><td>69.0</td><td>86.0</td>
+     </tr>
+ </table>
+ <br>
+ </p>
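+ To spot-check accuracy yourself, the same trainer used for finetuning (see the example below) exposes an `evaluate()` method. A minimal sketch on a small test slice; exact trainer arguments may vary across ModelScope versions:
+
+ ```python
+ from modelscope.msdatasets import MsDataset
+ from modelscope.metainfo import Trainers
+ from modelscope.trainers import build_trainer
+
+ # Load a small slice of the public benchmark's test split.
+ test_dataset = MsDataset(MsDataset.load(
+     'ocr_fudanvi_zh', subset_name='scene', namespace='modelscope',
+     split='test[:20]').remap_columns({'label': 'text'}))
+
+ # Build a trainer around the released checkpoint and run evaluation only.
+ args = dict(model='damo/ofa_ocr-recognition_general_base_zh',
+             model_revision='v1.0.2',
+             eval_dataset=test_dataset)
+ trainer = build_trainer(name=Trainers.ofa, default_args=args)
+ print(trainer.evaluate())  # reports accuracy, per this model's configuration
+ ```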
+
+ ## Model training
+
+ ### Training data
+ The training data for this model comes from the Visual Intelligence Laboratory at Fudan University; dataset link: https://github.com/FudanVI/benchmarking-chinese-text-recognition
+ Sample images from the scene dataset:
+ <p align="center">
+     <img src="./resources/ocr_general.png" width="500" />
+ </p>
+
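+ To get a feel for the data before finetuning, you can load a small slice and inspect it. A minimal sketch, assuming the loaded slice is directly iterable (it wraps a Hugging Face dataset); the `label` column matches the layout used in the finetune example below:
+
+ ```python
+ from modelscope.msdatasets import MsDataset
+
+ # Pull a tiny slice of the scene subset for inspection.
+ ds = MsDataset.load('ocr_fudanvi_zh',
+                     subset_name='scene',
+                     namespace='modelscope',
+                     split='train[:5]')
+
+ for sample in ds:
+     # Each sample pairs an image with its transcription ('label').
+     print(sample['label'])
+ ```
+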
+ ### Training procedure
+ For model and finetuning details, see Section 1.4 of the [OFA Tutorial](https://modelscope.cn/docs/OFA_Tutorial#1.4%20%E5%A6%82%E4%BD%95%E8%AE%AD%E7%BB%83).
+
+ ### Finetune example
+ ```python
+ import tempfile
+ from modelscope.msdatasets import MsDataset
+ from modelscope.metainfo import Trainers
+ from modelscope.trainers import build_trainer
+ from modelscope.utils.constant import DownloadMode
+
+ train_dataset = MsDataset(MsDataset.load(
+     'ocr_fudanvi_zh',
+     subset_name='scene',
+     namespace='modelscope',
+     split='train[:100]',
+     download_mode=DownloadMode.REUSE_DATASET_IF_EXISTS).remap_columns({
+         'label': 'text'
+     }))
+
+ test_dataset = MsDataset(
+     MsDataset.load(
+         'ocr_fudanvi_zh',
+         subset_name='scene',
+         namespace='modelscope',
+         split='test[:20]',
+         download_mode=DownloadMode.REUSE_DATASET_IF_EXISTS).remap_columns({
+             'label': 'text'
+         }))
+
+ # The configuration can be adjusted in code before the trainer is built.
+ def cfg_modify_fn(cfg):
+     cfg.train.hooks = [{
+         'type': 'CheckpointHook',
+         'interval': 2
+     }, {
+         'type': 'TextLoggerHook',
+         'interval': 1
+     }, {
+         'type': 'IterTimerHook'
+     }]
+     cfg.train.max_epochs = 2
+     return cfg
+
+ args = dict(
+     model='damo/ofa_ocr-recognition_general_base_zh',
+     model_revision='v1.0.2',
+     train_dataset=train_dataset,
+     eval_dataset=test_dataset,
+     cfg_modify_fn=cfg_modify_fn,
+     work_dir=tempfile.TemporaryDirectory().name)
+ trainer = build_trainer(name=Trainers.ofa, default_args=args)
+ trainer.train()
+ ```
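+ After training, you can run inference with the finetuned weights by pointing a pipeline at the trainer's output directory. A minimal sketch, assuming the default ModelScope layout in which the final model is exported under `{work_dir}/output` (verify against your ModelScope version):
+
+ ```python
+ import os
+ from modelscope.pipelines import pipeline
+ from modelscope.utils.constant import Tasks
+ from modelscope.outputs import OutputKeys
+
+ # `trainer` is the object built in the finetune example above; its work_dir
+ # holds checkpoints, with the exported model in the 'output' subdirectory.
+ finetuned_dir = os.path.join(trainer.work_dir, 'output')
+ ocr_recognize = pipeline(Tasks.ocr_recognition, model=finetuned_dir)
+ result = ocr_recognize('https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ocr/ocr_general_demo.png')
+ print(result[OutputKeys.TEXT])
+ ```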
+
+ ## Limitations and possible biases
+ The training data has inherent limitations and the model may therefore exhibit some biases; please evaluate the model yourself before deciding how to use it.
+
+ ## Papers and citation
+ If you find OFA useful and like our work, please cite:
+ ```
+ @article{wang2022ofa,
+   author    = {Peng Wang and
+                An Yang and
+                Rui Men and
+                Junyang Lin and
+                Shuai Bai and
+                Zhikang Li and
+                Jianxin Ma and
+                Chang Zhou and
+                Jingren Zhou and
+                Hongxia Yang},
+   title     = {OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence
+                Learning Framework},
+   journal   = {CoRR},
+   volume    = {abs/2202.03052},
+   year      = {2022}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,54 @@
+ {
+   "activation_dropout": 0.0,
+   "activation_function": "gelu",
+   "add_type_embedding": true,
+   "architectures": [
+     "OFAModel"
+   ],
+   "attention_dropout": 0.0,
+   "attn_scale_factor": 2.0,
+   "bos_token_id": 0,
+   "classifier_dropout": 0.0,
+   "code_image_size": 128,
+   "code_layernorm_embedding": true,
+   "d_model": 768,
+   "decoder_attention_heads": 12,
+   "decoder_drop_path_rate": 0.0,
+   "decoder_ffn_dim": 3072,
+   "decoder_layerdrop": 0.0,
+   "decoder_layers": 6,
+   "decoder_normalize_before": true,
+   "decoder_start_token_id": 0,
+   "dropout": 0.1,
+   "encoder_attention_heads": 12,
+   "encoder_drop_path_rate": 0.0,
+   "encoder_ffn_dim": 3072,
+   "encoder_layerdrop": 0.0,
+   "encoder_layers": 6,
+   "encoder_normalize_before": true,
+   "entangle_position_embedding": false,
+   "eos_token_id": 2,
+   "forced_eos_token_id": 2,
+   "image_bucket_size": 42,
+   "init_std": 0.02,
+   "is_encoder_decoder": true,
+   "layernorm_embedding": true,
+   "max_position_embeddings": 1024,
+   "model_type": "ofa",
+   "normformer": true,
+   "num_hidden_layers": 6,
+   "pad_token_id": 1,
+   "patch_layernorm_embedding": true,
+   "resnet_drop_path_rate": 0.0,
+   "resnet_model_path": null,
+   "resnet_type": "resnet101",
+   "scale_embedding": false,
+   "share_decoder_input_output_embed": true,
+   "token_bucket_size": 256,
+   "torch_dtype": "float32",
+   "transformers_version": "4.22.2",
+   "use_cache": true,
+   "vocab_size": 30325,
+   "interpolate_position": true,
+   "orig_patch_image_size": 224
+ }
configuration.json ADDED
@@ -0,0 +1,102 @@
+ {
+   "framework": "pytorch",
+
+   "task": "ocr-recognition",
+
+   "model": {
+     "type": "ofa",
+     "beam_search": {
+       "beam_size": 5,
+       "max_len_b": 64,
+       "min_len": 1,
+       "no_repeat_ngram_size": 0
+     },
+     "seed": 7,
+     "max_src_length": 128,
+     "language": "zh",
+     "prompt": "图片上的文字是什么?",
+     "gen_type": "generation",
+     "patch_image_size": 480,
+     "max_image_size": 480,
+     "is_document": false,
+     "imagenet_default_mean_and_std": false
+   },
+   "pipeline": {
+     "type": "ofa-ocr-recognition"
+   },
+   "dataset": {
+     "column_map": {
+       "text": "text",
+       "image": "image"
+     }
+   },
+   "train": {
+     "work_dir": "/tmp",
+     "max_epochs": 1,
+     "use_fp16": false,
+     "dataloader": {
+       "batch_size_per_gpu": 4,
+       "workers_per_gpu": 0
+     },
+     "lr_scheduler": {
+       "name": "polynomial_decay",
+       "warmup_proportion": 0.01,
+       "lr_end": 1e-07
+     },
+     "lr_scheduler_hook": {
+       "type": "LrSchedulerHook",
+       "by_epoch": false
+     },
+     "optimizer": {
+       "type": "AdamW",
+       "lr": 5e-05,
+       "weight_decay": 0.01
+     },
+     "optimizer_hook": {
+       "type": "TorchAMPOptimizerHook",
+       "cumulative_iters": 1,
+       "grad_clip": {
+         "max_norm": 1.0,
+         "norm_type": 2
+       },
+       "loss_keys": "loss"
+     },
+     "criterion": {
+       "name": "AdjustLabelSmoothedCrossEntropyCriterion",
+       "constraint_range": null,
+       "drop_worst_after": 0,
+       "drop_worst_ratio": 0.0,
+       "ignore_eos": false,
+       "ignore_prefix_size": 0,
+       "label_smoothing": 0.1,
+       "reg_alpha": 1.0,
+       "report_accuracy": false,
+       "sample_patch_num": 196,
+       "sentence_avg": false,
+       "use_rdrop": true
+     },
+     "hooks": [{
+       "type": "BestCkptSaverHook",
+       "metric_key": "accuracy",
+       "interval": 100
+     },
+     {
+       "type": "TextLoggerHook",
+       "interval": 1
+     },
+     {
+       "type": "IterTimerHook"
+     }]
+   },
+   "evaluation": {
+     "dataloader": {
+       "batch_size_per_gpu": 4,
+       "workers_per_gpu": 0
+     },
+     "metrics": [{
+       "type": "accuracy"
+     }]
+   },
+   "preprocessor": []
+ }
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a39fe1cdce5c98f554a04f7caf76f5be02dfdc1f440611d186bd878fc8b02c19
+ size 706781709
resources/.ipynb_checkpoints/OFA_logo_tp_path-checkpoint.svg ADDED

Git LFS Details

  • SHA256: 07251e0f5b690ef58c6cba2745c4c4ef4308d08c4fbf2c70f2d5997ced0d7a03
  • Pointer size: 129 Bytes
  • Size of remote file: 1.42 kB
resources/.ipynb_checkpoints/image_ocr_recognition-checkpoint.jpg ADDED

Git LFS Details

  • SHA256: 772b19f76c98044e39330853928624f10e085106a4292b4dd19f865531080747
  • Pointer size: 128 Bytes
  • Size of remote file: 959 Bytes
resources/.ipynb_checkpoints/ocr_general-checkpoint.png ADDED

Git LFS Details

  • SHA256: b74b330aa2b6c763ddf596e28c87e407b15e185bf6ac259173ee8ebe47eafdb9
  • Pointer size: 132 Bytes
  • Size of remote file: 1.68 MB
resources/.ipynb_checkpoints/ocr_general_demo-checkpoint.png ADDED

Git LFS Details

  • SHA256: 216fde2653c864508929d5ffc64181a74ca2d7eb866fe9f728a96126cf876a2e
  • Pointer size: 132 Bytes
  • Size of remote file: 1.32 MB
resources/.ipynb_checkpoints/ocr_scene-checkpoint.png ADDED

Git LFS Details

  • SHA256: 5136906c80f78bc33980019ec3e35581cbcf22a312d0703d3f57b3f589fdeacb
  • Pointer size: 132 Bytes
  • Size of remote file: 1.37 MB
resources/OFA_logo_tp_path.svg ADDED

Git LFS Details

  • SHA256: 07251e0f5b690ef58c6cba2745c4c4ef4308d08c4fbf2c70f2d5997ced0d7a03
  • Pointer size: 129 Bytes
  • Size of remote file: 1.42 kB
resources/image_ocr_recognition.jpg ADDED

Git LFS Details

  • SHA256: 772b19f76c98044e39330853928624f10e085106a4292b4dd19f865531080747
  • Pointer size: 128 Bytes
  • Size of remote file: 959 Bytes
resources/ocr_general.png ADDED

Git LFS Details

  • SHA256: b74b330aa2b6c763ddf596e28c87e407b15e185bf6ac259173ee8ebe47eafdb9
  • Pointer size: 132 Bytes
  • Size of remote file: 1.68 MB
resources/ocr_general_demo.png ADDED

Git LFS Details

  • SHA256: 49213aa15937426d2111ad2348f49b15cd60408b1554d22f0245c33c68273100
  • Pointer size: 132 Bytes
  • Size of remote file: 2.36 MB
resources/ocr_scene.png ADDED

Git LFS Details

  • SHA256: 5136906c80f78bc33980019ec3e35581cbcf22a312d0703d3f57b3f589fdeacb
  • Pointer size: 132 Bytes
  • Size of remote file: 1.37 MB
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "do_lower_case": false
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff