lbourdois committed
Commit f619fc5 · verified · 1 Parent(s): 46a64f6

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that the README announces 29 languages, but only 13 are explicitly listed, so I was only able to add those 13.

Files changed (1)
  1. README.md +104 -90
README.md CHANGED
@@ -1,91 +1,105 @@
---
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: peft
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
---

# Model Card for MTIPA-7B-PositionTask

If you are unable to use [MTIPA-7B-LoRA (this model)](https://huggingface.co/LLMMINE/MTIPA-7B-PositionTask/tree/main) directly (recommended), **[a merged-LoRA version of MTIPA-7B](https://huggingface.co/LLMMINE/MTIPA-7B-POSITION-MERGE)** is available and can be loaded as a standalone model.

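For the merged version, a plain `transformers` load is enough. A minimal sketch (it assumes the merged repo ships its own tokenizer files; otherwise the Qwen2.5 tokenizer used further below works the same way):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The merged checkpoint already folds the LoRA weights into the base model,
# so no peft dependency is needed here.
model = AutoModelForCausalLM.from_pretrained(
    "LLMMINE/MTIPA-7B-POSITION-MERGE",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("LLMMINE/MTIPA-7B-POSITION-MERGE")
```
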
Note that MTIPA, TIPA, and this model's training data are all Chinese, so support for other languages may be limited. If you need to train a model for a specific language or for general-purpose use, please refer to our paper and GitHub.

This model is trained on the MTIPA dataset. Given a Chinese sentence, it predicts the positions of misspelled characters and outputs each original (incorrect) character together with its correction.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the Qwen2.5-7B-Instruct base model and attach the MTIPA LoRA adapter.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
model = PeftModel.from_pretrained(base_model, "LLMMINE/MTIPA-7B-PositionTask")


def chat(text):
    # Keep the system prompt in Chinese: it is the prompt the adapter was
    # trained with. Translation: "Correct the misspelled characters in the
    # input, answering as [{position: character position, incorrect: wrong
    # character, correct: corrected character}, ...], counting positions
    # from 1; if everything is correct, answer []."
    system = "纠正输入这段话中的错别字,以[{position: 字符位置, incorrect: 错误字符, correct: 纠正后的字符}, ...]形式给出,字符位置从1开始计数,如果全部正确,给出[]\n"
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]
    text_input = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    model_inputs = tokenizer([text_input], return_tensors="pt").to(model.device)

    # A near-zero temperature keeps the structured output deterministic.
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=512,
        temperature=0.01,
    )
    # Drop the prompt tokens and keep only the newly generated ones.
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]


def main():
    print("Command-line chat started. Enter your text, or type 'exit' to quit.")
    while True:
        user_input = input("You: ")
        if user_input.lower() in ['exit', 'quit']:
            print("Exiting.")
            break
        if not user_input.strip():
            print("Please enter some text.")
            continue
        print("Reply:", chat(user_input))


if __name__ == '__main__':
    main()
```

Input:
```
花雨在镇上落了一整夜,这静寂的风暴覆盖了屋顶,堵住了房门,令露宿的动物窒息而死。如此多的花朵自天而降,天亮时大界小巷都覆上了一层绵密的花毯,人们得用铲子耙子清理出通道才能出殡。
```
The passage is taken from One Hundred Years of Solitude (Cien Años de Soledad), with `街` changed to `界` to introduce a typo.

Output:
```
[{"position": 56, "incorrect": "界", "correct": "街"}]
```

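A response in this form can be applied back to the input string. A minimal sketch (the `apply_corrections` helper is illustrative, not part of the repo, and assumes the model emits valid JSON as in the example above):

```python
import json

def apply_corrections(text, response):
    """Apply the model's corrections back to the input string."""
    chars = list(text)
    for fix in json.loads(response):
        pos = fix["position"] - 1  # the model counts positions from 1
        if 0 <= pos < len(chars) and chars[pos] == fix["incorrect"]:
            chars[pos] = fix["correct"]
    return "".join(chars)

# e.g. apply_corrections(text, chat(text)) with chat() from the snippet above
```
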
[**GitHub**](https://github.com/FloatFrank/TIPA) | [**Paper**](https://arxiv.org/abs/2411.17679)

### Framework versions

- PEFT 0.12.0