lbourdois committed (verified)
Commit 6fceb10 · 1 Parent(s): bbbda63

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13.

Files changed (1)
  1. README.md +90 -78
README.md CHANGED
@@ -1,79 +1,91 @@
- ---
- license: other
- language:
- - en
- library_name: transformers
- tags:
- - RLHF
- - Nexusflow
- - Athene
- - Chat Model
- base_model:
- - Qwen/Qwen2.5-72B-Instruct
- ---
- # Athene-V2-Chat-72B: Rivaling GPT-4o across Benchmarks
-
- <p align="center">
- <a href="https://huggingface.co/Nexusflow" target="_blank">Nexusflow HF</a> - <a href="https://discord.gg/HDSVmNAs3y" target="_blank">Nexusflow Discord</a> - <a href="https://nexusflow.ai/blogs/athene-v2" target="_blank">Athene-V2 Blogpost</a>
- </p>
-
-
- We introduce Athene-V2-Chat-72B, an open-weights LLM on par with GPT-4o across benchmarks. It is trained through RLHF with Qwen2.5-72B-Instruct as the base model.
- Athene-V2-Chat-72B excels in chat, math, and coding. Its sister model, [Athene-V2-Agent-72B](https://huggingface.co/Nexusflow/Athene-V2-Agent), surpasses GPT-4o in complex function calling and agentic applications.
-
-
- <p align="center" width="100%">
- <a><img src="benchmark.png" alt="Benchmark" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
- </p>
-
- - **Developed by:** The Nexusflow Team
- - **Model type:** Chat Model
- - **Finetuned from model:** [Qwen 2.5 72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
- - **License**: [Nexusflow Research License](https://huggingface.co/Nexusflow/Athene-V2-Chat/blob/main/Nexusflow_Research_License_.pdf)
- - **Blog**: https://nexusflow.ai/blogs/athene-v2
-
- ## Usage
- Athene-V2-Chat uses the same chat template as Qwen2.5-72B-Instruct. Below is a simple usage example with the Transformers library.
-
- ```Python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "Nexusflow/Athene-V2-Chat"
-
- # Load the model in its native precision, sharded across available devices
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype="auto",
-     device_map="auto"
- )
- tokenizer = AutoTokenizer.from_pretrained(model_name)
-
- prompt = "Write a Python function to return the nth Fibonacci number in log n runtime."
-
- messages = [
-     {"role": "user", "content": prompt}
- ]
-
- # Render the conversation with the Qwen2.5 chat template
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True
- )
-
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=2048
- )
-
- # Drop the prompt tokens so only the completion is decoded
- generated_ids = [
-     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
- ]
-
- response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
- ```
-
- Note that adding a system prompt that encourages the model to think step by step can further improve performance on difficult math queries and problems like counting the `r`s in "strawberry". For fairness considerations, we **do not** include such a system prompt during chat evaluation.
-
- ## Acknowledgment
  We would like to thank the [LMSYS Organization](https://lmsys.org/) for their support in testing the model. We would also like to thank the Qwen Team and the open-source community for their efforts in providing the datasets and base models.
 
+ ---
+ license: other
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ library_name: transformers
+ tags:
+ - RLHF
+ - Nexusflow
+ - Athene
+ - Chat Model
+ base_model:
+ - Qwen/Qwen2.5-72B-Instruct
+ ---
+ # Athene-V2-Chat-72B: Rivaling GPT-4o across Benchmarks
+
+ <p align="center">
+ <a href="https://huggingface.co/Nexusflow" target="_blank">Nexusflow HF</a> - <a href="https://discord.gg/HDSVmNAs3y" target="_blank">Nexusflow Discord</a> - <a href="https://nexusflow.ai/blogs/athene-v2" target="_blank">Athene-V2 Blogpost</a>
+ </p>
+
+
+ We introduce Athene-V2-Chat-72B, an open-weights LLM on par with GPT-4o across benchmarks. It is trained through RLHF with Qwen2.5-72B-Instruct as the base model.
+ Athene-V2-Chat-72B excels in chat, math, and coding. Its sister model, [Athene-V2-Agent-72B](https://huggingface.co/Nexusflow/Athene-V2-Agent), surpasses GPT-4o in complex function calling and agentic applications.
+
+
+ <p align="center" width="100%">
+ <a><img src="benchmark.png" alt="Benchmark" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
+ </p>
+
+ - **Developed by:** The Nexusflow Team
+ - **Model type:** Chat Model
+ - **Finetuned from model:** [Qwen 2.5 72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
+ - **License**: [Nexusflow Research License](https://huggingface.co/Nexusflow/Athene-V2-Chat/blob/main/Nexusflow_Research_License_.pdf)
+ - **Blog**: https://nexusflow.ai/blogs/athene-v2
+
+ ## Usage
+ Athene-V2-Chat uses the same chat template as Qwen2.5-72B-Instruct. Below is a simple usage example with the Transformers library.
+
+ ```Python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "Nexusflow/Athene-V2-Chat"
+
+ # Load the model in its native precision, sharded across available devices
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ prompt = "Write a Python function to return the nth Fibonacci number in log n runtime."
+
+ messages = [
+     {"role": "user", "content": prompt}
+ ]
+
+ # Render the conversation with the Qwen2.5 chat template
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=2048
+ )
+
+ # Drop the prompt tokens so only the completion is decoded
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ ```
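+
+ As an aside, the example prompt asks for an O(log n) Fibonacci implementation. One standard way to meet that bound is the fast-doubling identities F(2k) = F(k) * (2*F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2; the sketch below is our illustration of that algorithm, not actual model output:
+
+ ```Python
+ def fib(n: int) -> int:
+     """Return the nth Fibonacci number in O(log n) arithmetic steps (fast doubling)."""
+     def doubling(k: int) -> tuple[int, int]:
+         # Returns the pair (F(k), F(k+1))
+         if k == 0:
+             return (0, 1)
+         a, b = doubling(k >> 1)
+         c = a * (2 * b - a)   # F(2k)
+         d = a * a + b * b     # F(2k+1)
+         return (d, c + d) if k & 1 else (c, d)
+     return doubling(n)[0]
+ ```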
+
+ Note that adding a system prompt that encourages the model to think step by step can further improve performance on difficult math queries and problems like counting the `r`s in "strawberry". For fairness considerations, we **do not** include such a system prompt during chat evaluation.
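+
+ For example, such a system message can be prepended to the `messages` list from the snippet above. The wording here is an illustrative assumption, not the exact prompt used by Nexusflow:
+
+ ```Python
+ messages = [
+     # Hypothetical step-by-step system prompt, for illustration only
+     {"role": "system", "content": "Reason step by step before giving your final answer."},
+     {"role": "user", "content": "How many r's are in the word strawberry?"}
+ ]
+ ```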
+
+ ## Acknowledgment
  We would like to thank the [LMSYS Organization](https://lmsys.org/) for their support in testing the model. We would also like to thank the Qwen Team and the open-source community for their efforts in providing the datasets and base models.