Spaces: CultriX/Tiny-LeaderBoard

CultriX committed on Dec 30, 2024

Commit 006d441 (verified) · 1 parent: 207ba8c

Update scrape-leaderboard.py

Files changed (1)

scrape-leaderboard.py +6 -117
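
The six lines this commit adds describe what each benchmark_data entry holds: a rank, a name, seven scores (average, IFEval, BBH, MATH, GPQA, MUSR, MMLU-PRO), an hf_url to scrape for a MergeKit config, and a known_config that is None when no configuration is known yet. Only the "rank" key is visible in the hunk, so what follows is a minimal sketch under assumptions: the BenchmarkEntry type, its other key names, the SCORE_KEYS labels and the format_entry helper are hypothetical, not code from scrape-leaderboard.py. It renders one entry in the '---' / '###' layout that the removed prompt documents.

```python
from typing import Optional, TypedDict

# Hypothetical score labels, in the order the removed prompt lists them.
SCORE_KEYS = ("average", "IFEval", "BBH", "MATH", "GPQA", "MUSR", "MMLU-PRO")


class BenchmarkEntry(TypedDict):
    """One benchmark_data entry. Only "rank" is visible in the diff; the
    other key names are assumptions based on the added comment."""

    rank: int
    name: str
    scores: dict[str, float]      # keyed by SCORE_KEYS
    hf_url: str                   # page to scrape for a MergeKit config
    known_config: Optional[str]   # YAML text if already known, else None


def format_entry(entry: BenchmarkEntry) -> str:
    """Hypothetical helper: render one entry in the removed prompt's layout."""
    lines = ["---", str(entry["rank"]), entry["name"]]
    # Each score is printed with two decimals and a '%' suffix, e.g. "40.08 %".
    lines += [f"{entry['scores'][key]:.2f} %" for key in SCORE_KEYS]
    config = entry["known_config"]
    if config is not None:
        # A known MergeKit configuration is wrapped in '###' markers;
        # entries without one get no config section at all.
        lines += ["###", config.strip("\n"), "###"]
    return "\n".join(lines)


if __name__ == "__main__":
    example: BenchmarkEntry = {
        "rank": 45,
        "name": "sthenno-com/miscii-14b-1225",
        "scores": dict(zip(SCORE_KEYS, [40.08, 78.78, 50.91, 31.57, 17.00, 14.77, 47.46])),
        # Assumed URL form: the model's Hugging Face page.
        "hf_url": "https://huggingface.co/sthenno-com/miscii-14b-1225",
        "known_config": None,
    }
    print(format_entry(example))
```

Printing the example reproduces the rank-45 block from the removed prompt's example input, minus the trailing '---' that starts the next model. Omitting the config section entirely when known_config is None is a design choice of this sketch; the removed prompt instead told the reader to note that the configuration was not found and skip it.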

scrape-leaderboard.py CHANGED

@@ -1,123 +1,12 @@
 import requests
 from bs4 import BeautifulSoup
-#!/usr/bin/env python3
-def main():
-    print("""### INSTRUCTION ###
-Read the instructions below:
-1. You will be presented with benchmark scores by various LLM's.
-2. The layout of the data presented to you is as follows:
->>> START LAYOUT EXAMPLE <<<
---- (start of a new model marker)
-Model ranking
-Model name
-Model average score across benchmarks in %
-Models average score on IFEval benchmarks in %
-Models average score on BBH benchmarks in %
-Models average score on MATH benchmarks in %
-Models average score in GPQA benchmarks in %
-Models average score in MUSR benchmarks in %
-Models average score in MMLU-PRO benchmarks in %
-### (start of YAML-configuration marker)
-The YAML-configuration file that was used to create the model in mergekit.
-Note that this part is only available for certain models and not for all models on the list!
-The configuration starts and ends with '###'
-### (end of YAML-configuration marker)
->>> END LAYOUT EXAMPLE <<<
-For example, the following input could be possible:
->>> START INPUT EXAMPLE <<<
----
-44
-sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
-40.10 %
-72.57 %
-48.58 %
-34.44 %
-17.34 %
-19.39 %
-48.26 %
-###
-models:
-  - model: CultriX/SeQwence-14Bv1
-  - model: allknowingroger/Qwenslerp5-14B
-merge_method: slerp
-base_model: CultriX/SeQwence-14Bv1
-dtype: bfloat16
-parameters:
-  t: [0, 0.5, 1, 0.5, 0] # V shaped curve: Hermes for input & output, WizardMath in the middle layers
-###
----
-45
-sthenno-com/miscii-14b-1225
-40.08 %
-78.78 %
-50.91 %
-31.57 %
-17.00 %
-14.77 %
-47.46 %
----
->>> END INPUT EXAMPLE <<<
->>> START INTERPRETATION OF INPUT EXAMPLE <<<
----
-Model Rank: 44
-Model Name: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
-Model average score across benchmarks in %: 40.10
-Models average score on IFEval benchmarks in %: 72.57
-Models average score on BBH benchmarks in %: 48.58
-Models average score on MATH benchmarks in % 34.44
-Models average score in GPQA benchmarks in % 17.34
-Models average score in MUSR benchmarks in % 19.39
-Models average score in MMLU-PRO benchmarks in % 48.26
-### (THE CONFIGURATION FOR MERGING THE MODEL: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 was found)
-models:
-  - model: CultriX/SeQwence-14Bv1
-  - model: allknowingroger/Qwenslerp5-14B
-merge_method: slerp
-base_model: CultriX/SeQwence-14Bv1
-dtype: bfloat16
-parameters:
-  t: [0, 0.5, 1, 0.5, 0] # V shaped curve: Hermes for input & output, WizardMath in the middle layers
-###
----
-Model Rank: 45
-Model Name: sthenno-com/miscii-14b-1225
-Model average score across benchmarks in %: 40.08
-Models average score on IFEval benchmarks in %: 78.78
-Models average score on BBH benchmarks in %: 50.91
-Models average score on MATH benchmarks in % 31.57
-Models average score in GPQA benchmarks in % 17.00
-Models average score in MUSR benchmarks in % 14.77
-Models average score in MMLU-PRO benchmarks in % 47.46
-###
-THE MERGEKIT CONFIGURATION FOR MODEL sthenno-com/miscii-14b-1225 WAS NOT FOUND SO IT IS SKIPPED
-###
---- (next model etc...)
->>> END INTERPRETATION OF INPUT EXAMPLE <<<
-4. >>> INSTRUCTIONS <<<
-Below follows the scraped data from the leaderboard.
->>> DATA START <<<
-"""
+# 1. A list of model benchmark data from your “DATA START”. Each entry contains:
+#    - rank
+#    - name
+#    - scores (average, IFEval, BBH, MATH, GPQA, MUSR, MMLU-PRO)
+#    - hf_url: the Hugging Face URL to scrape for a MergeKit config
+#    - known_config: if we already know the configuration, store it here; otherwise None.
 benchmark_data = [
     {
         "rank": 44,