Spaces:

CultriX
/

Tiny-LeaderBoard

CultriX committed on Dec 30, 2024

Commit

207ba8c

verified

1 Parent(s): 1c2e4dc

Update scrape-leaderboard.py
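
The prompt text added in this commit describes a record layout: records are separated by `---` lines, each record lists a rank, a model name, and seven percentage scores, and some records carry a mergekit YAML configuration between two `###` lines. A minimal sketch of a parser for that layout is shown here. It is not part of the commit; the names `ModelRecord`, `parse_records`, `SCORE_NAMES`, and `SAMPLE` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Order of the seven score lines, as listed in the prompt's layout example.
SCORE_NAMES = ["average", "IFEval", "BBH", "MATH", "GPQA", "MUSR", "MMLU-PRO"]


@dataclass
class ModelRecord:
    rank: int
    name: str
    scores: Dict[str, float]
    config: Optional[str] = None  # raw YAML text, or None when absent


def parse_records(text: str) -> List[ModelRecord]:
    """Parse '---'-separated model records with optional '###'-delimited YAML."""
    # Step 1: split the input into blocks on lines that are exactly '---'.
    blocks: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "---":
            if any(part.strip() for part in current):
                blocks.append(current)
            current = []
        else:
            current.append(line)
    if any(part.strip() for part in current):
        blocks.append(current)

    # Step 2: read the header fields and the optional configuration.
    records: List[ModelRecord] = []
    for block in blocks:
        markers = [i for i, part in enumerate(block) if part.strip() == "###"]
        header_end = markers[0] if markers else len(block)
        header = [part.strip() for part in block[:header_end] if part.strip()]
        if len(header) < 2 + len(SCORE_NAMES):
            continue  # assumption: silently skip malformed records
        values = [float(v.rstrip("% ")) for v in header[2:2 + len(SCORE_NAMES)]]
        config = None
        if len(markers) >= 2:
            # Keep the YAML lines verbatim so their indentation survives.
            config = "\n".join(block[markers[0] + 1:markers[-1]])
        records.append(
            ModelRecord(
                rank=int(header[0]),
                name=header[1],
                scores=dict(zip(SCORE_NAMES, values)),
                config=config,
            )
        )
    return records


# Sample input copied from the example in the prompt (YAML shortened).
SAMPLE = """---
44
sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
40.10 %
72.57 %
48.58 %
34.44 %
17.34 %
19.39 %
48.26 %
###
models:
  - model: CultriX/SeQwence-14Bv1
  - model: allknowingroger/Qwenslerp5-14B
merge_method: slerp
###
---
45
sthenno-com/miscii-14b-1225
40.08 %
78.78 %
50.91 %
31.57 %
17.00 %
14.77 %
47.46 %
---
"""

records = parse_records(SAMPLE)
for record in records:
    status = "config found" if record.config else "config not found"
    print(record.rank, record.name, status)
```

The sketch keeps the YAML as raw text instead of loading it, so it needs only the standard library; a YAML parser could be applied to `config` afterwards if structured access is wanted.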

Files changed (1)

scrape-leaderboard.py +117 -6

@@ -1,12 +1,123 @@
 import requests
 from bs4 import BeautifulSoup
-# 1. A list of model benchmark data from your “DATA START”. Each entry contains:
-#    - rank
-#    - name
-#    - scores (average, IFEval, BBH, MATH, GPQA, MUSR, MMLU-PRO)
-#    - hf_url: the Hugging Face URL to scrape for a MergeKit config
-#    - known_config: if we already know the configuration, store it here; otherwise None.
 benchmark_data = [
     {
         "rank": 44,

 import requests
 from bs4 import BeautifulSoup
+#!/usr/bin/env python3
+def main():
+    print("""### INSTRUCTION ###
+Read the instructions below:
+1. You will be presented with benchmark scores of various LLMs.
+2. The layout of the data presented to you is as follows:
+>>> START LAYOUT EXAMPLE <<<
+--- (start of a new model marker)
+Model ranking
+Model name
+Model average score across benchmarks in %
+Models average score on IFEval benchmarks in %
+Models average score on BBH benchmarks in %
+Models average score on MATH benchmarks in %
+Models average score in GPQA benchmarks in %
+Models average score in MUSR benchmarks in %
+Models average score in MMLU-PRO benchmarks in %
+### (start of YAML-configuration marker)
+The YAML-configuration file that was used to create the model in mergekit.
+Note that this part is only available for certain models and not for all models on the list!
+The configuration starts and ends with '###'
+### (end of YAML-configuration marker)
+>>> END LAYOUT EXAMPLE <<<
+For example, the following input could be possible:
+>>> START INPUT EXAMPLE <<<
+---
+44
+sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
+40.10 %
+72.57 %
+48.58 %
+34.44 %
+17.34 %
+19.39 %
+48.26 %
+###
+models:
+  - model: CultriX/SeQwence-14Bv1
+  - model: allknowingroger/Qwenslerp5-14B
+merge_method: slerp
+base_model: CultriX/SeQwence-14Bv1
+dtype: bfloat16
+parameters:
+  t: [0, 0.5, 1, 0.5, 0] # V shaped curve: Hermes for input & output, WizardMath in the middle layers
+###
+---
+45
+sthenno-com/miscii-14b-1225
+40.08 %
+78.78 %
+50.91 %
+31.57 %
+17.00 %
+14.77 %
+47.46 %
+---
+>>> END INPUT EXAMPLE <<<
+>>> START INTERPRETATION OF INPUT EXAMPLE <<<
+---
+Model Rank: 44
+Model Name: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
+Model average score across benchmarks in %: 40.10
+Models average score on IFEval benchmarks in %: 72.57
+Models average score on BBH benchmarks in %: 48.58
+Models average score on MATH benchmarks in %: 34.44
+Models average score in GPQA benchmarks in %: 17.34
+Models average score in MUSR benchmarks in %: 19.39
+Models average score in MMLU-PRO benchmarks in %: 48.26
+### (THE CONFIGURATION FOR MERGING THE MODEL: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 was found)
+models:
+  - model: CultriX/SeQwence-14Bv1
+  - model: allknowingroger/Qwenslerp5-14B
+merge_method: slerp
+base_model: CultriX/SeQwence-14Bv1
+dtype: bfloat16
+parameters:
+  t: [0, 0.5, 1, 0.5, 0] # V shaped curve: Hermes for input & output, WizardMath in the middle layers
+###
+---
+Model Rank: 45
+Model Name: sthenno-com/miscii-14b-1225
+Model average score across benchmarks in %: 40.08
+Models average score on IFEval benchmarks in %: 78.78
+Models average score on BBH benchmarks in %: 50.91
+Models average score on MATH benchmarks in %: 31.57
+Models average score in GPQA benchmarks in %: 17.00
+Models average score in MUSR benchmarks in %: 14.77
+Models average score in MMLU-PRO benchmarks in %: 47.46
+###
+THE MERGEKIT CONFIGURATION FOR MODEL sthenno-com/miscii-14b-1225 WAS NOT FOUND SO IT IS SKIPPED
+###
+--- (next model etc...)
+>>> END INTERPRETATION OF INPUT EXAMPLE <<<
+3. >>> INSTRUCTIONS <<<
+Below follows the scraped data from the leaderboard.
+>>> DATA START <<<
+""")
 benchmark_data = [
     {
         "rank": 44,