GDN-distill
This model was uploaded from: /mnt/nanjingcephfs/project_wx-rec-alg-bdc-exp/bwzheng/yulan/hyw/pretrain-linear-moe-dev/RADLADS-paper/out/L56-D1920-qwen_gdn_qwen2-e1-nh8-hd64-nvh32-A0-S4096--step1-mix2b-token9.2B/rwkv-final-hf-A7-0_8_16_24_32_40_48/mcore-tp2pp1