Dataset
LIMO
Korean translation of LIMO
Notes
- The original LIMO recipe trains for 15 epochs
- This model was instead trained for 5 epochs on an English ×1 + Korean ×2 dataset, so the total number of passes over the data is similar to the original recipe while introducing some variation (see the sketch after this list)
- However, in qualitative evaluation, the checkpoint at epoch 4 appeared to perform best
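
For illustration, a minimal sketch of how an English ×1 + Korean ×2 mixture like this could be assembled with the `datasets` library. The English source is assumed to be the public GAIR/LIMO dataset, and `werty1248/LIMO-Ko` is a hypothetical name for the Korean translation; the combined dataset actually referenced in the config below is werty1248/kk_oo_llliiimmmooo, whose exact construction is not documented here.

```python
# Sketch only: repo names other than GAIR/LIMO are assumptions, not the author's exact pipeline.
from datasets import load_dataset, concatenate_datasets

english = load_dataset("GAIR/LIMO", split="train")          # original English LIMO data
korean = load_dataset("werty1248/LIMO-Ko", split="train")   # hypothetical Korean translation repo

# English once + Korean twice -> roughly 3x the original size, so 5 epochs over the
# mixture gives a number of passes comparable to the original 15-epoch LIMO recipe.
mixed = concatenate_datasets([english, korean, korean]).shuffle(seed=42)
mixed.push_to_hub("werty1248/kk_oo_llliiimmmooo")           # illustrative upload target
```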
Training Details

Axolotl config
base_model: beomi/EXAONE-3.5-32B-Instruct-Llamafied
model_type: AutoModelForCausalLM
tokenizer_config: beomi/EXAONE-3.5-32B-Instruct-Llamafied
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: werty1248/kk_oo_llliiimmmooo
    field_messages: conversations
    type: chat_template
    chat_template: tokenizer_default
dataset_prepared_path: ./data_preparation
output_dir: /workspace/data
hf_use_auth_token: true
sequence_len: 32768
sample_packing: false
pad_to_sequence_len: true
plugins:
- axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true
wandb_project:
#wandb_entity:
#wandb_watch:
wandb_name:
#wandb_log_model:
gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 5
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 5.0e-6
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_ratio: 0.05
eval_table_size:
save_total_limit: 2
deepspeed: ./deepspeed_configs/zero3_bf16.json
special_tokens:
  pad_token: "[|endofturn|]"
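
For reference, a minimal inference sketch consistent with the config above: the tokenizer's built-in chat template is used (chat_template: tokenizer_default) and [|endofturn|] serves as the pad token. The model path is a placeholder, not an actual repo id.

```python
# Sketch only: "path/to/finetuned-checkpoint" is a placeholder for the resulting model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/finetuned-checkpoint"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Find the sum of all integers from 1 to 100."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# pad_token_id resolves to [|endofturn|], matching the special_tokens setting above.
output = model.generate(input_ids, max_new_tokens=2048, pad_token_id=tokenizer.pad_token_id)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```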