๋ฐ์ดํ„ฐ ์…‹

LIMO

  • GAIR/LIMO (English, original)

LIMO Korean translation

ํŠน์ด์‚ฌํ•ญ

  • The original LIMO recipe trains for 15 epochs.
  • Here, one copy of the English dataset was mixed with two copies of the Korean translation and trained for 5 epochs, so that each example is seen a comparable number of times to the original recipe, but with some variation.
  • However, in qualitative evaluation, the epoch-4 checkpoint appeared to perform best.
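The epoch arithmetic behind the mixing choice above can be sketched as follows (a minimal illustration, assuming the English and Korean sets are the same size, which holds since one is a translation of the other):

```python
# One copy of the English set plus two copies of the Korean translation
# are concatenated, then trained for 5 epochs.
copies = {"english": 1, "korean": 2}
num_epochs = 5

# Dataset-equivalents of training per epoch is the total number of copies.
passes_per_epoch = sum(copies.values())        # 3
total_passes = passes_per_epoch * num_epochs   # total exposure in dataset-equivalents

print(total_passes)  # matches the original LIMO recipe's 15 epochs
```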

Training Details

  • 4xH200 SXM, 13.5 Hours


Axolotl config

```yaml
base_model: beomi/EXAONE-3.5-32B-Instruct-Llamafied
model_type: AutoModelForCausalLM
tokenizer_config: beomi/EXAONE-3.5-32B-Instruct-Llamafied
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: werty1248/kk_oo_llliiimmmooo
    field_messages: conversations
    type: chat_template
    chat_template: tokenizer_default

dataset_prepared_path: ./data_preparation
output_dir: /workspace/data

hf_use_auth_token: true

sequence_len: 32768
sample_packing: false
pad_to_sequence_len: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

wandb_project:
#wandb_entity:
#wandb_watch:
wandb_name:
#wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 5
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 5.0e-6

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.05
eval_table_size:

save_total_limit: 2

deepspeed: ./deepspeed_configs/zero3_bf16.json

special_tokens:
  pad_token: "[|endofturn|]"
```
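The effective global batch size implied by the config can be worked out directly (a sketch, assuming one data-parallel process per GPU on the 4x H200 node from Training Details):

```python
# Values taken from the Axolotl config and the hardware listed above.
micro_batch_size = 1              # per-GPU batch size
gradient_accumulation_steps = 2   # optimizer steps accumulate over 2 micro-batches
num_gpus = 4                      # 4x H200 SXM

# Global batch size per optimizer step under data parallelism (ZeRO-3).
effective_batch = micro_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch)
```

A small global batch like this is consistent with LIMO-style training, where the dataset itself is small and each example carries a long reasoning trace.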