IceLlama3.2-3B-01

This model is a fine-tuned version of meta-llama/Llama-3.2-3B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.8418

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • num_epochs: 3
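
The hyperparameters above map onto a `transformers` `TrainingArguments` configuration roughly as follows. This is a sketch only: the output directory name is an assumption, and the model/dataset loading and `Trainer` setup from the original training script are not part of the card.

```python
from transformers import TrainingArguments

# Sketch of TrainingArguments matching the hyperparameters listed above.
# output_dir is an assumed name, not taken from the card.
training_args = TrainingArguments(
    output_dir="IceLlama3.2-3B-01",      # assumption
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",                 # AdamW; betas=(0.9, 0.999) and eps=1e-8 are the defaults
    lr_scheduler_type="cosine",
    num_train_epochs=3,
)
```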

Training results

| Training Loss | Epoch  | Step   | Validation Loss |
|:-------------:|:------:|:------:|:---------------:|
| 0.2704        | 0.0707 | 10000  | 2.0553          |
| 0.2647        | 0.1414 | 20000  | 1.9754          |
| 0.2566        | 0.2121 | 30000  | 1.9398          |
| 0.2558        | 0.2828 | 40000  | 1.9169          |
| 0.251         | 0.3535 | 50000  | 1.9004          |
| 0.2479        | 0.4242 | 60000  | 1.8903          |
| 0.2507        | 0.4949 | 70000  | 1.8823          |
| 0.2538        | 0.5656 | 80000  | 1.8753          |
| 0.2468        | 0.6363 | 90000  | 1.8692          |
| 0.2504        | 0.7070 | 100000 | 1.8660          |
| 0.2466        | 0.7777 | 110000 | 1.8614          |
| 0.2489        | 0.8484 | 120000 | 1.8587          |
| 0.2489        | 0.9191 | 130000 | 1.8552          |
| 0.2455        | 0.9898 | 140000 | 1.8529          |
| 0.2457        | 1.0606 | 150000 | 1.8512          |
| 0.2417        | 1.1313 | 160000 | 1.8493          |
| 0.2441        | 1.2020 | 170000 | 1.8479          |
| 0.2437        | 1.2727 | 180000 | 1.8472          |
| 0.2408        | 1.3434 | 190000 | 1.8460          |
| 0.2413        | 1.4141 | 200000 | 1.8450          |
| 0.2409        | 1.4848 | 210000 | 1.8443          |
| 0.242         | 1.5555 | 220000 | 1.8439          |
| 0.2435        | 1.6262 | 230000 | 1.8437          |
| 0.2446        | 1.6969 | 240000 | 1.8428          |
| 0.2469        | 1.7676 | 250000 | 1.8427          |
| 0.2396        | 1.8383 | 260000 | 1.8424          |
| 0.2433        | 1.9090 | 270000 | 1.8422          |
| 0.2475        | 1.9797 | 280000 | 1.8422          |
| 0.2457        | 2.0504 | 290000 | 1.8420          |
| 0.2412        | 2.1211 | 300000 | 1.8419          |
| 0.2461        | 2.1918 | 310000 | 1.8418          |
| 0.2444        | 2.2625 | 320000 | 1.8417          |
| 0.2436        | 2.3332 | 330000 | 1.8418          |
| 0.2456        | 2.4039 | 340000 | 1.8418          |
| 0.2411        | 2.4746 | 350000 | 1.8417          |
| 0.2459        | 2.5453 | 360000 | 1.8417          |
| 0.2425        | 2.6160 | 370000 | 1.8417          |
| 0.2464        | 2.6867 | 380000 | 1.8418          |
| 0.2431        | 2.7574 | 390000 | 1.8417          |
| 0.25          | 2.8281 | 400000 | 1.8418          |
| 0.2439        | 2.8988 | 410000 | 1.8418          |
| 0.2425        | 2.9695 | 420000 | 1.8418          |

Framework versions

  • Transformers 4.53.1
  • Pytorch 2.3.1+cu118
  • Datasets 3.6.0
  • Tokenizers 0.21.2
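
With the framework versions above, the model can be loaded for generation as in the following sketch. The repository id is taken from this card's model tree; the prompt, dtype choice, and generation settings are assumptions for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal generation sketch. Repo id from the model card; prompt and
# generation settings are illustrative assumptions.
model_id = "thorirhrafn/IceLlama3.2-3B-01"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Halló, ", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```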