Meta-Llama-3.1-70B-Instruct-Malaysian

LoRA SFT of meta-llama/Llama-3.1-70B-Instruct on Scicom-intl/Malaysian-Instructions (commit 288b358a57765a735d588f73e5e6c212c81429bd).
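
A minimal sketch of loading the dataset pinned to that exact commit with the `datasets` library; the `train` split name is an assumption.

```python
from datasets import load_dataset

# Pin the instruction dataset to the exact commit used for this fine-tune.
dataset = load_dataset(
    "Scicom-intl/Malaysian-Instructions",
    revision="288b358a57765a735d588f73e5e6c212c81429bd",
    split="train",  # assumed split name
)
```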

  1. Dense LoRA SFT using DeepSpeed ZeRO-3 with the Hugging Face Trainer.
  2. Multipacking of variable-length sequences to a 16384-token context length, with a global batch size of 32, so each global step covers 524288 tokens.
  3. LoRA on all linear layers with rank 256 and alpha set to 2.0 × rank (512).
  4. Liger fused cross-entropy kernel.
  5. 1e-4 learning rate, 50 warmup steps, 3 epochs only. See the configuration sketch below.
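
The setup above can be approximated with PEFT and the Hugging Face Trainer. This is a minimal sketch, not the actual training script: the output directory, DeepSpeed config path, per-device batch size / gradient-accumulation split, and the use of the Trainer's built-in Liger integration are assumptions, and the multipacking collator is omitted.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct",
    torch_dtype=torch.bfloat16,
)

# Rank-256 adapters on every linear layer, alpha = 2.0 * rank = 512.
lora_config = LoraConfig(
    r=256,
    lora_alpha=512,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# 32 sequences * 16384 tokens = 524288 tokens per global step.
# Assumed split: 8 GPUs * 1 sequence per device * 4 accumulation steps = 32.
training_args = TrainingArguments(
    output_dir="llama-3.1-70b-malaysian-lora",  # assumed name
    learning_rate=1e-4,
    warmup_steps=50,
    num_train_epochs=3,
    bf16=True,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    deepspeed="ds_zero3.json",  # assumed path to a ZeRO-3 config
    use_liger_kernel=True,      # Liger kernels, incl. fused cross-entropy
)
```

The multipacking dataloader and the full training loop are in the source code linked below.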

Source code

Source code at https://github.com/Scicom-AI-Enterprise-Organization/small-ablation/blob/main/malaysian-sft

Acknowledgement

Special thanks to https://www.scitix.ai/ for the H100 node!
