Jordan Painter
jordanpainter
Recent Activity
updated a model 24 days ago: jordanpainter/diallm-llama-base-sft-ind
published a model 24 days ago: jordanpainter/diallm-llama-base-sft-ind
updated a model 24 days ago: jordanpainter/diallm-llama-base-sft-brit
DialLM DPO 🎯
DialLM DPO checkpoints across Gemma, Llama & Qwen for Australian, Northern British, Indian, and all-dialect conditions. Post-SFT preference alignment.
DialLM CPT 🌍
Continual pre-training checkpoints using ICE for DialLM across Gemma, Llama, and Qwen base models.
DialLM GRPO 🐦
Group Relative Policy Optimization fine-tunes for DialLM across Gemma, Llama, and Qwen models, covering all dialect variants.
DialLM GSPO 🐙
DialLM GSPO checkpoints across Gemma, Llama & Qwen for Australian, Northern British, Indian, and all-dialect conditions. Post-SFT RL.
DialLM SFT 🦈
DialLM SFT checkpoints across Gemma, Llama & Qwen for Australian, Northern British, Indian, and all-dialect conditions. Pre-RL alignment.