Draft Models
A collection of 36 tiny "draft" models for speculative decoding.
A 0.4B-parameter draft model (for speculative decoding) for use with Mistral-Large-Instruct-2411 and Mistral-Large-Instruct-2407.
See Mistral-Large-Instruct-2411-DRAFT-0.4B-v3.0 for the models in transformers format, and a detailed explanation of how the model was created.
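As a rough sketch of how a draft model like this is typically used, llama.cpp's server accepts a separate draft model via its speculative-decoding flags. The filenames below are hypothetical placeholders, and the exact flag set depends on your llama.cpp build:

```shell
# Sketch only: filenames are placeholders; check your llama.cpp build's
# --help output for the exact speculative-decoding flags it supports.
llama-server \
  -m  Mistral-Large-Instruct-2411-Q4_K_M.gguf \
  -md Mistral-Large-Instruct-2411-DRAFT-0.4B-Q4_0.gguf \
  --draft-max 16 --draft-min 1
```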
I've included the Q4_0 quants for 3 different context lengths:
Note: Qwen2.5-0.5B doesn't allow any of the other 4-bit quants to be made (and experimentation has shown that using more or less than 4 bits for speculative decoding is a waste of time anyway).

Note: llama.cpp implements YaRN as "static-YaRN", so the scaling factor remains constant regardless of input length! Only use the longer-context versions when processing long contexts is required.
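For readers new to the technique, the core idea of speculative decoding can be sketched with toy stand-ins for the two models: the draft model cheaply proposes several tokens, the target model verifies them in one pass, and only the matching prefix is kept. This is a greedy-acceptance sketch (real implementations use a stochastic acceptance rule and actual model logits):

```python
# Toy illustration of greedy speculative decoding. The "models" here are
# stand-ins: each maps a prefix to the next character of a fixed string.

TARGET_TEXT = "the quick brown fox"

def target_next(prefix: str) -> str:
    # Large, slow target model: always produces the correct next token.
    return TARGET_TEXT[len(prefix)]

def draft_next(prefix: str) -> str:
    # Small, fast draft model: right most of the time, wrong on 'b'.
    ch = TARGET_TEXT[len(prefix)]
    return "X" if ch == "b" else ch

def speculative_decode(n_draft: int = 4) -> str:
    out = ""
    while len(out) < len(TARGET_TEXT):
        # 1) Draft model proposes up to n_draft tokens cheaply.
        proposal, p = [], out
        for _ in range(min(n_draft, len(TARGET_TEXT) - len(p))):
            t = draft_next(p)
            proposal.append(t)
            p += t
        # 2) Target model verifies the whole proposal in one batched pass;
        #    keep the longest prefix that matches its own predictions.
        accepted = ""
        for t in proposal:
            if t == target_next(out + accepted):
                accepted += t
            else:
                break
        out += accepted
        # 3) The target pass also yields one extra token "for free":
        #    a correction on rejection, or a bonus token on full acceptance.
        if len(out) < len(TARGET_TEXT):
            out += target_next(out)
    return out

print(speculative_decode())  # prints "the quick brown fox"
```

The output is identical to decoding with the target model alone; the draft model only changes how many expensive target passes are needed, which is why draft quality affects speed but never correctness.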