Bringing SOTA quantization to mobile LLM deployment: A practical ExecuTorch integration guide
Article: https://blacksamorez.substack.com/p/aqlm-executorch-android
Usage
- Download and install the .apk file on your Android phone (llama3-aqlm.apk for ~1.1 tok/s at low power consumption; llama3-aqlm-4cores.apk for ~2.7 tok/s at high loads).
- Download the .pte and .model files and put them into the /data/local/tmp/llama folder on your Android phone.
- When you run the app, you will see the option to load the .pte and .model files. After loading them, you can chat with the model.
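One way to place the files into /data/local/tmp/llama is with adb over USB debugging. This is a minimal sketch: the .pte and .model file names below are placeholders, so substitute the actual files you downloaded from this repo.

```shell
# Create the target folder on the device (path is the one the app expects).
adb shell mkdir -p /data/local/tmp/llama

# Push the exported model and the tokenizer; replace the names
# with the actual .pte and .model files you downloaded.
adb push llama3-aqlm.pte /data/local/tmp/llama/
adb push tokenizer.model /data/local/tmp/llama/

# Verify the files arrived.
adb shell ls -l /data/local/tmp/llama
```

This requires a device connected with USB debugging enabled; /data/local/tmp is shell-writable without root, which is why it is a common staging path for ExecuTorch demos.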
Requirements
This app was tested on Samsung S24 Ultra running Android 14.
Limitations
- Although the app looks like a chat interface, each generation request is independent.
- The Llama-3 chat template is hard-coded into the app.