Bringing SOTA quantization to mobile LLM deployment: A practical ExecuTorch integration guide
Article: https://blacksamorez.substack.com/p/aqlm-executorch-android
Usage
- Download and install the .apk file on your Android phone (llama3-aqlm.apk for ~1.1 tok/s at low power consumption; llama3-aqlm-4cores.apk for ~2.7 tok/s at high loads).
- Download the .pte and .model files and put them into the /data/local/tmp/llama folder on your Android phone.
- When you run the app, you will see the option to load the .pte and .model files. After loading them, you can chat with the model.
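One way to place the files into /data/local/tmp/llama is with adb over USB debugging. This is a minimal sketch: the .pte and .model file names below are placeholders, so substitute the actual files you downloaded from this repo.

```shell
# Create the target folder on the device (path is the one the app expects).
adb shell mkdir -p /data/local/tmp/llama

# Push the exported model and the tokenizer; replace the names
# with the actual .pte and .model files you downloaded.
adb push llama3-aqlm.pte /data/local/tmp/llama/
adb push tokenizer.model /data/local/tmp/llama/

# Verify the files arrived.
adb shell ls -l /data/local/tmp/llama
```

This requires a device connected with USB debugging enabled; /data/local/tmp is shell-writable without root, which is why it is a common staging path for ExecuTorch demos.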
Requirements
This app was tested on Samsung S24 Ultra running Android 14.
Limitations
- Although the app looks like a chat interface, each generation request is independent.
- The Llama-3 chat template is hard-coded into the app.