Alpie Core: 4-bit Quantized Reasoning Model
📄 Technical Report: Alpie Core.pdf
1. Introduction
Alpie Core is one of the first fine-tuned 4-bit reasoning models from India, and among the first worldwide. Trained on just 8 Hopper GPUs using LoRA for parameter-efficient fine-tuning, combined with QLoRA 4-bit quantization and synthetic STEM-rich dataset distillation, it demonstrates that an aggressively quantized model can not only match but surpass full-precision baselines.
With a dramatically reduced memory footprint, Alpie Core delivers frontier-level reasoning performance, even beating some top proprietary models. It achieves 81.28% on MMLU, 92.75% on GSM8K, and 57.8% on SWE-Bench Verified, placing it at or near the top of competitive leaderboards and demonstrating that efficient models can rival frontier systems while remaining practical for real-world deployment at scale.
2. Model Summary
- Base Architecture: DeepSeek-R1-Distill-Qwen-32B
- Parameters: 32 billion (quantized to 4-bit)
- Training Method: Supervised Fine-Tuning (SFT) using LoRA/QLoRA techniques
- Quantization: 4-bit NF4 with double quantization
- Context Length: 65K tokens
- Max Output Length: 16,384 tokens
- Training Data Sources: Synthetic (STEM, reasoning, coding) + domain-rich curated data (law, Indian context, exams, multilingual).
- License: Apache 2.0
3. Approach
Alpie Core has undergone extensive supervised fine-tuning (SFT) to strengthen reasoning, robustness, and safety. The training leveraged a diverse mixture of curated open-source datasets and proprietary synthetic data, optimised with high-quality LLM-generated responses. The fine-tuning process emphasised adherence to rigorous safety and usability standards, including:
1. User Understanding and Clarity – ensuring outputs are direct, interpretable, and pedagogically sound.
2. Security and Ethical Guidelines – filtering unsafe or harmful generations during and after training.
3. Limitations, Disclaimers, and Knowledge Boundaries – transparently communicating uncertainty and scope.
4. Handling Complex and Sensitive Topics – balancing informativeness with responsible guardrails.
5. Safety and Respectful Engagement – maintaining politeness, inclusivity, and cultural sensitivity.
6. Confidentiality and Responsible Use – preventing leakage of private training data, proprietary prompts, or internal reasoning traces.
This SFT approach enables Alpie Core to deliver reliable, aligned, and context-aware responses across a broad range of use cases, generalizing across global and Indian contexts while staying within safe and responsible use guidelines.
4. Model Features
- Supports Streaming – Real-time token-level responses
- OpenAI-Compatible API – Seamless integration with OpenAI client libraries (see the example after this list)
- 65K Context Length – Handles very large inputs and conversations
- 16,384 Max Output Length – Enables extremely long generations
- 4-Bit Quantization – Memory-efficient and optimised for deployment
- High Throughput Inference – Powered by vLLM for efficient large-scale serving
- Low Latency Inference – Fast response times optimized for production
- Customizable Safety & Moderation Filters – Built-in guardrails for safer outputs
- Supports Function Calling / Tool Use – Enables structured outputs and external API integration
- Instruction Following – Optimised for reasoning and stepwise chain-of-thought answers
- Education & Research Ready – Tailored for competitive exams, STEM reasoning, and knowledge-intensive tasks
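As one possible integration path, the sketch below queries an OpenAI-compatible endpoint (for example, a vLLM server hosting the model) with streaming enabled. The `base_url`, `api_key`, and served model name are deployment-specific assumptions, not fixed values.

```python
# Sketch: streaming chat completion against an OpenAI-compatible server
# (e.g. a vLLM deployment). base_url / api_key / model are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
stream = client.chat.completions.create(
    model="169Pi/Alpie-Core",
    messages=[{"role": "user", "content": "Explain LoRA in two sentences."}],
    stream=True,  # token-level streaming, as advertised above
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```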
5. Key Highlights
- First 4-bit Reasoning Model from India: Competitive globally with frontier models
- Benchmark Competitiveness: Outperforms or matches 70B+ models across reasoning, math, and coding
- STEM & Coding Strength: Excellent on GSM8K, MATH-500, HumanEval, SWE-Bench Verified
- Efficiency & Deployment: ~16 GB VRAM footprint, runs on commodity GPUs with vLLM (see the sanity check after this list)
- Extended Context Length: 65K tokens for research papers, conversations, multi-document reasoning
- Environmental Benefits: ~298–835 kg CO₂e, 2–3× more efficient than FP16 training
- Open-Source Commitment: Released under Apache 2.0 for global use
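A quick sanity check on the ~16 GB figure: 32 billion parameters at 4 bits each come to 16 GB of weights, before quantization constants, activations, and KV cache.

```python
# Back-of-envelope check of the ~16 GB weight footprint quoted above:
# 32B parameters x 4 bits, ignoring quantization constants and the KV cache.
params = 32e9
weight_bytes = params * 4 / 8          # 4 bits = 0.5 bytes per parameter
print(f"{weight_bytes / 1e9:.0f} GB")  # -> 16 GB
```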
6. Benchmark Results
| Benchmark | Alpie Core (32B-4bit) | DeepSeek-V2 (236B) | Qwen2.5 72B | Llama 3.1 405B | Llama 3.1 70B | Gemma-3 27B-PT | Mistral-Small-24B-Base-2501 |
|---|---|---|---|---|---|---|---|
| MMLU (5-shot) | 81.28% | 78.4% | 85.0% | 84.4% | 79.3% | 78.6% | 80.73% |
| GSM8K (8-shot) | 92.75% | 81.6% | 88.3% | 83.5% | - | 82.2% | 80.73% |
| BBH (3-shot) | 85.12% | 78.8% | 79.8% | 82.9% | 81.6% | 77.7% | - |
| MMLU-Pro (5-shot) | 64.78% | 51.4% | 58.3% | 52.8% | 53.8% | 52.2% | 54.37% |
| MBPP (pass@1) | 75.20% | 65.0% | 72.6% | 68.4% | - | 65.6% | 69.64% |
| HumanEval (pass@1) | 57.23% | 43.3% | 53.0% | 54.9% | - | 48.8% | - |
These results demonstrate Alpie Core’s ability to rival or surpass leading proprietary and open-source models, despite being 4-bit quantized.
SWE-Bench Verified Performance
| Rank | Model | Accuracy (%) | Performance vs Alpie |
|---|---|---|---|
| 1 | Alpie Core | 57.8 | Alpie |
| 2 | Qwen3-Coder-30B-A3B-Instruct | 51.6 | Below Alpie |
| 3 | o3-mini (high) | 49.3 | Below Alpie |
| 4 | DeepSeek R1 | 49.2 | Below Alpie |
| 5 | Claude 3.5 Sonnet | 49.0 | Below Alpie |
| 6 | o1 | 48.9 | Below Alpie |
| 7 | Devstral | 46.8 | Below Alpie |
Humanity's Last Exam Leaderboard Performance
| Rank | Model | Accuracy (%) | Performance vs Alpie |
|---|---|---|---|
| 1 | GPT 4.5 Preview | 5.8 | Above Alpie |
| 2 | Claude Sonnet 4 | 5.42 | Above Alpie |
| 3 | Alpie Core 32B (4-bit) | 5.41 | Alpie |
| 4 | Llama 4 Maverick | 5.34 | Below Alpie |
| 5 | GPT 4.1 | 4.97 | Below Alpie |
| 6 | Kimi K2 Instruct | 4.68 | Below Alpie |
| 7 | DeepSeek V3 | 4.55 | Below Alpie |
| 8 | Gemini 1.5 Pro 002 | 4.55 | Below Alpie |
Additional Benchmarks
| Benchmark | Alpie Core (32B-4bit) | Category |
|---|---|---|
| AIME | 47.34% | Advanced Mathematics |
| GPQA (Diamond) | 40.91% | Graduate-level QA |
| TruthfulQA (MC2) | 60.05% | Truthfulness |
| HellaSwag | 84.66% | Commonsense |
| PIQA | 83.24% | Physical Reasoning |
| ARC Challenge | 67.58% | Science QA |
| CommonSenseQA | 87.06% | Commonsense |
| AGIEval | 64.98% | General Intelligence |
| Winogrande | 79.53% | Commonsense Reasoning |
| MATH-500 | 70.00% | Advanced Mathematics |
7. Training Details
- Hardware: 8× NVIDIA H100 80GB (Hopper) GPUs
- Fine-tuning Method: LoRA/QLoRA with the following configuration (expressed in code after this list):
  - LoRA Alpha: 16
  - LoRA Dropout: 0.05
  - LoRA Rank: 16
- Quantization: 4-bit NF4 + Double Quantization + FP16 compute
- Dataset Domains: Mathematics, coding, reasoning, science, general knowledge, competitive exams, Indian context + law, multilingual (Hindi and Hinglish)
- Synthetic Data Advantage: +15–20% performance boost in STEM and coding domains; the synthetic data is LLM-generated and curated with multi-turn reasoning traces
- Training Strategy: Multi-stage distillation → SFT → safety alignment
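For concreteness, the sketch below expresses the quantization and LoRA settings above using Hugging Face transformers, peft, and bitsandbytes. The `target_modules` choice is an assumption, since the report does not list which projections were adapted.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with double quantization and FP16 compute,
# matching the settings listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# LoRA hyperparameters as reported; target_modules is an assumption,
# as the report does not specify which projections were adapted.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```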
8. Environmental Impact
Carbon Footprint: We estimated the environmental impact of training Alpie Core (32B) on 8× NVIDIA H100-80GB GPUs by calculating carbon emissions from GPU energy consumption. The calculation follows the formula:

CO₂e (kg) = Grid CO₂ Factor (kg/kWh) × Runtime (hours) × Power per GPU (kW) × Number of GPUs

Training parameters:
- Grid CO₂ Factor (Azure average): 0.364 kg CO₂e per kWh
- Runtime: 408 hours
- GPUs: 8× H100-80GB

We report results under two assumption modes:
- Realistic mode (average training draw ≈ 250 W per GPU = 0.25 kWh/h): 0.364 × 408 × 0.25 × 8 ≈ 298 kg CO₂e
- Conservative mode (near TDP ≈ 700 W per GPU = 0.70 kWh/h): 0.364 × 408 × 0.70 × 8 ≈ 835 kg CO₂e

The total training footprint thus ranges from ~298 kg CO₂e (realistic) to ~835 kg CO₂e (conservative worst case).
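For transparency, the formula can be reproduced directly; with the stated inputs the products evaluate to roughly 297 and 832 kg, which round to the ~298–835 kg CO₂e range reported above.

```python
# Reproducing the emissions formula above with the stated inputs.
def training_co2e_kg(grid_factor, runtime_hours, power_kw_per_gpu, num_gpus):
    """CO2e (kg) = grid factor (kg/kWh) x runtime (h) x power/GPU (kW) x GPUs."""
    return grid_factor * runtime_hours * power_kw_per_gpu * num_gpus

realistic = training_co2e_kg(0.364, 408, 0.25, 8)     # ~297 kg CO2e
conservative = training_co2e_kg(0.364, 408, 0.70, 8)  # ~832 kg CO2e
print(f"{realistic:.0f}-{conservative:.0f} kg CO2e")
```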
This makes Alpie Core one of the most carbon-efficient reasoning models released to date.
9. Use Cases
Best for STEM, complex mathematical reasoning, coding, and Indian context
1. STEM: Excels at solving advanced problems in science, technology, engineering, and mathematics with high accuracy.
2. Complex Mathematical Reasoning: Handles multi-step logical and quantitative reasoning tasks with strong reliability.
3. Coding: Supports software development, debugging, algorithmic problem-solving, and structured reasoning in code.
4. Indian Context: Provides culturally aware insights, competitive exam assistance (JEE, NEET, UPSC), and multilingual support in Hindi/Hinglish.
5. Research Assistance: Handles long contexts (65K tokens) for academic and legal research.
10. Safety and Limitations
Enhanced Content Access
Unlike the base DeepSeek model, Alpie Core provides factual, balanced responses to geopolitically sensitive questions, offering global accessibility and factual accuracy on topics like Taiwan's status, Arunachal Pradesh sovereignty, and other sensitive geopolitical issues.
Current Limitations
- Multilingual reasoning in Hindi/Hinglish shows room for improvement
- Fixed knowledge cutoff without real-time information retrieval
- Occasional struggles with complex multi-hop mathematical reasoning
- Potential hallucinations in factual question-answering; as with all LLMs, outputs should not be relied on for medical or legal advice without expert oversight
- Biases: Training on synthetic + curated datasets reduces bias, but some risks may persist.
Mitigations
- Safety classifiers and output filtering systems
- Model-assisted safety pipeline using RLHF
- Comprehensive adversarial testing by domain experts
11. How to Use
Non-Streaming Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig
import torch

# Load the LoRA adapter configuration to find the base model
peft_model_id = "169Pi/Alpie-Core"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA weights and switch to evaluation mode
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()

# Sample inference
prompt = "Solve the Riemann Hypothesis and provide a final answer?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=1000)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Response:\n", response)
```
Streaming Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel, PeftConfig
import torch

# Load the LoRA adapter configuration to find the base model
peft_model_id = "169Pi/Alpie-Core"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA weights and switch to evaluation mode
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()

# Initialize the streamer to print tokens as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Sample streaming inference
prompt = "Solve the Riemann Hypothesis and provide a final answer?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print("Streaming Response:")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1000,
        streamer=streamer,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )
```
Deployment Options
- Transformers: Python, PyTorch integration
- vLLM: High-throughput inference (see the sketch below)
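A minimal offline-inference sketch with vLLM's Python API, applying the LoRA adapter on top of the base model. The argument names (`enable_lora`, `LoRARequest`) follow vLLM's documented LoRA support and may vary across versions.

```python
# Sketch: batch inference with vLLM, loading the LoRA adapter at request time.
from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Download the adapter locally; LoRARequest expects a local path.
adapter_path = snapshot_download("169Pi/Alpie-Core")

llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B", enable_lora=True)
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)
outputs = llm.generate(
    ["Prove that the sum of two even integers is even."],
    params,
    lora_request=LoRARequest("alpie-core", 1, adapter_path),
)
print(outputs[0].outputs[0].text)
```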
12. Citation
```bibtex
@misc{169pi2025alpiecore,
  title  = {Alpie-Core: A 4-Bit Quantized Reasoning Model from India that Outperforms Full-Precision Models},
  author = {169Pi AI},
  year   = {2025},
  url    = {https://huggingface.co/169Pi/Alpie-Core}
}
```
13. Community & Contributions
This model is released under the Apache 2.0 license, and we warmly welcome the community to build, download, and extend it.
1. Issues & Discussions: Report bugs, suggest features, or start conversations on the Hugging Face model page.
2. Contributions: Pull requests are welcome for error fixes, performance improvements, and extended functionality.
3. Fine-tuning Results: Share your experiments, benchmarks, and downstream applications with the community.
4. Collaboration: We encourage researchers, developers, and organisations to join in shaping the future of this model.
Together, we can continue to improve accessibility, safety, and performance for real-world AI applications.
14. License
Apache 2.0 License – Permissive, allowing free use, modification, and distribution for both research and commercial purposes.
15. Acknowledgements / Credits
We would like to thank DeepSeek for their original model, which served as the foundation for this work. Our team fine-tuned the model and implemented 4-bit quantization, achieving improved efficiency and accuracy for downstream tasks. This model is built with respect for the contributions of the original authors and aims to provide a safe, high-performance solution for reasoning and inference.
We are also grateful to the Hugging Face ecosystem (Transformers, PEFT, vLLM, bitsandbytes), the open-source community datasets (MMLU, GSM8K, SWE-Bench, and others), and the support of various cloud providers. Finally, we acknowledge the broader AI research community and companies whose innovations and insights continue to inspire our work.
16. Contact
For technical inquiries and support: [email protected]
Alpie Core represents a milestone for open-source AI from India and is among the first models globally to show that 4-bit reasoning models can rival frontier-scale systems. We hope this release empowers developers, researchers, and organisations worldwide to build more efficient, inclusive, and impactful AI.
For technical details, training methodology, and comprehensive evaluation results, please refer to our technical report.