Raphael Mota da Costa da Paz (RaphaPAZ)

AI & ML interests: None yet
Recent Activity
Reacted to samerzaher80's post with 👍 · about 14 hours ago
AetherMind_SRL: How I beat 7B models on MMLU with 184M params and a $300 GPU
I’m Sameer, a solo researcher from Iraq working on a single RTX 3050 8GB laptop. Today I’m releasing AetherMind_SRL – a 184M-parameter NLI model that was trained only on NLI tasks (SNLI, MNLI, ANLI, and a small clinical Alzheimer’s dataset).
It was never fine-tuned or even shown a single MMLU question during training. Yet here are the zero-shot MMLU (57 subjects) results:

| Model | Params | MMLU Zero-Shot | Training Data |
|---|---|---|---|
| AetherMind_SRL (me) | 184M | 36.05 % | Only NLI (SNLI/MNLI/ANLI + ADNI) |
| DeBERTa-v3-base | 278M | ~30.8 % | General pre-training |
| BERT-large | 340M | 27–30 % | General pre-training |
| LLaMA-1 7B | 7B | 34–35 % | Massive text corpus |
| LLaMA-2 7B | 7B | ~45 % | Bigger + better data |
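The post does not say how an NLI classifier is scored on MMLU's multiple-choice questions. One common protocol, sketched below purely as an assumption, is to phrase each option as a hypothesis ("The answer is …"), score it against the question as premise, and pick the option with the highest entailment score. The label name `ENTAILMENT`, the `[SEP]` joining (borrowed from the usage example further down), and the example question are all illustrative, not taken from the post.

```python
# Sketch of an assumed entailment-based zero-shot multiple-choice protocol.
from transformers import pipeline
import torch

nli = pipeline(
    "text-classification",
    model="samerzaher80/AetherMind_SRL",
    device=0 if torch.cuda.is_available() else -1,
)

def pick_answer(question, options):
    """Choose the option whose 'The answer is ...' hypothesis gets the highest entailment score."""
    best_idx, best_score = 0, float("-inf")
    for i, option in enumerate(options):
        text = f"{question} [SEP] The answer is {option}."
        scores = nli(text, top_k=None)          # scores for all labels
        if scores and isinstance(scores[0], list):  # output nesting varies across transformers versions
            scores = scores[0]
        by_label = {d["label"].upper(): d["score"] for d in scores}
        # Assumption: one label is named "ENTAILMENT"; check model.config.id2label and adjust.
        entail = by_label.get("ENTAILMENT", 0.0)
        if entail > best_score:
            best_idx, best_score = i, entail
    return best_idx

# Illustrative question (not an MMLU item)
q = "Which organ is primarily affected by Alzheimer's disease?"
opts = ["the liver", "the brain", "the kidneys", "the heart"]
print(opts[pick_answer(q, opts)])
```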
Yes – my 184M model beats every classic 300–400M model and the original 7-billion-parameter LLaMA-1, all while running at 300+ samples/sec on a $300 laptop GPU.

How did this happen? I built a standardized self-improvement loop called AetherMind Self-Reflective Learning (SRL) v1.0 (a rough sketch of one round follows the list):

1. Train normally on NLI
2. Let the model predict on hard adversarial data (ANLI)
3. Log every mistake + low-confidence case
4. Build a balanced “SMART” buffer (60% errors + 40% correct anchors)
5. Fine-tune with tiny LR and error-weighted loss
6. Repeat until stable
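The loop is described only at this level of detail. Purely as illustration, here is a minimal sketch of what one SRL round could look like under my own assumptions: plain PyTorch fine-tuning of a sequence-classification model, a hypothetical `error_weight` multiplier for the error-weighted loss, and a literal 60/40 buffer split. None of these names or hyperparameters come from the post.

```python
# Minimal, assumed sketch of one SRL round (not the author's actual training code).
import random
import torch
import torch.nn.functional as F

def srl_round(model, tokenizer, anli_examples,
              device="cuda" if torch.cuda.is_available() else "cpu",
              error_weight=2.0, lr=1e-6, conf_threshold=0.7):
    model.to(device).eval()
    errors, anchors = [], []

    # Steps 2-3: predict on hard adversarial data, log mistakes and low-confidence cases.
    with torch.no_grad():
        for ex in anli_examples:  # each ex: {"premise", "hypothesis", "label"}
            enc = tokenizer(ex["premise"], ex["hypothesis"],
                            truncation=True, return_tensors="pt").to(device)
            probs = F.softmax(model(**enc).logits, dim=-1)[0]
            conf, pred = probs.max(dim=-1)
            if pred.item() != ex["label"] or conf.item() < conf_threshold:
                errors.append(ex)
            else:
                anchors.append(ex)

    # Step 4: balanced "SMART" buffer, roughly 60% errors + 40% correct anchors.
    n_anchor = min(len(anchors), int(len(errors) * 40 / 60))
    buffer = [(ex, error_weight) for ex in errors] + \
             [(ex, 1.0) for ex in random.sample(anchors, n_anchor)]
    random.shuffle(buffer)

    # Step 5: fine-tune with a tiny LR and an error-weighted cross-entropy loss.
    model.train()
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    for ex, weight in buffer:
        enc = tokenizer(ex["premise"], ex["hypothesis"],
                        truncation=True, return_tensors="pt").to(device)
        logits = model(**enc).logits
        label = torch.tensor([ex["label"]], device=device)
        loss = weight * F.cross_entropy(logits, label)
        loss.backward()
        optim.step()
        optim.zero_grad()
    return model  # Step 6: the caller repeats rounds until metrics stabilise.
```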
That’s it. No external knowledge, no MMLU data, no cluster.
Just pure reasoning transfer from entailment/contradiction patterns → real-world knowledge.

Try it yourself:

```python
from transformers import pipeline
import torch

# Load the NLI classifier (GPU if available, otherwise CPU)
nli_pipeline = pipeline(
    "text-classification",
    model="samerzaher80/AetherMind_SRL",
    device=0 if torch.cuda.is_available() else -1,
)

# DEFINE YOUR TEST HERE
premise = "Patient shows progressive memory decline."
hypothesis = "The patient may have Alzheimer's disease."  # example hypothesis; replace with your own

input_text = f"{premise} [SEP] {hypothesis}"
result = nli_pipeline(input_text)[0]

print(f"Prediction: {result['label']}")
print(f"Confidence: {result['score']:.4f}")
```
Model: https://huggingface.co/samerzaher80/AetherMind_SRL
New activity · about 1 month ago
Qwen/Qwen3-VL-8B-Thinking: Is there any way of using this model along with Qwen Image in ComfyUI?