Raphael Mota da Costa da Paz (RaphaPAZ)
0 followers · 12 following
XaszD
AI & ML interests
None yet
Recent Activity
reacted to samerzaher80's post with 👍 (2 days ago)
**AetherMind_SRL: How I beat 7B models on MMLU with 184M params and a $300 GPU**

I'm Sameer, a solo researcher from Iraq working on a single RTX 3050 8GB laptop. Today I'm releasing AetherMind_SRL, a 184M-parameter NLI model trained only on NLI tasks (SNLI, MNLI, ANLI, and a small clinical Alzheimer's dataset). It was never fine-tuned on, or even shown, a single MMLU question during training. Yet here are the zero-shot MMLU (57 subjects) results:

| Model | Params | MMLU Zero-Shot | Training Data |
|---|---|---|---|
| AetherMind_SRL (me) | 184M | 36.05 % | Only NLI (SNLI/MNLI/ANLI + ADNI) |
| DeBERTa-v3-base | 278M | ~30.8 % | General pre-training |
| BERT-large | 340M | 27–30 % | General pre-training |
| LLaMA-1 7B | 7B | 34–35 % | Massive text corpus |
| LLaMA-2 7B | 7B | ~45 % | Bigger + better data |

Yes: my 184M model beats every classic 300–400M model and the original 7-billion-parameter LLaMA-1, all while running at 300+ samples/sec on a $300 laptop GPU.

How did this happen? I built a standardized self-improvement loop called AetherMind Self-Reflective Learning (SRL) v1.0:

1. Train normally on NLI.
2. Let the model predict on hard adversarial data (ANLI).
3. Log every mistake + low-confidence case.
4. Build a balanced "SMART" buffer (60% errors + 40% correct anchors).
5. Fine-tune with a tiny LR and an error-weighted loss.
6. Repeat until stable.

That's it. No external knowledge, no MMLU data, no cluster. Just pure reasoning transfer from entailment/contradiction patterns to real-world knowledge.

Try it yourself:

```python
from transformers import pipeline
import torch

nli_pipeline = pipeline(
    "text-classification",
    model="samerzaher80/AetherMind_SRL",
    device=0 if torch.cuda.is_available() else -1,
)

# DEFINE YOUR TEST HERE
premise = "Patient shows progressive memory decline."
hypothesis = "Patient shows progressive memory decline."

input_text = f"{premise} [SEP] {hypothesis}"
result = nli_pipeline(input_text)[0]

print(f"Prediction: {result['label']}")
print(f"Confidence: {result['score']:.4f}")
```

Model: https://huggingface.co/samerzaher80/AetherMind_SRL
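Step 4 of the SRL loop (the balanced "SMART" buffer of 60% errors and 40% correct anchors) can be sketched as follows. This is a minimal illustration, not the author's code: the `build_smart_buffer` name, the 0.7 confidence threshold, and the record format are my own assumptions.

```python
import random

def build_smart_buffer(records, error_frac=0.6, buffer_size=10,
                       conf_threshold=0.7, seed=0):
    """Build a balanced 'SMART' replay buffer: a fixed fraction of hard
    cases (errors or low-confidence predictions), topped up with correct,
    high-confidence anchors.

    Each record is a dict: {'correct': bool, 'confidence': float}.
    """
    rng = random.Random(seed)
    # Hard cases: the model was wrong, or right but unsure.
    hard = [r for r in records
            if not r["correct"] or r["confidence"] < conf_threshold]
    # Anchors: correct and confident predictions.
    anchors = [r for r in records
               if r["correct"] and r["confidence"] >= conf_threshold]
    n_hard = min(int(buffer_size * error_frac), len(hard))
    n_anchor = min(buffer_size - n_hard, len(anchors))
    return rng.sample(hard, n_hard) + rng.sample(anchors, n_anchor)
```

Fine-tuning on this buffer with a small learning rate (step 5) then emphasizes exactly the cases the model currently gets wrong, while the anchors guard against forgetting what it already does well.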
reacted to piercus's post with 👍 (29 days ago)
Starts erasing! 🎉 🎉 🎉 This is made with a one-step SD1.5 LBM [1] eraser! Data is open. Data pipeline is open. Training code is open. On our LBM fork: https://github.com/finegrain-ai/LBM
[1] https://huggingface.co/papers/2503.07535
new activity on Qwen/Qwen3-VL-8B-Thinking (about 2 months ago):
"Is there any way of using this model along with Qwen-Image on ComfyUI?"
Organizations
RaphaPAZ's activity
upvoted a collection (about 2 months ago)
Qwen3-VL · Collection · 37 items · Updated 28 days ago · 456