Jan-v2-VL Collection Jan-v2-VL: an 8B VLM focused on reliable, many-step task execution. • 6 items • Updated 2 days ago • 25
Waqfeya Library Collection Waqfeya is one of the primary online resources for Islamic books, similar to Shamela. It hosts more than 10,000 PDF books across over 80 categories. • 3 items • Updated Apr 23 • 2
view article Article Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO) Jan 19 • 35
Pre-training Dataset Samples Collection A collection of pre-training datasets samples of sizes 10M, 100M and 1B tokens. Ideal for use in quick experimentation and ablations. • 19 items • Updated 4 days ago • 13
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models Paper • 2510.04618 • Published Oct 6 • 120
view article Article LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR 23 days ago • 58
Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report Paper • 2510.14880 • Published 30 days ago • 15
Artificial Hippocampus Networks for Efficient Long-Context Modeling Paper • 2510.07318 • Published Oct 8 • 30
Code2Video: A Code-centric Paradigm for Educational Video Generation Paper • 2510.01174 • Published Oct 1 • 33
Journal Club Collection Candidate papers to read in the H4 journal club • 54 items • Updated Apr 21, 2024 • 35
⚛️ Liquid Nanos Collection Library of task-specific models: https://www.liquid.ai/blog/introducing-liquid-nanos-frontier-grade-performance-on-everyday-devices • 21 items • Updated 16 days ago • 92
Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR Paper • 2509.18174 • Published Sep 17 • 124
EpiCache: Episodic KV Cache Management for Long Conversational Question Answering Paper • 2509.17396 • Published Sep 22 • 19