- deepseek-ai/DeepSeek-R1
  Text Generation • 685B • Updated • 471k • 12.8k
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 625
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 298
- open-r1/OpenR1-Math-220k
  Viewer • Updated • 450k • 7.61k • 658
Collections
Collections including paper arxiv:2402.17764
- 1.58-bit FLUX
  Paper • 2412.18653 • Published • 84
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 625
- BitNet a4.8: 4-bit Activations for 1-bit LLMs
  Paper • 2411.04965 • Published • 69
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 105
- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published • 1
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published • 1
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 2
- CohereLabs/c4ai-command-r-plus-08-2024
  Text Generation • 104B • Updated • 2.87k • 276
- meta-llama/Meta-Llama-3-8B
  Text Generation • 8B • Updated • 1.5M • 6.35k
- meta-llama/Meta-Llama-3-70B
  Text Generation • 71B • Updated • 13.2k • 869
- impira/layoutlm-document-qa
  Document Question Answering • 0.1B • Updated • 33.7k • 1.15k