Augusteinia's Collections
Paradigm

updated Jun 26

  • Parallel Scaling Law for Language Models

    Paper • 2505.10475 • Published May 15 • 83

  • Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective

    Paper • 2505.15045 • Published May 21 • 54

  • Scaling Diffusion Transformers Efficiently via μP

    Paper • 2505.15270 • Published May 21 • 35

  • Vision Transformers Don't Need Trained Registers

    Paper • 2506.08010 • Published Jun 9 • 21

  • MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

    Paper • 2506.13585 • Published Jun 16 • 267

  • Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression

    Paper • 2506.09482 • Published Jun 11 • 45