Sayambhu Sen
Testerpce
Recent Activity
- updated a collection about 4 hours ago: Attention
- updated a collection 1 day ago: Agent
- updated a collection 1 day ago: Video understanding
Eval
- ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks
  Paper • 2508.15804 • Published • 15
- Behavioral Fingerprinting of Large Language Models
  Paper • 2509.04504 • Published • 5
- Statistical Methods in Generative AI
  Paper • 2509.07054 • Published • 11
- CLUE: Non-parametric Verification from Experience via Hidden-State Clustering
  Paper • 2510.01591 • Published • 26
LLM judge
3D
Materials and structures
Vision
- MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
  Paper • 2506.22434 • Published • 10
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
  Paper • 2507.13348 • Published • 75
- RewardDance: Reward Scaling in Visual Generation
  Paper • 2509.08826 • Published • 72
- Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
  Paper • 2510.18876 • Published • 35
Code
- The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs
  Paper • 2506.18403 • Published • 3
- ReCode: Updating Code API Knowledge with Reinforcement Learning
  Paper • 2506.20495 • Published • 9
- SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
  Paper • 2507.23348 • Published • 11
- LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
  Paper • 2509.09614 • Published • 7
Data
- Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
  Paper • 2506.19290 • Published • 52
- Data Efficacy for Language Model Training
  Paper • 2506.21545 • Published • 11
- Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents
  Paper • 2507.04009 • Published • 49
- RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs
  Paper • 2507.03253 • Published • 18
Memory
- Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team
  Paper • 2506.14234 • Published • 41
- MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models
  Paper • 2506.14435 • Published • 7
- Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
  Paper • 2504.19413 • Published • 24
- MemOS: A Memory OS for AI System
  Paper • 2507.03724 • Published • 153
Applications and Uses
- ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
  Paper • 2506.09790 • Published • 53
- Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance
  Paper • 2506.06444 • Published • 73
- DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
  Paper • 2506.11763 • Published • 71
- Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research
  Paper • 2502.04644 • Published • 4
Adversarial
Multimodal
- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 166
- Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
  Paper • 2505.22453 • Published • 46
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 22
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
  Paper • 2505.21523 • Published • 13
Interpretable
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 119
- Large Language Models are Locally Linear Mappings
  Paper • 2505.24293 • Published • 14
- Thought Anchors: Which LLM Reasoning Steps Matter?
  Paper • 2506.19143 • Published • 13
Diffusion
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
  Paper • 2503.09573 • Published • 73
- Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
  Paper • 2505.15045 • Published • 54
- Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
  Paper • 2505.16990 • Published • 22
- D-AR: Diffusion via Autoregressive Models
  Paper • 2505.23660 • Published • 34
Information_retrieval
- Rank1: Test-Time Compute for Reranking in Information Retrieval
  Paper • 2502.18418 • Published • 28
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 36
- Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval
  Paper • 2505.16967 • Published • 24
- SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension
  Paper • 2508.01959 • Published • 56
Attention
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 298
- Lizard: An Efficient Linearization Framework for Large Language Models
  Paper • 2507.09025 • Published • 18
- On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
  Paper • 2507.23632 • Published • 6
- Causal Attention with Lookahead Keys
  Paper • 2509.07301 • Published • 21
Agent
- Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
  Paper • 2411.03562 • Published • 68
- Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
  Paper • 2502.06060 • Published • 38
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents
  Paper • 2502.14499 • Published • 192
- SurveyX: Academic Survey Automation via Large Language Models
  Paper • 2502.14776 • Published • 100
RAG
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
  Paper • 2410.08815 • Published • 47
- SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
  Paper • 2412.15443 • Published • 10
- RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement
  Paper • 2412.12881 • Published • 2
- AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
  Paper • 2506.06962 • Published • 28
Prompt papers
- Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
  Paper • 2309.08532 • Published • 53
- Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective
  Paper • 2506.17930 • Published • 19
- No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
  Paper • 2509.21880 • Published • 51
Sparsity
State space LLM
Reasoning
- Contrastive Decoding Improves Reasoning in Large Language Models
  Paper • 2309.09117 • Published • 39
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
  Paper • 2310.08491 • Published • 55
- Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
  Paper • 2411.04282 • Published • 37
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
  Paper • 2411.14432 • Published • 25
Fine tuning
- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
  Paper • 2402.17193 • Published • 26
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
  Paper • 2410.23743 • Published • 63
- Direct Preference Optimization Using Sparse Feature-Level Constraints
  Paper • 2411.07618 • Published • 17
- Transformer^2: Self-adaptive LLMs
  Paper • 2501.06252 • Published • 54
Dataset and Data processing
- Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
  Paper • 2405.20541 • Published • 24
- RedPajama: an Open Dataset for Training Large Language Models
  Paper • 2411.12372 • Published • 56
- Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
  Paper • 2503.22230 • Published • 45
Video understanding
- Wolf: Captioning Everything with a World Summarization Framework
  Paper • 2407.18908 • Published • 32
- Mixture of Nested Experts: Adaptive Processing of Visual Tokens
  Paper • 2407.19985 • Published • 37
- TPDiff: Temporal Pyramid Video Diffusion Model
  Paper • 2503.09566 • Published • 45
- DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
  Paper • 2506.07464 • Published • 13
Long context
- Writing in the Margins: Better Inference Pattern for Long Context Retrieval
  Paper • 2408.14906 • Published • 144
- DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
  Paper • 2410.10819 • Published • 8
- LLMtimesMapReduce: Simplified Long-Sequence Processing using Large Language Models
  Paper • 2410.09342 • Published • 39
- PDFTriage: Question Answering over Long, Structured Documents
  Paper • 2309.08872 • Published • 53
Tool
- Provable Benefits of In-Tool Learning for Large Language Models
  Paper • 2508.20755 • Published • 11
- SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
  Paper • 2509.02479 • Published • 83
- How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench
  Paper • 2508.20931 • Published • 15
- THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
  Paper • 2509.13761 • Published • 16
Foundation Models
Test time
Physics and operators
Vision Language Action models
- A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
  Paper • 2507.01925 • Published • 38
- Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
  Paper • 2507.16746 • Published • 34
- MolmoAct: Action Reasoning Models that can Reason in Space
  Paper • 2508.07917 • Published • 43
- Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
  Paper • 2508.20072 • Published • 31
World model
- WorldVLA: Towards Autoregressive Action World Model
  Paper • 2506.21539 • Published • 39
- LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
  Paper • 2509.05263 • Published • 10
- VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
  Paper • 2510.00406 • Published • 63
- GigaBrain-0: A World Model-Powered Vision-Language-Action Model
  Paper • 2510.19430 • Published • 43
Compression
- Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
  Paper • 2506.19697 • Published • 44
- Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
  Paper • 2509.23873 • Published • 67
- Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
  Paper • 2510.00515 • Published • 39
- SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
  Paper • 2509.22944 • Published • 76
Process Reward Modelling
- ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
  Paper • 2506.18896 • Published • 29
- Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
  Paper • 2505.15277 • Published • 104
- PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
  Paper • 2501.03124 • Published • 14
- Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs
  Paper • 2505.11227 • Published
SAE
- Resa: Transparent Reasoning Models via SAEs
  Paper • 2506.09967 • Published • 21
- Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
  Paper • 2408.05147 • Published • 40
- Train Sparse Autoencoders Efficiently by Utilizing Features Correlation
  Paper • 2505.22255 • Published • 24
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 119
Theory and Representation learning
Graph
- NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes
  Paper • 2504.11544 • Published • 43
- On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models
  Paper • 2307.09793 • Published • 46
- GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks
  Paper • 2504.12764 • Published • 41
- Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks
  Paper • 2505.16901 • Published • 47
Search
- Open Deep Search: Democratizing Search with Open-source Reasoning Agents
  Paper • 2503.20201 • Published • 48
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
  Paper • 2503.19470 • Published • 19
- Spacer: Towards Engineered Scientific Inspiration
  Paper • 2508.17661 • Published • 32
- DeepResearch Arena: The First Exam of LLMs' Research Abilities via Seminar-Grounded Tasks
  Paper • 2509.01396 • Published • 56
Diversity
Self correction
Speech
- Slamming: Training a Speech Language Model on One GPU in a Day
  Paper • 2502.15814 • Published • 69
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
  Paper • 2503.04724 • Published • 72
- Audio-Aware Large Language Models as Judges for Speaking Styles
  Paper • 2506.05984 • Published • 15
- Optimizing Multilingual Text-To-Speech with Accents & Emotions
  Paper • 2506.16310 • Published • 25
Synthetic data
- Evaluating Language Models as Synthetic Data Generators
  Paper • 2412.03679 • Published • 48
- Smaller Language Models Are Better Instruction Evolvers
  Paper • 2412.11231 • Published • 28
- How to Synthesize Text Data without Model Collapse?
  Paper • 2412.14689 • Published • 52
- Open Data Synthesis For Deep Research
  Paper • 2509.00375 • Published • 68
MoE
- Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
  Paper • 2410.10814 • Published • 51
- Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
  Paper • 2502.16894 • Published • 32
- Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
  Paper • 2506.14731 • Published • 8
- SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
  Paper • 2506.18349 • Published • 13
Markov chain
Planning
- Compositional Foundation Models for Hierarchical Planning
  Paper • 2309.08587 • Published • 11
- ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models
  Paper • 2405.09220 • Published • 28
- WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
  Paper • 2504.15785 • Published • 20
- CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
  Paper • 2508.20096 • Published • 36
Multilingual
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 85
- Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages
  Paper • 2401.05811 • Published • 8
- Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis
  Paper • 2409.20059 • Published • 17
- Are Character-level Translations Worth the Wait? Comparing Character- and Subword-level Models for Machine Translation
  Paper • 2302.14220 • Published
Partial layer training LLMs
- Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)
  Paper • 2309.08968 • Published • 23
- GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning
  Paper • 2505.20355 • Published • 35
- Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
  Paper • 2505.22618 • Published • 43
- Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
  Paper • 2509.15591 • Published • 45
Evaluation
Math
- Transformers Can Do Arithmetic with the Right Embeddings
  Paper • 2405.17399 • Published • 54
- Solving Inequality Proofs with Large Language Models
  Paper • 2506.07927 • Published • 20
- Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
  Paper • 2507.00432 • Published • 79
- CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
  Paper • 2507.06181 • Published • 43
Style transfer
Reinforcement learning
- Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
  Paper • 2407.20798 • Published • 24
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38
- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 102
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
  Paper • 2502.18449 • Published • 75
Knowledge
- Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models
  Paper • 2408.15915 • Published • 19
- ReLearn: Unlearning via Learning for Large Language Models
  Paper • 2502.11190 • Published • 30
- ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation
  Paper • 2503.21729 • Published • 29
- Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
  Paper • 2504.00509 • Published • 22
Tabular ML