Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Nguyễn Việt Anh's picture

14

Nguyễn Việt Anh

thehandsomefrog4825

Gracedream's profile picture

Amilavim's profile picture

·

AI & ML interests

None yet

Organizations

None yet

thehandsomefrog4825 's collections 20

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16 • 165

Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

Paper • 2501.08331 • Published Jan 14 • 20
MangaNinja: Line Art Colorization with Precise Reference Following

Paper • 2501.08332 • Published Jan 14 • 60
GameFactory: Creating New Games with Generative Interactive Videos

Paper • 2501.08325 • Published Jan 14 • 67
DiffuEraser: A Diffusion Model for Video Inpainting

Paper • 2501.10018 • Published Jan 17 • 16

Object detection 🔍

YOLOv1 to YOLOv10: The fastest and most accurate real-time object detection systems

Paper • 2408.09332 • Published Aug 18, 2024 • 2
YOLOv10: Real-Time End-to-End Object Detection

Paper • 2405.14458 • Published May 23, 2024 • 6
End-to-End Object Detection with Transformers

Paper • 2005.12872 • Published May 26, 2020 • 7
YOLOv12: Attention-Centric Real-Time Object Detectors

Paper • 2502.12524 • Published Feb 18 • 12

VLM 👁️👁️

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 107
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published Dec 27, 2024 • 87
VisionZip: Longer is Better but Not Necessary in Vision Language Models

Paper • 2412.04467 • Published Dec 5, 2024 • 118
PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published Dec 4, 2024 • 133

Cosmos World Foundation Model Platform for Physical AI

Paper • 2501.03575 • Published Jan 7 • 81
Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 122
MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 300
DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 71

Agent Laboratory: Using LLM Agents as Research Assistants

Paper • 2501.04227 • Published Jan 8 • 95
UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Paper • 2501.12326 • Published Jan 21 • 65
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding

Paper • 2501.13200 • Published Jan 22 • 69

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

Paper • 2501.01257 • Published Jan 2 • 52
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain

Paper • 2412.13018 • Published Dec 17, 2024 • 41
ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published Dec 9, 2024 • 84
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Paper • 2501.02955 • Published Jan 6 • 44

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published Jan 8 • 285
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though

Paper • 2501.04682 • Published Jan 8 • 99
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Paper • 2408.03314 • Published Aug 6, 2024 • 63
Training Large Language Models to Reason in a Continuous Latent Space

Paper • 2412.06769 • Published Dec 9, 2024 • 90

TTI ⌨️➡️🖼️

Running on L40S

266

Hunyuan3D-1.0

😻

266

Text-to-3D and Image-to-3D Generation
ROICtrl: Boosting Instance Control for Visual Generation

Paper • 2411.17949 • Published Nov 27, 2024 • 87
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

Paper • 2502.02492 • Published Feb 4 • 66
Phantom: Subject-consistent video generation via cross-modal alignment

Paper • 2502.11079 • Published Feb 16 • 59

TTV 📝➡️📺

tencent/HunyuanVideo

Text-to-Video • Updated Mar 6 • 1.19k • • 2.08k
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance

Paper • 2412.02687 • Published Dec 3, 2024 • 113
STIV: Scalable Text and Image Conditioned Video Generation

Paper • 2412.07730 • Published Dec 10, 2024 • 74
Improving Video Generation with Human Feedback

Paper • 2501.13918 • Published Jan 23 • 52

o3-mini vs DeepSeek-R1: Which One is Safer?

Paper • 2501.18438 • Published Jan 30 • 23
SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators

Paper • 2502.06394 • Published Feb 10 • 89
Fully Autonomous AI Agents Should Not be Developed

Paper • 2502.02649 • Published Feb 4 • 35
LM2: Large Memory Models

Paper • 2502.06049 • Published Feb 9 • 30

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 300
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published Jan 8 • 285
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 157
Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 157
Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 376
Are Your LLMs Capable of Stable Reasoning?

Paper • 2412.13147 • Published Dec 17, 2024 • 94
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108

Object segmentation 🧩

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published Jan 7 • 47
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 117

Reinforce learning 🔃

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 103
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Paper • 2412.06531 • Published Dec 9, 2024 • 72
The Differences Between Direct Alignment Algorithms are a Blur

Paper • 2502.01237 • Published Feb 3 • 113
Process Reinforcement through Implicit Rewards

Paper • 2502.01456 • Published Feb 3 • 61

VideoRAG: Retrieval-Augmented Generation over Video Corpus

Paper • 2501.05874 • Published Jan 10 • 75

The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published Jan 9 • 95

Robotic 🤖🔧

OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints

Paper • 2501.03841 • Published Jan 7 • 56
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Paper • 2501.01895 • Published Jan 3 • 55

TTS ⌨️➡️🗣️

Runtime error

120

viXTTS Demo

🚀

120

Generate audio from text using a reference voice

Generative 🎨

Generative World Explorer

Paper • 2411.11844 • Published Nov 18, 2024 • 77
Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation

Paper • 2501.04144 • Published Jan 7 • 19
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

Paper • 2501.04689 • Published Jan 8 • 17
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

Paper • 2501.01320 • Published Jan 2 • 12

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16 • 165

o3-mini vs DeepSeek-R1: Which One is Safer?

Paper • 2501.18438 • Published Jan 30 • 23
SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators

Paper • 2502.06394 • Published Feb 10 • 89
Fully Autonomous AI Agents Should Not be Developed

Paper • 2502.02649 • Published Feb 4 • 35
LM2: Large Memory Models

Paper • 2502.06049 • Published Feb 9 • 30

Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

Paper • 2501.08331 • Published Jan 14 • 20
MangaNinja: Line Art Colorization with Precise Reference Following

Paper • 2501.08332 • Published Jan 14 • 60
GameFactory: Creating New Games with Generative Interactive Videos

Paper • 2501.08325 • Published Jan 14 • 67
DiffuEraser: A Diffusion Model for Video Inpainting

Paper • 2501.10018 • Published Jan 17 • 16

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 300
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published Jan 8 • 285
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 157
Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147

Object detection 🔍

YOLOv1 to YOLOv10: The fastest and most accurate real-time object detection systems

Paper • 2408.09332 • Published Aug 18, 2024 • 2
YOLOv10: Real-Time End-to-End Object Detection

Paper • 2405.14458 • Published May 23, 2024 • 6
End-to-End Object Detection with Transformers

Paper • 2005.12872 • Published May 26, 2020 • 7
YOLOv12: Attention-Centric Real-Time Object Detectors

Paper • 2502.12524 • Published Feb 18 • 12

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 157
Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 376
Are Your LLMs Capable of Stable Reasoning?

Paper • 2412.13147 • Published Dec 17, 2024 • 94
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108

VLM 👁️👁️

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 107
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published Dec 27, 2024 • 87
VisionZip: Longer is Better but Not Necessary in Vision Language Models

Paper • 2412.04467 • Published Dec 5, 2024 • 118
PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published Dec 4, 2024 • 133

Object segmentation 🧩

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published Jan 7 • 47
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 117

Cosmos World Foundation Model Platform for Physical AI

Paper • 2501.03575 • Published Jan 7 • 81
Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 122
MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 300
DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 71

Reinforce learning 🔃

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 103
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Paper • 2412.06531 • Published Dec 9, 2024 • 72
The Differences Between Direct Alignment Algorithms are a Blur

Paper • 2502.01237 • Published Feb 3 • 113
Process Reinforcement through Implicit Rewards

Paper • 2502.01456 • Published Feb 3 • 61

Agent Laboratory: Using LLM Agents as Research Assistants

Paper • 2501.04227 • Published Jan 8 • 95
UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Paper • 2501.12326 • Published Jan 21 • 65
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding

Paper • 2501.13200 • Published Jan 22 • 69

VideoRAG: Retrieval-Augmented Generation over Video Corpus

Paper • 2501.05874 • Published Jan 10 • 75

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

Paper • 2501.01257 • Published Jan 2 • 52
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain

Paper • 2412.13018 • Published Dec 17, 2024 • 41
ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published Dec 9, 2024 • 84
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Paper • 2501.02955 • Published Jan 6 • 44

The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published Jan 9 • 95

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published Jan 8 • 285
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though

Paper • 2501.04682 • Published Jan 8 • 99
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Paper • 2408.03314 • Published Aug 6, 2024 • 63
Training Large Language Models to Reason in a Continuous Latent Space

Paper • 2412.06769 • Published Dec 9, 2024 • 90

Robotic 🤖🔧

OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints

Paper • 2501.03841 • Published Jan 7 • 56
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Paper • 2501.01895 • Published Jan 3 • 55

TTI ⌨️➡️🖼️

Running on L40S

266

Hunyuan3D-1.0

😻

266

Text-to-3D and Image-to-3D Generation
ROICtrl: Boosting Instance Control for Visual Generation

Paper • 2411.17949 • Published Nov 27, 2024 • 87
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

Paper • 2502.02492 • Published Feb 4 • 66
Phantom: Subject-consistent video generation via cross-modal alignment

Paper • 2502.11079 • Published Feb 16 • 59

TTS ⌨️➡️🗣️

Runtime error

120

viXTTS Demo

🚀

120

Generate audio from text using a reference voice

TTV 📝➡️📺

tencent/HunyuanVideo

Text-to-Video • Updated Mar 6 • 1.19k • • 2.08k
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance

Paper • 2412.02687 • Published Dec 3, 2024 • 113
STIV: Scalable Text and Image Conditioned Video Generation

Paper • 2412.07730 • Published Dec 10, 2024 • 74
Improving Video Generation with Human Feedback

Paper • 2501.13918 • Published Jan 23 • 52

Generative 🎨

Generative World Explorer

Paper • 2411.11844 • Published Nov 18, 2024 • 77
Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation

Paper • 2501.04144 • Published Jan 7 • 19
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

Paper • 2501.04689 • Published Jan 8 • 17
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

Paper • 2501.01320 • Published Jan 2 • 12

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs