Nguyễn Việt Anh
thehandsomefrog4825
Tool 🛠️
- Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise
  Paper • 2501.08331 • Published • 20
- MangaNinja: Line Art Colorization with Precise Reference Following
  Paper • 2501.08332 • Published • 60
- GameFactory: Creating New Games with Generative Interactive Videos
  Paper • 2501.08325 • Published • 67
- DiffuEraser: A Diffusion Model for Video Inpainting
  Paper • 2501.10018 • Published • 16
Object detection 🔍
- YOLOv1 to YOLOv10: The fastest and most accurate real-time object detection systems
  Paper • 2408.09332 • Published • 2
- YOLOv10: Real-Time End-to-End Object Detection
  Paper • 2405.14458 • Published • 6
- End-to-End Object Detection with Transformers
  Paper • 2005.12872 • Published • 7
- YOLOv12: Attention-Centric Real-Time Object Detectors
  Paper • 2502.12524 • Published • 12
VLM 👁️👁️
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 107
- OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
  Paper • 2412.19723 • Published • 87
- VisionZip: Longer is Better but Not Necessary in Vision Language Models
  Paper • 2412.04467 • Published • 118
- PaliGemma 2: A Family of Versatile VLMs for Transfer
  Paper • 2412.03555 • Published • 133
Model 🖥️
- Cosmos World Foundation Model Platform for Physical AI
  Paper • 2501.03575 • Published • 81
- Phi-4 Technical Report
  Paper • 2412.08905 • Published • 122
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- DeepSeek-V3 Technical Report
  Paper • 2412.19437 • Published • 71
Agent 🤖
Benchmark 📏
- CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
  Paper • 2501.01257 • Published • 52
- OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain
  Paper • 2412.13018 • Published • 41
- ProcessBench: Identifying Process Errors in Mathematical Reasoning
  Paper • 2412.06559 • Published • 84
- MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
  Paper • 2501.02955 • Published • 44
Reasoning 🧠
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
  Paper • 2501.04519 • Published • 285
- Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
  Paper • 2501.04682 • Published • 99
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
  Paper • 2408.03314 • Published • 63
- Training Large Language Models to Reason in a Continuous Latent Space
  Paper • 2412.06769 • Published • 90
TTI ⌨️➡️🖼️
- Hunyuan3D-1.0: Text-to-3D and Image-to-3D Generation
  Space • Running on L40S • 😻 266
- ROICtrl: Boosting Instance Control for Visual Generation
  Paper • 2411.17949 • Published • 87
- VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
  Paper • 2502.02492 • Published • 66
- Phantom: Subject-consistent video generation via cross-modal alignment
  Paper • 2502.11079 • Published • 59
TTV 📝➡️📺
- tencent/HunyuanVideo
  Text-to-Video • Updated • 1.19k • 2.08k
- SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance
  Paper • 2412.02687 • Published • 113
- STIV: Scalable Text and Image Conditioned Video Generation
  Paper • 2412.07730 • Published • 74
- Improving Video Generation with Human Feedback
  Paper • 2501.13918 • Published • 52
Other research
- o3-mini vs DeepSeek-R1: Which One is Safer?
  Paper • 2501.18438 • Published • 23
- SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators
  Paper • 2502.06394 • Published • 89
- Fully Autonomous AI Agents Should Not be Developed
  Paper • 2502.02649 • Published • 35
- LM2: Large Memory Models
  Paper • 2502.06049 • Published • 30
Top papers ⭐
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
  Paper • 2501.04519 • Published • 285
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 157
- Apollo: An Exploration of Video Understanding in Large Multimodal Models
  Paper • 2412.10360 • Published • 147
LLM 🦜
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 157
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 376
- Are Your LLMs Capable of Stable Reasoning?
  Paper • 2412.13147 • Published • 94
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108
Object segmentation 🧩
Reinforcement learning 🔃
- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 103
- Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
  Paper • 2412.06531 • Published • 72
- The Differences Between Direct Alignment Algorithms are a Blur
  Paper • 2502.01237 • Published • 113
- Process Reinforcement through Implicit Rewards
  Paper • 2502.01456 • Published • 61
RAG 🔄️
GAN
Robotics 🤖🔧
TTS ⌨️➡️🗣️
Generative 🎨
- Generative World Explorer
  Paper • 2411.11844 • Published • 77
- Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation
  Paper • 2501.04144 • Published • 19
- SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images
  Paper • 2501.04689 • Published • 17
- SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
  Paper • 2501.01320 • Published • 12
Attention 🧐