YannQi

https://yannqi.github.io/

yannqi

AI & ML interests

Computer vision, AGI, Multi-modality.

Recent Activity

authored a paper 5 days ago

Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering

authored a paper 5 days ago

Taming Modality Entanglement in Continual Audio-Visual Segmentation

authored a paper 5 days ago

HunyuanOCR Technical Report

upvoted 3 papers 5 days ago

Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering

Paper • 2510.14605 • Published Oct 16 • 4

Taming Modality Entanglement in Continual Audio-Visual Segmentation

Paper • 2510.17234 • Published Oct 20 • 4

HunyuanOCR Technical Report

Paper • 2511.19575 • Published 9 days ago • 19

upvoted a collection 3 months ago

Qwen3

84 items • Updated Aug 6 • 1.46k

upvoted a paper 3 months ago

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

Paper • 2508.21113 • Published Aug 28 • 109

upvoted a paper 6 months ago

Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger

Paper • 2506.07785 • Published Jun 9 • 1

upvoted 2 papers 7 months ago

Thinkless: LLM Learns When to Think

Paper • 2505.13379 • Published May 19 • 50

AdaptThink: Reasoning Models Can Learn When to Think

Paper • 2505.13417 • Published May 19 • 82

upvoted a paper 8 months ago

Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources

Paper • 2504.00595 • Published Apr 1 • 37

upvoted a collection about 1 year ago

Qwen2.5

Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated Jul 21 • 664

upvoted a paper about 1 year ago

Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis

Paper • 2409.06135 • Published Sep 10, 2024 • 16