OpenGVLab

community

https://github.com/opengvlab

opengvlab

OpenGVLab

Activity Feed Request to join this org

AI & ML interests

Computer Vision

Recent Activity

vansin submitted a paper about 9 hours ago

End-to-End Video Character Replacement without Structural Guidance

heroding77 authored a paper 1 day ago

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

heroding77 submitted a paper 1 day ago

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

View all activity

Papers

InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision

ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution

View all Papers

vansin

submitted a paper to Daily Papers about 9 hours ago

End-to-End Video Character Replacement without Structural Guidance

Paper • 2601.08587 • Published about 22 hours ago • 5

heroding77

authored a paper 1 day ago

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Paper • 2601.07779 • Published 1 day ago • 24

heroding77

submitted a paper to Daily Papers 1 day ago

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Paper • 2601.07779 • Published 1 day ago • 24

prithivMLmods

posted an update 3 days ago

Post

3742

LTX-2 Camera-Control LoRA demo with dolly-in/out and dolly-left/right is now available on Hugging Face, paired with ltx-2-19b-distilled-lora for fast inference. It also includes dynamic GPU duration adjustments for long video generations. Click the related Space links below.

🤗Try it now on : prithivMLmods/LTX-2-LoRAs-Camera-Control-Dolly
⭐Github: https://github.com/PRITHIVSAKTHIUR/LTX-2-LoRAs-Camera-Control-Dolly
🕹️Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

To learn more, visit the app page or the respective model pages.

2 replies

Nymbo

posted an update 6 days ago

Post

1642

Genuine recommendation: You should really use this AutoHotKey macro. Save the file as macros.ahk and run it. Before sending a prompt to your coding agent, press Ctrl + Alt + 1 and paste your prompt to any regular chatbot. Then send the output to the agent. This is the actual, boring, real way to "10x your prompting". Use the other number keys to avoid repeating yourself over and over again. I use this macro prolly 100-200 times per day. AutoHotKey isn't as new or hype as a lot of other workflows, but there's a reason it's still widely used after 17 years. Don't overcomplicate it.

; Requires AutoHotkey v1.1+

; All macros are `Ctrl + Alt + <variable>`

^!1::
    Send, Please help me more clearly articulate what I mean with this message (write the message in a code block):
return

^!2::
    Send, Please make the following changes:
return

^!3::
    Send, It seems you got cut off by the maximum response limit. Please continue by picking up where you left off.
return

In my experience the past few months, Ctrl + Alt + 1 works best with Instruct models (non-thinking). Reasoning causes some models to ramble and miss the point. I've just been using GPT-5.x for this.

prithivMLmods

posted an update 10 days ago

Post

2381

Dropping Image Edit (Object Manipulator): Add or remove specified objects/designs, with flexible support for both single-image and multi-image modes.

🤗 Demo: prithivMLmods/Qwen-Image-Edit-Object-Manipulator

Qwen-Image-Edit-2511-Object-Remover is an adapter (LoRA) developed for Qwen’s Qwen-Image-Edit-2511 image-to-image model. It is specifically designed for precise object removal from images.

⭐ Model: prithivMLmods/Qwen-Image-Edit-2511-Object-Remover

Qwen-Image-Edit-2511-Object-Adder is an adapter (LoRA) developed for Qwen’s Qwen-Image-Edit-2511 image-to-image model. It is specifically designed for precise object addition to images.

⭐ Model: prithivMLmods/Qwen-Image-Edit-2511-Object-Adder

🕹️ Collection: https://huggingface.co/collections/prithivMLmods/qwen-image-edit-object-manipulator
🕹️ github: https://github.com/PRITHIVSAKTHIUR/Qwen-Image-Edit-Object-Manipulator

To learn more, visit the app page or the respective model pages.

kpzhang996

submitted a paper to Daily Papers 15 days ago

Yume-1.5: A Text-Controlled Interactive World Generation Model

Paper • 2512.22096 • Published 19 days ago • 58

prithivMLmods

posted an update 17 days ago

Post

4116

Update: TRELLIS.2 (Text to 3D, Image to 3D) Gradio with Rerun Embedded demo with improved visualization of the 3D model previewer is now available on Hugging Face. Generate assets and view them in the 3D viewer, powered and streamlined with Microsoft’s TRELLIS.2 and Tongyi-MAI’s Z-Image-Turbo models.

🤗 TRELLIS.2 (Demo): prithivMLmods/TRELLIS.2-Text-to-3D
🕹️ GitHub: https://github.com/PRITHIVSAKTHIUR/TRELLIS.2-Text-to-3D-RERUN
🕹️ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

To know more about it, visit the app page or the respective model page!

prithivMLmods

posted an update 18 days ago

Post

4200

Introducing the Qwen-Image-Edit-2511-LoRAs-Fast demo, featuring image property comparison and contrast, built on top of Gradio and the combined Rerun SDK. It supports single and multi-image edits with existing LoRAs that are lazily loaded. (Note: This is still an experimental Space for Qwen-Image-Edit-2511.)

⭐ Space Demo: prithivMLmods/Qwen-Image-Edit-2511-LoRAs-Fast
⭐ GitHub: https://github.com/PRITHIVSAKTHIUR/Qwen-Image-Edit-2511-LoRAs-Fast-Multi-Image-Rerun
⭐ Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

To know more about it, visit the app page or the respective model page!

2 replies

lll2343

updated 3 models 18 days ago

revliter

updated a collection 21 days ago

InternVideo-Next

Collection

InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision • 5 items • Updated 21 days ago • 4

prithivMLmods

posted an update 25 days ago

Post

3703

Introducing demos for new SOTA models from AI2: SAGE-MM (Smart Any-Horizon Agents for Long-Video Reasoning) and Molmo-2, an open vision-language model that supports multi-image (QA and pointing) and video (QA, pointing, and tracking). The respective demo-related collections are listed below. 🎃🔥

✨ SAGE-MM [Video-Reasoning]: prithivMLmods/SAGE-MM-Video-Reasoning
✨ Molmo2 [Demo]: prithivMLmods/Molmo2-HF-Demo

🎃 GitHub[SAGE-MM]: https://github.com/PRITHIVSAKTHIUR/SAGE-MM-Video-Reasoning
🎃 GitHub[Molmo2]: https://github.com/PRITHIVSAKTHIUR/Molmo2-HF-Demo
🎃 Multimodal Implementations: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

To know more about it, visit the app page or the respective model page!

1 reply

vansin

posted an update 25 days ago

Post

263

QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

prithivMLmods

posted an update 26 days ago

Post

2065

Introducing TRELLIS.2 Text-to-3D. The demo for the TRELLIS.2-4B (Image-to-3D) model is streamlined with the Z-Image Turbo image generation model to enable Text-to-3D functionality. There is no need for input assets, making a small leap forward for ideation. Optionally, it also includes default support for Image-to-3D inference using direct image assets. Find the demo and related collections below... 🤗🔥

✨ TRELLIS.2-Text-to-3D [Demo]: prithivMLmods/TRELLIS.2-Text-to-3D
✨ Multimodal Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨ Github: https://github.com/PRITHIVSAKTHIUR/TRELLIS.2-Text-to-3D

To know more about it, visit the app page or the respective model page!

Eurayka

authored a paper 26 days ago

VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs

Paper • 2511.20272 • Published Nov 25, 2025 • 1

Nymbo

posted an update 26 days ago

Post

2074

🚨 New tool for the Nymbo/Tools MCP server: The new Agent_Skills tool provides full support for Agent Skills (Claude Skills but open-source).

How it works: The tool exposes the standard discover/info/resources/validate actions. Skills live in /Skills under the same File_System root, and any bundled scripts run through Shell_Command, no new infrastructure required.

Agent_Skills(action="discover")  # List all available skills
Agent_Skills(action="info", skill_name="music-downloader")  # Full SKILL.md
Agent_Skills(action="resources", skill_name="music-downloader")  # Scripts, refs, assets

I've included a music-downloader skill as a working demo, it wraps yt-dlp for YouTube/SoundCloud audio extraction.

Caveat: On HF Spaces, Shell_Command works for most tasks, but some operations (like YouTube downloads) are restricted due to the container environment. For full functionality, run the server locally on your machine.

Try it out ~ https://www.nymbo.net/nymbot

prithivMLmods

posted an update 28 days ago

Post

2025

Demo for Molmo2 on Hugging Face is live now, including Single/Multi-Image VQA, Visual Pointing/Grounding, Video VQA, and Video Point Tracking. Find the demo and related collections below. 🔥🤗

● Molmo2 HF Demo🖥️: prithivMLmods/Molmo2-HF-Demo
● Model Collection: https://huggingface.co/collections/allenai/molmo2
● Related Multimodal Space Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

To know more about it, visit the app page or the respective model page!

prithivMLmods

posted an update 29 days ago

Post

5551

Introducing the Z Image Turbo LoRA DLC App, a gallery space for plug-and-play Z-Image-Turbo LoRAs. It features a curated collection of impressive LoRAs for generating high-quality images. By default, it runs on the base model. Simply choose a LoRA, type your prompt, and generate images. You can find the app and more details below. 🤗🧪

● Space [Demo]: prithivMLmods/Z-Image-Turbo-LoRA-DLC
● Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
● Check the list of Z-Image LoRA's: https://huggingface.co/models?other=base_model:adapter:Tongyi-MAI/Z-Image-Turbo
● Github: https://github.com/PRITHIVSAKTHIUR/Z-Image-Turbo-LoRA-DLC

Other related image gen spaces:-

● FLUX-LoRA-DLC2: prithivMLmods/FLUX-LoRA-DLC2
● FLUX-LoRA-DLC: prithivMLmods/FLUX-LoRA-DLC
● Qwen-Image-LoRA-DLC: prithivMLmods/Qwen-Image-LoRA-DLC
● Qwen-Image-Edit-2509-LoRAs-Fast: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast
● Qwen-Image-Edit-2509-LoRAs-Fast-Fusion: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast-Fusion

& more...

To know more about it, visit the app page or the respective model page!

2 replies

AI & ML interests

Recent Activity

Papers

Team members 117

OpenGVLab's activity