Introducing Photo-Mate-v2, based on FLUX.1-Kontext-dev, for advanced image manipulation tasks. It supports transforming scenes into top-down/bottom-up perspectives, CAM-left/right views and their reverses, as well as general Kontext-specified object removal. Below is the list of demos and adapters.
This post covers the latest trends in OCR models: the multilingual support offered by modern OCR systems, their distinctive capabilities, benchmark comparisons across OCR models, transformer-based implementations, and strategies for streamlining Transformers compatibility.
Multilingual Tokenization Showdown: Analyzing 12 LLM Tokenizers Across 204 Languages.
First, I've created a dataset with Wikipedia's "Cat" article text in 272 languages: Norod78/WikiCat-Multilingual
For each language entry with at least 100 words, I tokenized the text with each of the 12 tokenizers and computed the characters-per-token and words-per-token ratios. The higher the ratio, the more information each token carries on average for that language (potentially letting an LLM learn more per parameter when trained on data in that language).
I hope I interpreted the results correctly. I've made the code available on GitHub, so you can re-create the raw results JSONL with this repo: https://github.com/Norod/wikicat-tokenizer-eval
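The two ratios above are simple to compute. Here is a minimal, self-contained sketch (my own illustrative code, not the repo's actual implementation); `tokenizer_ratios` and the toy whitespace tokenizer are hypothetical names, and a real run would plug in e.g. a Hugging Face tokenizer's `encode` method instead:

```python
def tokenizer_ratios(text, encode):
    """Return (chars_per_token, words_per_token) for `text`.

    `encode` is any callable mapping a string to a list of tokens/ids.
    """
    n_tokens = len(encode(text))
    if n_tokens == 0:
        return 0.0, 0.0
    chars_per_token = len(text) / n_tokens
    words_per_token = len(text.split()) / n_tokens
    return chars_per_token, words_per_token

# Toy stand-in tokenizer: splits on whitespace. With it, words-per-token
# is exactly 1.0 by construction; real subword tokenizers give < 1.0.
toy_encode = lambda s: s.split()

cpt, wpt = tokenizer_ratios("the cat sat on the mat", toy_encode)
```

A language where `cpt` is high packs more text into each token, which is what the post's per-language comparison measures.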
I am dedicating this weekend to practicing/reading the latest b(ook)log from Hugging Face. It is meant to be a guide for anyone trying to go from "we have a great dataset and GPUs" to "we built a really strong model." Will share thoughts upon completion.
New Research Alert - ICCV 2025 (Oral)! Title: LoftUp: Learning a Coordinate-based Feature Upsampler for Vision Foundation Models
Description: LoftUp is a coordinate-based transformer that upscales the low-resolution features of VFMs (e.g. DINOv2 and CLIP) using cross-attention and self-distilled pseudo-ground truth (pseudo-GT) from SAM.
Authors: Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, and Dan Zhang
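The core idea of coordinate-based cross-attention upsampling can be sketched in a simplified, single-head form. This is my own illustrative code, not LoftUp's actual architecture: high-resolution coordinate embeddings act as queries against the low-resolution feature tokens, and every name here (`cross_attention_upsample`, the weight matrices) is hypothetical:

```python
import numpy as np

def cross_attention_upsample(lowres_feats, query_coords, W_q, W_k, W_v):
    """Single-head cross-attention from coordinate queries to feature tokens.

    lowres_feats : (N, d)  flattened low-resolution feature tokens
    query_coords : (M, c)  coordinate embeddings of high-res positions
    Returns        (M, d_v) upsampled features, one per query position.
    """
    Q = query_coords @ W_q                    # (M, d_k) queries from coords
    K = lowres_feats @ W_k                    # (N, d_k) keys from features
    V = lowres_feats @ W_v                    # (N, d_v) values from features
    scores = (Q @ K.T) / np.sqrt(K.shape[1])  # scaled dot-product scores
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)   # softmax over the N tokens
    return attn @ V                           # convex mix of value vectors

# Toy shapes: 16 low-res tokens of dim 8, upsampled to 64 query positions.
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))
coords = rng.normal(size=(64, 2))
W_q = rng.normal(size=(2, 8))
W_k = rng.normal(size=(8, 8))
W_v = rng.normal(size=(8, 8))
up = cross_attention_upsample(feats, coords, W_q, W_k, W_v)  # (64, 8)
```

Because the queries come from continuous coordinates rather than a fixed grid, the same module can, in principle, evaluate features at arbitrary output resolutions.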