AI & ML interests

None defined yet.

Recent Activity

prithivMLmods posted an update 1 day ago
Try the demo of NVIDIA Nemotron Parse v1.1, NVIDIA's latest VLM for understanding document semantics and extracting text and table elements with spatial grounding. It performs comprehensive text understanding and document-structure analysis on a given document and can return bounding boxes with coordinates.

⭐Space[Demo]: prithivMLmods/NVIDIA-Nemotron-Parse-v1.1
⭐Model: nvidia/NVIDIA-Nemotron-Parse-v1.1
⭐Multimodal-Spaces: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
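If you'd rather hit the demo programmatically than through the UI, here's a minimal sketch using gradio_client; the Space's endpoint names and parameters aren't listed here, so inspect them first:

```python
# Minimal sketch: query the demo Space programmatically with gradio_client.
# Endpoint names and parameters are Space-specific, so inspect them with
# view_api() before calling predict().
from gradio_client import Client

client = Client("prithivMLmods/NVIDIA-Nemotron-Parse-v1.1")
client.view_api()  # prints the Space's callable endpoints and their signatures
```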

Some relevant Spaces

⭐DeepSeek-OCR-experimental [latest transformers]: prithivMLmods/DeepSeek-OCR-experimental
⭐Qwen3-VL-Outpost: prithivMLmods/Qwen3-VL-Outpost
⭐Multimodal-OCR3: prithivMLmods/Multimodal-OCR3

Check out the other spaces in the multimodal implementation collection.

To learn more, visit the app page or the respective model page!
ZennyKenny posted an update 3 days ago
The #feedback channel of app early-access Slack workspaces is some of the best unintentional comedy material I have ever come across, tbh.
prithivMLmods posted an update 4 days ago
Try the all-new trending Qwen-Image-Edit-2509 (Multi-Image-Edits) specialized adapter demos, including Cloth-Design-Fuse, Texture Edit, Guided-Objects-Patching, and more — all in a single Hugging Face Space. The demo link is provided below. đŸ€—đŸ”„

⼞ Space[Demo]: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast-Fusion
⼞ Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
⼞ Base Model: Qwen/Qwen-Image-Edit-2509
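For reference, loading the base editor and fusing one of these adapters in diffusers looks roughly like this. This is a sketch, not the Space's exact code: the adapter repo id is a placeholder, and the pipeline class is auto-resolved from the base repo, so check the Space's app.py for the real setup.

```python
# Rough sketch: base Qwen-Image-Edit-2509 plus one LoRA adapter via diffusers.
# The adapter repo id below is a placeholder; pick one from the collection.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2509", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("prithivMLmods/some-texture-edit-lora")  # placeholder id

img = load_image("input.jpg")
out = pipe(image=img, prompt="Apply a woven fabric texture to the jacket").images[0]
out.save("edited.png")
```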

Similar applications↗

⼞ Kontext-Photo-Mate-v2: prithivMLmods/Kontext-Photo-Mate-v2
⼞ Photo-Mate-i2i: prithivMLmods/Photo-Mate-i2i
⼞ Qwen-Image-Edit-2509-LoRAs-Fast: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast

To learn more, visit the app page or the respective model page!
prithivMLmods posted an update 5 days ago
Made a demo Space for multimodal understanding with Qwen3-VL, covering tasks including point annotation, detection, captioning, guided text inference, and more. Find the demo link below. đŸ€—â†—ïž

⼞ Space[Demo]: prithivMLmods/Qwen3-VL-HF-Demo
⼞ Model Used: Qwen/Qwen3-VL-4B-Instruct
⼞ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
⼞ GitHub: https://github.com/PRITHIVSAKTHIUR/Qwen-3VL-Multimodal-Understanding
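Under the hood this is standard transformers inference; a minimal sketch of that flow (the auto class and chat-template usage follow current transformers conventions, and the image URL is a placeholder):

```python
# Minimal sketch: Qwen3-VL-4B-Instruct image understanding with transformers.
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-4B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/image.jpg"},  # placeholder
        {"type": "text", "text": "Describe this image."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```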

To learn more, visit the app page or the respective model page!
ZennyKenny posted an update 5 days ago
🎉 Wow. Congratulations @bfirsh and the Replicate team on the Cloudflare acquisition!

✌ You've really built an incredible ecosystem and product offering and should be super proud.
flozi00 posted an update 6 days ago
Running large language models efficiently is about more than raw GPU power. The latest guide breaks down the essential math to determine whether your LLM workload is compute-bound or memory-bound.

We apply these principles to a real-world example: Qwen's 32B parameter model on the new NVIDIA RTX PRO 6000 Blackwell Edition.

In this guide, you will learn how to:

Calculate your GPU's operational intensity (Ops:Byte Ratio)
Determine your model's arithmetic intensity
Identify whether your workload is memory-bound or compute-bound

Read the full guide here: https://flozi.net/en/guides/ai/llm-inference-math
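The core of that math fits in a few lines. Here's a back-of-the-envelope sketch for single-batch decoding; the hardware numbers are assumptions for illustration, so substitute your GPU's datasheet values:

```python
# Roofline-style check: is single-batch LLM decoding memory-bound or compute-bound?
# NOTE: the hardware numbers below are assumptions for illustration only.
peak_flops = 503e12   # assumed peak BF16 FLOP/s of the GPU
mem_bw = 1.79e12      # assumed memory bandwidth in bytes/s (~1.79 TB/s)

# GPU operational intensity (Ops:Byte ratio): FLOPs available per byte moved.
ops_per_byte = peak_flops / mem_bw

# Model arithmetic intensity during decoding: each generated token touches every
# weight once (~2 FLOPs per parameter, ~2 bytes per parameter in BF16), so the
# intensity is roughly batch_size FLOPs per byte.
params = 32e9         # Qwen 32B
bytes_per_param = 2   # BF16
batch_size = 1
model_intensity = (2 * params * batch_size) / (params * bytes_per_param)

print(f"GPU Ops:Byte ratio:         {ops_per_byte:.0f} FLOPs/byte")
print(f"Model arithmetic intensity: {model_intensity:.0f} FLOPs/byte")
if model_intensity < ops_per_byte:
    # Memory-bound: throughput is limited by how fast weights stream from VRAM.
    print(f"memory-bound; est. tokens/s ~ {mem_bw / (params * bytes_per_param):.1f}")
else:
    print("compute-bound")
```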
prithivMLmods posted an update 7 days ago
Made a small write-up and experimental fine-tuning guide for MetaCLIP 2 for image classification on downstream tasks. The blog, titled Fine-Tuning MetaCLIP 2 for Image Classification on Downstream Tasks, demonstrates step-by-step fine-tuning on CIFAR-10 and can be adapted to other datasets. For more details, check out the linked blog below; a minimal fine-tuning sketch follows the links. đŸ€—â†—ïž

⼞ Blog Article: https://huggingface.co/blog/prithivMLmods/metaclip2-downstream-finetune
⼞ Demo Space[Zero-Shot Classification]: prithivMLmods/metaclip-2-demo
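The gist of the downstream recipe: take the image encoder and train a classification head on its embeddings. A minimal sketch under assumptions (the checkpoint id is a placeholder; the blog covers the exact model and training loop):

```python
# Minimal sketch: linear classification head on a CLIP-style image encoder.
# The checkpoint id is a placeholder; see the blog for the exact model and recipe.
import torch.nn as nn
from transformers import AutoModel

class ClipClassifier(nn.Module):
    def __init__(self, checkpoint, num_classes=10):  # 10 classes for CIFAR-10
        super().__init__()
        self.backbone = AutoModel.from_pretrained(checkpoint)
        # The head maps the projected image embedding to class logits.
        self.head = nn.Linear(self.backbone.config.projection_dim, num_classes)

    def forward(self, pixel_values):
        feats = self.backbone.get_image_features(pixel_values=pixel_values)
        return self.head(feats)  # logits; train with cross-entropy

model = ClipClassifier("facebook/metaclip-b32-400m")  # placeholder checkpoint
```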

Some other models
╰â€ș MetaCLIP-2-Cifar10: prithivMLmods/MetaCLIP-2-Cifar10
╰â€ș MetaCLIP-2-Age-Range-Estimator: prithivMLmods/MetaCLIP-2-Age-Range-Estimator
╰â€ș MetaCLIP-2-Gender-Identifier: prithivMLmods/MetaCLIP-2-Gender-Identifier
╰â€ș MetaCLIP-2-Open-Scene: prithivMLmods/MetaCLIP-2-Open-Scene

⼞ Collection: https://huggingface.co/collections/prithivMLmods/metaclip2-image-classification-experiments

To learn more, visit the app page or the respective model page!
prithivMLmods posted an update 11 days ago
Try the all-new trending Qwen-Image-Edit specialized adapter demos, including Photo-to-Anime, Light Restoration, Multi-Angle Edits, Relighting, and more — all in a single Hugging Face Space. Below is the demo link. đŸ€—đŸŒ 

⼞ Demo-Space: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast
⼞ How-to-Use: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast#2
⼞ Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

To learn more, visit the app page or the respective model page!
flozi00 posted an update 11 days ago
Locutusque posted an update 13 days ago
🚀 AutoXLA - Accelerating Large Models on TPU
AutoXLA is an experimental library that automates the distribution, optimization, and quantization of large language models for TPUs using PyTorch/XLA. It extends the Hugging Face Transformers interface with TPU-aware features such as automatic sharding, custom attention kernels, and quantization-aware loading, making large-scale deployment and training both simpler and faster.
With quantization and Splash Attention kernels, AutoXLA achieves up to 4× speedups over standard Flash Attention implementations, significantly improving throughput for both inference and training workloads.
Whether you’re experimenting with distributed setups (FSDP, 2D, or 3D sharding) or optimizing memory via LanguageModelQuantizer, AutoXLA is built to make scaling LLMs on TPU seamless.
⚠ Note: This is an experimental repository. Expect rough edges! Please report bugs or unexpected behavior through GitHub issues.
🔗 GitHub Repository: https://github.com/Locutusque/AutoXLA
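For context, the vanilla PyTorch/XLA flow that AutoXLA automates looks roughly like this; note this is plain torch_xla plus transformers, not AutoXLA's own API, which you should check on GitHub:

```python
# Baseline PyTorch/XLA inference that AutoXLA automates and extends.
# This is plain torch_xla + transformers, not AutoXLA's own API.
import torch
import torch_xla.core.xla_model as xm
from transformers import AutoModelForCausalLM, AutoTokenizer

device = xm.xla_device()  # the TPU core as a torch device
tok = AutoTokenizer.from_pretrained("gpt2")  # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)

inputs = tok("Hello TPU", return_tensors="pt").to(device)
with torch.no_grad():
    out = model(**inputs)
xm.mark_step()  # flush the lazily-built XLA graph for execution
print(out.logits.shape)
```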

ZennyKenny posted an update 14 days ago
🎉 Novoyaz is live.

A few months ago, I built a quick POC on Hugging Face that used a fine-tuned variant of OpenAI's gpt-oss-20b model, which I trained to convert text from pre-reform Russian-language documents into modern Russian orthography.

âšĄïž This morning, I launched novoyaz.io.

This is a production app (the frontend for which I built in like two hours with Lovable) that uses the same fine-tuned model for transliteration, but now with a bunch of extra features that make it even easier to use (like taking and uploading pictures with your on-device camera 😅).

👉 If you're a researcher, or know a researcher, whose day-to-day workflow this app could improve, please get in touch with me.
prithivMLmods posted an update 15 days ago
Introducing Photo-Mate-v2, based on FLUX.1-Kontext-dev, for advanced image manipulation tasks. It supports transforming scenes into top-down/bottom-up perspectives and CAM-right/left views (and their reverses), as well as general Kontext-specified object removal. Below is the list of demos and adapters. đŸ”„đŸ€—

➀ Spaces [Demo] : prithivMLmods/Kontext-Photo-Mate-v2

Kontext-Adapters :
✩ Kontext-Bottom-Up-View: prithivMLmods/Kontext-Bottom-Up-View
✩ Kontext-CAM-Right-View: prithivMLmods/Kontext-CAM-Right-View
✩ Kontext-Top-Down-View: prithivMLmods/Kontext-Top-Down-View
✩ Kontext-CAM-Left-View: prithivMLmods/Kontext-CAM-Left-View
✩ Kontext-Unblur-Upscale: prithivMLmods/Kontext-Unblur-Upscale
✩ Kontext-0811-exp: prithivMLmods/Kontext-0811-exp
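A rough diffusers sketch of how one of these adapters plugs into the base model; the prompt wording is a guess, so check each adapter card for its trigger phrase:

```python
# Rough sketch: FLUX.1-Kontext-dev with one of the view adapters listed above.
# Prompt wording is an assumption; see the adapter card for the trigger phrase.
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("prithivMLmods/Kontext-Top-Down-View")  # adapter from the list

img = load_image("input.jpg")
out = pipe(image=img, prompt="Render the scene from a top-down view").images[0]
out.save("top_down.png")
```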

Photo-Mate Collection:
✩ Kontext CAM Angles: https://huggingface.co/collections/prithivMLmods/kontext-cam-angles
✩ i2i - Kontext (exp): https://huggingface.co/collections/prithivMLmods/i2i-kontext-exp
✩ LZO-1 (Lossless Zoom Operator): https://huggingface.co/collections/prithivMLmods/lzo-1-lossless-zoom-operator

Related-Apps:
✩ Photo-Mate [Version 1.0]: prithivMLmods/Photo-Mate-i2i
✩ Image Generation Apps [Collection]: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

To learn more, visit the app page or the respective model page!
flozi00 posted an update 17 days ago
I just got asked about the differences between Blackwell systems and Grace Blackwell systems: what sets them apart, and how big is the performance gap between them?

https://flozi.net/en/hardware/nvidia/benchmarks/b200-vs-gb200-efficiency-comparison

Here's a summary of the key points from the article:

GB200 (Grace Blackwell) is a Superchip: It integrates a Grace CPU and two Blackwell GPUs into a single package.
B200 is a GPU-only module: it's designed to be paired with x86 or ARM CPUs in more traditional server setups.

Performance and Efficiency:

Based on MLPerf Training v5.0 benchmarks, the article concludes:

GB200 systems are approximately 42% more efficient than B200 systems on average. This is especially true in large-scale deployments (100+ GPUs), where the GB200's integrated design and high-speed NVLink interconnect provide a significant advantage.

In smaller, single-node systems (e.g., 8 GPUs), the performance difference is much smaller, around 10-15%.

Use Cases:

Choose GB200 for large-scale AI clusters, training massive models, and when maximum efficiency is the top priority.

Choose B200 for smaller deployments, when you need the flexibility to choose your own CPU, or for mixed AI and HPC workloads.
prithivMLmods posted an update 19 days ago
A week ago, I shared a post about a test implementation of DeepSeek-OCR compatibility with the latest transformers (https://tinyurl.com/ykc4mm66). Now I'm dropping the most compatible version of it, supporting the model on the latest transformers. đŸ€—đŸ”„

➠ DeepSeek-OCR-Latest-BF16.I64: prithivMLmods/DeepSeek-OCR-Latest-BF16.I64
➠ DeepSeek OCR [exp] : prithivMLmods/DeepSeek-OCR-experimental

✅ Supports the latest transformers (v4.57.1)
✅ torch 2.6.0+cu124 or newer (e.g., torch 2.9.0)
✅ CUDA 12.4
✅ Users can also opt out of specific attention implementations if desired.
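Loading the weights with a recent transformers looks roughly like this; a sketch under assumptions, where the attn_implementation switch is the opt-out mentioned above:

```python
# Minimal sketch: load the updated DeepSeek-OCR weights on recent transformers.
# attn_implementation is the opt-out mentioned above; "eager" avoids
# flash-attention when it isn't installed.
import torch
from transformers import AutoModel, AutoTokenizer

repo = "prithivMLmods/DeepSeek-OCR-Latest-BF16.I64"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModel.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # or "flash_attention_2" / "sdpa"
    trust_remote_code=True,
).eval().to("cuda")
```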

✹Previous version: strangervisionhf/deepseek-ocr-latest-transformers
↗Related Blog: https://huggingface.co/blog/prithivMLmods/multimodal-ocr-vlms
✹Community Page: strangervisionhf
✹Original Model Page: deepseek-ai/DeepSeek-OCR

To learn more, visit the app page or the respective model page!
flozi00 posted an update 20 days ago
Some weeks ago, I decided it was time for me to leave LinkedIn.
It got quiet around my open-source activities over the last year, so I thought something had to change.

That's why my focus will shift to sharing experiences and insights about hardware, drivers, kernels, and Linux. I won't post about how to use models, build agents, or do prompting. I want to write about the deeper layers that the current hype is built on.

I will start posting summaries of my articles here on the Hub.

English version:
https://flozi.net/en

German translated version:
https://flozi.net/de

Feel free to reach out if there's something specific you'd like to read about.
ZennyKenny posted an update 22 days ago
Anyone got the scoop on a good OCR model that's available on inference?

Keen to make use of an endpoint (gated or not -- happy to pay for usage) for a personal project, but not so keen to pay for the GPU hosting myself.

🙈🙈🙈
prithivMLmods posted an update 23 days ago
A small blog post titled Hall of Multimodal OCR VLMs and Demonstrations has been published at ↗ https://huggingface.co/blog/prithivMLmods/multimodal-ocr-vlms on behalf of strangervisionhf.

It discusses the latest trends in OCR models, the multilingual support offered by modern OCR systems, their unique capabilities, OCR benchmark comparisons, transformer-based implementations, and strategies for streamlining transformers compatibility.
prithivMLmods posted an update 25 days ago
Implemented DeepSeek-OCR support for the latest transformers on the strangervisionhf page. The page includes the model weights and a corrected configuration, which fix the earlier issues and allow transformers inference to run smoothly. đŸ€—đŸ”„

> Model: strangervisionhf/deepseek-ocr-latest-transformers
> Demo Space: prithivMLmods/DeepSeek-OCR-experimental

✅ Supports the latest transformers
✅ You can also opt out of the attention implementation if needed
✅ Supports torch 2.6.0 or higher
✅ torch CUDA build: 12.4

If you are interested in experimenting with new things and streamlining compatibility, the strangervisionhf organization is open to you; feel free to join the community.

> Multimodal Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0, https://huggingface.co/collections/strangervisionhf/october-2025-models

> Thank you, @merve, for assigning the blazing-fast Zero GPU support!

> Notebook : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepSeek-OCR-Demo/deepseek_ocr_demo.ipynb

To learn more, visit the app page or the respective model page!