Post
144
Try the demo of NVIDIA Nemotron Parse v1.1, NVIDIA's latest VLM for understanding document semantics and extracting text and table elements with spatial grounding. It is capable of comprehensive text understanding and document structure analysis in a given document, and can provide bounding boxes with coordinates.
⭐Space[Demo]: prithivMLmods/NVIDIA-Nemotron-Parse-v1.1
⭐Model: nvidia/NVIDIA-Nemotron-Parse-v1.1
⭐Multimodal-Spaces: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
Some relevant Spaces
⭐DeepSeek-OCR-experimental [latest transformers]: prithivMLmods/DeepSeek-OCR-experimental
⭐Qwen3-VL-Outpost: prithivMLmods/Qwen3-VL-Outpost
⭐Multimodal-OCR3: prithivMLmods/Multimodal-OCR3
Check out the other spaces in the multimodal implementation collection.
To know more about it, visit the app page or the respective model page!
⭐Space[Demo]: prithivMLmods/NVIDIA-Nemotron-Parse-v1.1
⭐Model: nvidia/NVIDIA-Nemotron-Parse-v1.1
⭐Multimodal-Spaces: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
Some relevant Spaces
⭐DeepSeek-OCR-experimental [latest transformers]: prithivMLmods/DeepSeek-OCR-experimental
⭐Qwen3-VL-Outpost: prithivMLmods/Qwen3-VL-Outpost
⭐Multimodal-OCR3: prithivMLmods/Multimodal-OCR3
Check out the other spaces in the multimodal implementation collection.
To know more about it, visit the app page or the respective model page!