AI & ML interests

Remote Sensing, Earth Observation

Recent Activity

prithivMLmods posted an update about 21 hours ago
Implemented DeepSeek-OCR support for the latest transformers release on the strangervisionhf page. The page includes the model weights and a corrected configuration that fixes the earlier compatibility issues and lets transformers inference run smoothly. 🤗🔥

> Model: strangervisionhf/deepseek-ocr-latest-transformers
> Demo Space: prithivMLmods/DeepSeek-OCR-experimental

✅ Supports the latest transformers
✅ You can also opt out of the custom attention implementation if needed (see the sketch below)
✅ Requires torch 2.6.0 or higher
✅ torch CUDA version: 12.4
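
A minimal loading sketch under those constraints. The `model.infer` call follows the upstream DeepSeek-OCR remote-code interface, so the prompt format and argument names here are assumptions; verify them against the model card.

```python
# Minimal sketch: running DeepSeek-OCR on recent transformers.
# The infer() call mirrors upstream DeepSeek-OCR examples; verify on the model card.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "strangervisionhf/deepseek-ocr-latest-transformers"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,     # needs torch >= 2.6.0 (CUDA 12.4 build)
    # attn_implementation="eager",  # optional: opt out of the custom attention impl
).eval().cuda()

# Prompt and file names are illustrative placeholders.
result = model.infer(tokenizer, prompt="<image>\nFree OCR.", image_file="page.png")
print(result)
```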

If you are interested in experimenting with new things and streamlining compatibility, the strangervisionhf organization is open to you; feel free to join the community.

> Multimodal Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
> October 2025 Models (collection): https://huggingface.co/collections/strangervisionhf/october-2025-models

> Thank you, @merve, for granting the blazing-fast Zero GPU support!

> Notebook : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepSeek-OCR-Demo/deepseek_ocr_demo.ipynb

To learn more, visit the app page or the respective model page!
prithivMLmods posted an update 2 days ago
Introducing Gliese-OCR-7B-Post2.0-final, a document content-structure retrieval VLM designed for content extraction (OCR), summarization, and document visual question answering. This is the fourth and final model in the Camel Doc OCR VLM series, following Gliese-OCR-7B-Post1.0. The model delivers superior accuracy across a wide range of document types, including scanned PDFs, handwritten pages, structured forms, and analytical reports.🚀🤗

> Gliese-OCR-7B-Post2.0-final : prithivMLmods/Gliese-OCR-7B-Post2.0-final
> Gliese-OCR-7B-Post1.0 (previous) : prithivMLmods/Gliese-OCR-7B-Post1.0
> Gliese OCR Post-x.0 (collection) : https://huggingface.co/collections/prithivMLmods/gliese-ocr-post-x0
> Multimodal Implementations (collection) : https://huggingface.co/collections/prithivMLmods/multimodal-implementations
> Qwen VL Captions (other-collection) : https://huggingface.co/collections/prithivMLmods/qwen-vl-captions
> Run Demo Here : prithivMLmods/Gliese-OCR-7B-Post2.0-final
> GitHub (4bit) : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/Gliese-OCR-7B-Post2.0-final(4bit)/Gliese_OCR_7B_Post2_0_final.ipynb
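
A hedged inference sketch, assuming the Qwen2.5-VL-style processor interface used elsewhere in the Camel Doc OCR series; the class names and prompt are assumptions, so check the model card (and see the GitHub notebook above for a 4-bit variant).

```python
# Hedged sketch: document OCR with Gliese-OCR-7B-Post2.0-final.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "prithivMLmods/Gliese-OCR-7B-Post2.0-final"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("scanned_page.png")  # hypothetical input
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Extract the document's text content."},
]}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
ids = model.generate(**inputs, max_new_tokens=1024)
print(processor.decode(ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```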

> To learn more, visit the app page or the respective model page!!
prithivMLmods posted an update 3 days ago
Here is the official Florence-2 Transformers-converted demo for the following vision models: florence-community/Florence-2-large, florence-community/Florence-2-large-ft, florence-community/Florence-2-base, and florence-community/Florence-2-base-ft. These models support tasks such as object detection, captioning, detailed captioning, more detailed captioning, dense region captioning, region proposal, OCR, and OCR with region. Try the official demo at the link below:

> Space: prithivMLmods/florence2-vision-models
> Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
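
A brief sketch of driving one of these checkpoints with a task token. It follows the standard Florence-2 usage pattern, but the auto class, dtype, and generation settings are suggestions rather than the demo's actual code; check the model card snippet.

```python
# Sketch: one task prompt against a transformers-converted Florence-2 checkpoint.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "florence-community/Florence-2-base"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

image = Image.open("photo.jpg")  # hypothetical input
task = "<OD>"  # also: "<CAPTION>", "<DETAILED_CAPTION>", "<MORE_DETAILED_CAPTION>",
               # "<DENSE_REGION_CAPTION>", "<REGION_PROPOSAL>", "<OCR>", "<OCR_WITH_REGION>"
inputs = processor(text=task, images=image, return_tensors="pt").to("cuda", torch.float16)
ids = model.generate(**inputs, max_new_tokens=512, num_beams=3)
raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
# post_process_generation parses the raw string into boxes/labels for region tasks.
print(processor.post_process_generation(raw, task=task, image_size=image.size))
```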

> To learn more, visit the app page or the respective model page!!
ronantakizawa posted an update 3 days ago
Introducing the Finance-Instruct-500k-Japanese dataset 🎉

This is a Japanese-translated version of the @Josephgflowers Finance-Instruct-500k dataset, which includes complex questions and answers related to finance and economics.

#datasets #finance #finance-instruct #japanese

ronantakizawa/Finance-Instruct-500k-Japanese
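
A quick loading sketch with the `datasets` library (the split name is assumed; check the dataset card):

```python
# Sketch: load the translated finance-instruction dataset.
from datasets import load_dataset

ds = load_dataset("ronantakizawa/Finance-Instruct-500k-Japanese", split="train")
print(ds[0])  # inspect one instruction/answer pair
```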
SelmaNajih001 posted an update 6 days ago
How Financial News Can Be Used to Train Good Financial Models 📰
Numbers tell you what happened, but news tells you why.
I’ve written an article explaining how news can be used to train AI models for sentiment analysis and better forecasting. Hope you find it interesting!

Read it here: https://huggingface.co/blog/SelmaNajih001/llms-applied-to-finance

I would love to read your opinions! I’m open to suggestions on how to improve the methodology and the training.
SelmaNajih001 posted an update 7 days ago
Which is the best model to use as a signal for investment?
Here’s who is gaining the most:
SelmaNajih001/InvestmentStrategyBasedOnSentiment

The Space uses titles from this dataset:
📊 SelmaNajih001/Cnbc_MultiCompany

Given a news title, it calculates a sentiment score: if the score crosses a certain threshold, the strategy decides to buy or sell.
Each trade lasts one day, and the strategy then computes the daily return (see the sketch below).
For Tesla, the best model seems to be the regression one 👀
Just a quick note: the model uses the closing price as the buy price, meaning it already reflects the impact of the news.
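
A minimal sketch of the thresholded signal described above; the function names and threshold values are illustrative, not the Space's actual code.

```python
# Illustrative sketch of the sentiment-threshold trading rule.
def signal(score: float, buy_thr: float = 0.5, sell_thr: float = -0.5) -> int:
    """Map a sentiment score to a position: +1 long, -1 short, 0 flat."""
    if score > buy_thr:
        return 1
    if score < sell_thr:
        return -1
    return 0

def daily_return(position: int, close_today: float, close_next: float) -> float:
    # Entry is the closing price on the news day, so (as noted above)
    # the entry price already reflects the impact of the news.
    return position * (close_next - close_today) / close_today
```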
ZennyKenny posted an update 7 days ago
ronantakizawa posted an update 8 days ago
Excited to announce 4 AWQ quantized models from #AllenAI! 🎉

Molmo-7B-D AWQ (14GB→5GB): Efficient VLM performing between GPT-4V and GPT-4o on academic benchmarks, with just 6.1% perplexity degradation.

MolmoAct-7B-D AWQ (14GB→6GB): Specialized robotic manipulation model reduced by ~57%.

Molmo-72B AWQ (145GB→38GB): VLM with Qwen2-72B decoder that performs competitively with GPT-4, achieving only 10.5% perplexity degradation while saving 107GB of memory.

OLMo-2-32B-Instruct AWQ (64GB→17GB): LLM post-trained on Tülu 3 with 3% perplexity degradation while saving ~50GB.

For all VLMs, only the text model was quantized.

ronantakizawa/molmo-7b-d-awq
ronantakizawa/molmoact-7b-d-awq
ronantakizawa/molmo-72b-awq
ronantakizawa/olmo2-32b-instruct-awq
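
As a sketch, an AWQ checkpoint like these loads through the usual transformers API once `autoawq` is installed; the note about the Molmo repos is an assumption based on their upstream model cards.

```python
# Sketch: loading an AWQ-quantized checkpoint (pip install autoawq transformers).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ronantakizawa/olmo2-32b-instruct-awq"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The Molmo/MolmoAct repos are VLMs with custom code: add trust_remote_code=True
# when loading them, per their model cards.
```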
prithivMLmods posted an update 9 days ago
ronantakizawa posted an update 12 days ago
SelmaNajih001 posted an update 13 days ago
prithivMLmods posted an update 14 days ago
Now you can try all the latest state-of-the-art multimodal vision-language models from the Qwen3-VL series in demos on Hugging Face Spaces, including the 4B, 8B, and 30B (Instruct, plus 4B-Thinking) variants. I’ve also uploaded the weights for the abliterated variants of these models, up to 30B parameters. Check out the Spaces and model links below! 🤗🔥

✨ Qwen3-VL[4B,8B]: prithivMLmods/Qwen3-VL-Outpost
✨ Qwen3-VL-30B-A3B-Demo: prithivMLmods/Qwen3-VL-HF-Demo
✨ Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

Qwen3-VL Abliterated Model Collection [ Version 1.0 ]

✨ Qwen3-VL-8B-Instruct-abliterated: prithivMLmods/Qwen3-VL-8B-Instruct-abliterated
✨ Qwen3-VL-4B-Instruct-abliterated: prithivMLmods/Qwen3-VL-4B-Instruct-abliterated
✨ Qwen3-VL-8B-Thinking-abliterated: prithivMLmods/Qwen3-VL-8B-Thinking-abliterated
✨ Qwen3-VL-4B-Thinking-abliterated: prithivMLmods/Qwen3-VL-4B-Thinking-abliterated
✨ Qwen3-VL-30B-A3B-Instruct-abliterated: prithivMLmods/Qwen3-VL-30B-A3B-Instruct-abliterated
✨ Qwen3-VL-30B-A3B-Thinking-abliterated: prithivMLmods/Qwen3-VL-30B-A3B-Thinking-abliterated

⚡Collection: prithivMLmods/qwen3-vl-abliteration-oct-1625-68f0e3e567ef076594605fac

Note: This is version 1.0 of the abliteration of the Qwen3-VL series of models. It may perform sub-optimally in some cases. If you encounter any issues, please open a discussion.
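
A hedged sketch of trying one of the abliterated checkpoints via the image-text-to-text pipeline, assuming your transformers version already ships Qwen3-VL support; the image URL is a placeholder.

```python
# Sketch: quick chat-style inference with an abliterated Qwen3-VL checkpoint.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="prithivMLmods/Qwen3-VL-4B-Instruct-abliterated",
    torch_dtype="auto",
    device_map="auto",
)
messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/sample.png"},  # placeholder
    {"type": "text", "text": "Describe this image."},
]}]
print(pipe(text=messages, max_new_tokens=128))
```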
ZennyKenny posted an update 15 days ago
Did Hugging Face just ban hammer a bunch of bot accounts or am I just so uninteresting that 30% of my subs dropped me overnight?

😬 Wait, don't answer that.
ronantakizawa posted an update 16 days ago
Released an AWQ quantized version of BosonAI’s Higgs-Llama-3-70B model! 🎉
The Higgs-Llama-3-70B is an LLM specialized in role-playing, useful for game characters.

Using an NVIDIA B200 GPU, I was able to compress the huge 140GB model into 37GB while keeping perplexity degradation minimal 👍

ronantakizawa/higgs-llama-3-70b-awq
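
For reference, a sketch of the kind of autoawq recipe that produces such a checkpoint; the quantization settings shown are common defaults, not necessarily the ones used here, and the upstream repo id is an assumption.

```python
# Sketch: a typical 4-bit AWQ quantization run with autoawq.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

src = "bosonai/Higgs-Llama-3-70B"  # upstream repo id (assumption)
dst = "higgs-llama-3-70b-awq"
cfg = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(src)
tokenizer = AutoTokenizer.from_pretrained(src)
model.quantize(tokenizer, quant_config=cfg)  # calibrates on a default dataset
model.save_quantized(dst)
tokenizer.save_pretrained(dst)
```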
prithivMLmods posted an update 16 days ago
Introducing Image-Guard-2.0, an experimental, lightweight vision-language encoder model with a size of 0.1B (<100M parameters), trained on SigLIP2 (siglip2-base-patch16-224). Designed for multi-label image classification tasks, this model functions as an image safety system, serving as an image guard or moderator across a wide range of categories, from anime to realistic imagery.

⚡blog-article: https://huggingface.co/blog/prithivMLmods/image-guard-models

It also performs strict moderation and filtering of artificially synthesized content, with strong detection and handling of explicit images. Image-Guard-2.0 delivers robust, reliable classification across diverse visual inputs.
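
A minimal multi-label inference sketch; the repo id, label names, and 0.5 threshold are assumptions (see the blog article and model card for specifics).

```python
# Sketch: multi-label image moderation with a SigLIP2-based classifier head.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_id = "prithivMLmods/Image-Guard-2.0"  # assumed repo id
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(model_id)

image = Image.open("input.jpg")  # hypothetical input
with torch.no_grad():
    logits = model(**processor(images=image, return_tensors="pt")).logits
probs = torch.sigmoid(logits)[0]  # multi-label: independent sigmoid per category
flagged = {model.config.id2label[i]: p.item()
           for i, p in enumerate(probs) if p > 0.5}
print(flagged)
```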
ZennyKenny posted an update 16 days ago
SelmaNajih001 posted an update 18 days ago
Which is the best model to use as a signal for investment? 🤔
I’ve created a Space where you can compare three models:
- Two available on my profile
- ProsusAI/finbert
You can try it here:
👉 SelmaNajih001/InvestmentStrategyBasedOnSentiment
The Space uses titles from this dataset:
📊 SelmaNajih001/Cnbc_MultiCompany

Given a news title, it calculates a sentiment score: if the score crosses a certain threshold, the strategy decides to buy or sell.
Each trade lasts one day, and the strategy then computes the daily return.

Just a quick note: the model uses the closing price as the buy price, meaning it already reflects the impact of the news.
If I had chosen the opening price, the results would have been less biased but less realistic given the data available.
prithivMLmods posted an update 19 days ago
This is the demo of Qwen3-VL-30B-A3B-Instruct, the next-generation, powerful vision-language model in the Qwen series, which delivers comprehensive upgrades across the board, including superior text understanding and generation, deeper visual perception and reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. 🤗🔥

⚡ Space / App: prithivMLmods/Qwen3-VL-HF-Demo

The model’s demo supports a wide range of tasks, including:
Image Inference, Video Inference, PDF Inference, Image Captioning (VLA), and GIF Inference.

⚡ Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

Thanks for granting the blazing-fast Zero GPU access, @merve 🙏

⚡ Other Pages

> Github: https://github.com/prithivsakthiur/qwen3-vl-hf-demo
> Multimodal VLMs July'25 : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027
> VL caption — < Sep 15 ’25 : prithivMLmods/vl-caption-sep-15-25-68c7f6d737985c63c13e2391
> Multimodal VLMs - Aug'25 : prithivMLmods/multimodal-vlms-aug25-68a56aac39fe8084f3c168bd

To learn more, visit the app page or the respective model page!!
ronantakizawa posted an update 20 days ago
Introducing the japanese-text-difficulty dataset! 🎉

This dataset gathers texts from Aozora Bunko and annotates them with jReadability scores, plus detailed metrics on kanji density, vocabulary, grammar, and sentence structure.

This is an excellent dataset if you want to train your LLM to understand the complexities of the Japanese language.

ronantakizawa/japanese-text-difficulty

#dataset #japanese #textdifficulty
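
A sketch of loading and filtering it by difficulty; the column name is hypothetical, so check the dataset card for the real field names.

```python
# Sketch: keep only texts above a given jReadability score.
from datasets import load_dataset

ds = load_dataset("ronantakizawa/japanese-text-difficulty", split="train")
easier = ds.filter(lambda ex: ex["jreadability"] >= 4.0)  # higher score = easier text
print(len(easier), easier[0])
```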