Introducing Photo-Mate-v2, based on FLUX.1-Kontext-dev, for advanced image manipulation tasks. It supports transforming scenes into top-down/bottom-up perspectives, CAM-left/right views and their reverses, as well as general Kontext-specified object removal. Below is the list of demos and adapters.
This post covers the latest trends in OCR models: the multilingual support offered by modern OCR systems, their distinctive capabilities, benchmark comparisons across OCR models, transformer-based implementations, and strategies for streamlining Transformers compatibility.
Multilingual Tokenization Showdown: Analyzing 12 LLM Tokenizers Across 204 Languages.
First, I've created a dataset with Wikipedia's "Cat" article text in 272 languages: Norod78/WikiCat-Multilingual
For each language entry with at least 100 words, I tokenized the text with each of the 12 tokenizers and computed the characters-per-token and words-per-token ratios. The higher the ratio, the more information each token carries on average for that language (potentially letting an LLM learn more per parameter when trained on data in that language).
I hope I interpreted the results correctly. I've made the code available on GitHub, so you can re-create the raw results JSONL with this repo: https://github.com/Norod/wikicat-tokenizer-eval
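The two ratios above are simple to compute. Here is a minimal, self-contained sketch (my own illustrative code, not the repo's actual implementation); `tokenizer_ratios` and the toy whitespace tokenizer are hypothetical names, and a real run would plug in e.g. a Hugging Face tokenizer's `encode` method instead:

```python
def tokenizer_ratios(text, encode):
    """Return (chars_per_token, words_per_token) for `text`.

    `encode` is any callable mapping a string to a list of tokens/ids.
    """
    n_tokens = len(encode(text))
    if n_tokens == 0:
        return 0.0, 0.0
    chars_per_token = len(text) / n_tokens
    words_per_token = len(text.split()) / n_tokens
    return chars_per_token, words_per_token

# Toy stand-in tokenizer: splits on whitespace. With it, words-per-token
# is exactly 1.0 by construction; real subword tokenizers give < 1.0.
toy_encode = lambda s: s.split()

cpt, wpt = tokenizer_ratios("the cat sat on the mat", toy_encode)
```

A language where `cpt` is high packs more text into each token, which is what the post's per-language comparison measures.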
I am dedicating this weekend to practicing/reading the latest b(ook)log from Hugging Face. It is meant to be a guide for anyone trying to go from "we have a great dataset and GPUs" to "we built a really strong model." Will share thoughts upon completion.
New Research Alert - ICCV 2025 (Oral)! Title: LoftUp: Learning a Coordinate-based Feature Upsampler for Vision Foundation Models
Description: LoftUp is a coordinate-based transformer that upscales the low-resolution features of VFMs (e.g. DINOv2 and CLIP) using cross-attention and self-distilled pseudo-ground truth (pseudo-GT) from SAM.
Authors: Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, and Dan Zhang
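The core idea of coordinate-based cross-attention upsampling can be sketched in a simplified, single-head form. This is my own illustrative code, not LoftUp's actual architecture: high-resolution coordinate embeddings act as queries against the low-resolution feature tokens, and every name here (`cross_attention_upsample`, the weight matrices) is hypothetical:

```python
import numpy as np

def cross_attention_upsample(lowres_feats, query_coords, W_q, W_k, W_v):
    """Single-head cross-attention from coordinate queries to feature tokens.

    lowres_feats : (N, d)  flattened low-resolution feature tokens
    query_coords : (M, c)  coordinate embeddings of high-res positions
    Returns        (M, d_v) upsampled features, one per query position.
    """
    Q = query_coords @ W_q                    # (M, d_k) queries from coords
    K = lowres_feats @ W_k                    # (N, d_k) keys from features
    V = lowres_feats @ W_v                    # (N, d_v) values from features
    scores = (Q @ K.T) / np.sqrt(K.shape[1])  # scaled dot-product scores
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)   # softmax over the N tokens
    return attn @ V                           # convex mix of value vectors

# Toy shapes: 16 low-res tokens of dim 8, upsampled to 64 query positions.
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))
coords = rng.normal(size=(64, 2))
W_q = rng.normal(size=(2, 8))
W_k = rng.normal(size=(8, 8))
W_v = rng.normal(size=(8, 8))
up = cross_attention_upsample(feats, coords, W_q, W_k, W_v)  # (64, 8)
```

Because the queries come from continuous coordinates rather than a fixed grid, the same module can, in principle, evaluate features at arbitrary output resolutions.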