# RuBERT NER for Russian Search Queries
This model is a fine-tuned version of DeepPavlov/rubert-base-cased
for Named Entity Recognition (NER) on short and noisy Russian search queries typical of grocery and e-commerce platforms.
The model identifies key product-related entities using the BIO tagging scheme with the following labels:
- `TYPE`: product type or category (e.g., молоко, чипсы, вода)
- `BRAND`: brand name (e.g., Lay’s, Простоквашино)
- `VOLUME`: quantity or size (e.g., 0.5 л, 200 г, 10 шт)
- `PERCENT`: percentage or fat content (e.g., 2.5%, 15%)
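To make the scheme concrete, here is a hand-labelled, illustrative tagging of a hypothetical query; the query and its tags are assumptions for demonstration, not actual model output:

```python
# Hypothetical example: a short grocery query split into tokens and its BIO tags.
# The tagging below is hand-written for illustration only.
query_tokens = ["молоко", "простоквашино", "2.5%", "1", "л"]
bio_tags     = ["B-TYPE", "B-BRAND", "B-PERCENT", "B-VOLUME", "I-VOLUME"]
```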
## 🧠 Model Description
- Architecture: BERT (RuBERT cased)
- Task: Token classification / Named Entity Recognition
- Language: Russian
- Tagging scheme: BIO
- Entity types: TYPE, BRAND, VOLUME, PERCENT
- Training data: a private hackathon dataset of anonymized Russian search queries provided by X5 Group (not publicly available)

The model is optimized for short, informal queries and handles casing variation, abbreviations, and mild typos.
## 🚀 Usage
This repository contains only the fine-tuned model weights and tokenizer configuration.
A minimal example of how to load the model:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "Martsv07/rubert-ner-search-queries"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
```
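The snippet below is a minimal, illustrative inference sketch rather than the full pipeline from the related repository: the example query is an assumption, predictions are decoded with a plain argmax, and subword tokens are printed without merging.

```python
import torch

query = "молоко простоквашино 2.5% 1 л"  # hypothetical example query
inputs = tokenizer(query, return_tensors="pt")

# Forward pass without gradients; logits have shape (1, seq_len, num_labels).
with torch.no_grad():
    logits = model(**inputs).logits

pred_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# id2label comes from the model config and follows the BIO scheme above.
for token, pred_id in zip(tokens, pred_ids):
    print(token, model.config.id2label[pred_id])
```

For the complete preprocessing and postprocessing, see the related repository below.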
## ⚖️ License
Distributed under the Apache-2.0 license,
in accordance with the license of the base model DeepPavlov/rubert-base-cased.
## 👥 Authors
Developed by @Martsv07.
## 🔗 Related Repository
For the full inference pipeline — including preprocessing, postprocessing,
and FastAPI service implementation — see the accompanying
GitHub repository.