Model name: thesby/Qwen3-VL-8B-NSFW-Caption-V4.5
You are welcome to join our Discord for friendly discussion.
Model Description
thesby/Qwen3-VL-8B-NSFW-Caption-V4.5 is a multimodal large model optimized for high-quality image captioning. It is a LoRA fine-tune of the strong Qwen/Qwen3-VL-8B-Instruct base model and aims to deliver excellent image understanding and caption generation.
We trained the model on a mixed dataset of about 2 million high-quality image-text pairs, which gives it strong performance across multiple dimensions.
New Features
- Fixed an issue where the model refused to describe images when given English prompts
- Added support for short-video captioning
Key Features
- **Ultra-High Quality Captions**: The model accurately captures rich detail in an image, including the main subject, the environment and background, people's emotions, object materials, and lighting. Its captioning ability surpasses gpt-4.1-mini and approaches gemini-2.5-flash.
- **Support for both SFW & NSFW Content**: Trained on a specialized dataset, the model can recognize and describe both SFW (Safe for Work) and NSFW (Not Safe for Work) image content. Whether the scene is everyday life or adult content, it produces appropriate and informative descriptions, which greatly widens the range of situations the model can handle.
- **Long-form Detailed Description**: Unlike traditional models that produce short captions, this model excels at describing complex scenes in fine detail. It can generate detailed paragraphs of several hundred words that analyze the image's narrative structure and underlying meaning, making it well suited to applications that need in-depth interpretation of content.
How to Use
For better performance, use vLLM; see the reference script get_vlm_caption.py.
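The get_vlm_caption.py script itself is not reproduced here. As a rough sketch of one way to call the model through vLLM, the snippet below builds a request body for vLLM's OpenAI-compatible chat-completions endpoint. It assumes the server was started with `vllm serve thesby/Qwen3-VL-8B-NSFW-Caption-V4.5` on vLLM's default port 8000; the function name `build_caption_request` is hypothetical, and this is not necessarily what the reference script does.

```python
import json

MODEL_ID = "thesby/Qwen3-VL-8B-NSFW-Caption-V4.5"


def build_caption_request(image_url: str, prompt: str, max_tokens: int = 2048) -> dict:
    """Hypothetical helper: build an OpenAI-style chat-completions body for a vLLM server.

    Assumes the model is served with `vllm serve thesby/Qwen3-VL-8B-NSFW-Caption-V4.5`.
    """
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image first, then the instruction, mirroring the transformers example
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        # Long-form captions can run to hundreds of words, so leave generous room
        "max_tokens": max_tokens,
    }


if __name__ == "__main__":
    body = build_caption_request(
        "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
        "Describe this image in detail.",
    )
    # POST this JSON to http://localhost:8000/v1/chat/completions
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

The body is plain JSON, so any HTTP client (or the `openai` Python client pointed at the vLLM base URL) can send it.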
With transformers, usage is exactly the same as for Qwen/Qwen3-VL-8B-Instruct. The model was fine-tuned with a maximum image size of 800×800, so following this setting gives better results.
import torch  # needed for the torch.bfloat16 option below
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor

# Default: load the model on the available device(s)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "thesby/Qwen3-VL-8B-NSFW-Caption-V4.5", dtype="auto", device_map="auto"
)

# We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
# model = Qwen3VLForConditionalGeneration.from_pretrained(
#     "thesby/Qwen3-VL-8B-NSFW-Caption-V4.5",
#     dtype=torch.bfloat16,
#     attn_implementation="flash_attention_2",
#     device_map="auto",
# )

processor = AutoProcessor.from_pretrained("thesby/Qwen3-VL-8B-NSFW-Caption-V4.5")

# The prompt below asks (in Chinese) for a comprehensive, detailed description of the image
# in natural, fluent Chinese: all visible elements and their attributes (color, size, shape,
# texture), their spatial relationships, and any notable features or context, described
# clearly and vividly without omitting any important detail.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "请用自然流畅的中文对以下图片进行全面而详细的描述。包括所有可见元素及其属性（如颜色、大小、形状、质地），它们的空间关系，以及任何显著特征或上下文。确保用自然流畅的中文描述清晰、生动，能够捕捉图片的每一个方面，不遗漏任何重要细节。"},
        ],
    }
]

# Preparation for inference
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Inference: generate the caption (long-form captions need a large token budget)
generated_ids = model.generate(**inputs, max_new_tokens=2048)
# Strip the prompt tokens so only the newly generated caption is decoded
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
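The example above passes the image as-is. To follow the 800×800 fine-tuning setting explicitly, one option is to downscale large images before adding them to messages. The helper below is a minimal sketch that assumes "800×800" means a bounding box, so the longest side becomes at most 800 px and the aspect ratio is kept. The name `fit_within` is hypothetical, and smaller images are left unchanged.

```python
def fit_within(width: int, height: int, max_side: int = 800) -> tuple[int, int]:
    """Hypothetical helper: scale (width, height) down to fit a max_side x max_side box.

    Keeps the aspect ratio and never upscales. Assumes "800x800" means a bounding box.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # The tighter of the two ratios decides the scale; cap at 1.0 so small images stay as-is
    scale = min(max_side / width, max_side / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(1600, 1200))  # a 4:3 photo is reduced to 800 px wide
    print(fit_within(640, 480))    # already inside the box, unchanged
```

With Pillow, the result can be used as `img.resize(fit_within(*img.size))`, since `Image.size` is `(width, height)`.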
For video, sample frames at fps=1 and add them to messages as images.
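The snippet below sketches that recipe. It picks which frame indices to keep when downsampling to 1 frame per second and then builds a messages list with one image entry per frame, followed by the text prompt, using the same message schema as the transformers example. Both function names are hypothetical. Decoding the selected frames (with whichever video library you prefer) is left out, and the frame paths in the usage line are placeholders.

```python
import math


def sample_frame_indices(total_frames: int, native_fps: float, target_fps: float = 1.0) -> list[int]:
    """Hypothetical helper: indices of frames to keep when downsampling to target_fps."""
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        return []
    step = native_fps / target_fps  # native frames between consecutive samples
    count = math.ceil(total_frames / step)
    # Clamp defensively so floating-point steps never point past the last frame
    return [min(int(i * step), total_frames - 1) for i in range(count)]


def build_video_messages(frame_images: list, prompt: str) -> list[dict]:
    """Hypothetical helper: one "image" entry per sampled frame, then the text prompt."""
    content = [{"type": "image", "image": frame} for frame in frame_images]
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]


if __name__ == "__main__":
    # A 10-second clip at 30 fps -> 10 frames, one per second
    print(sample_frame_indices(300, 30.0))
    msgs = build_video_messages(["frame_000.jpg", "frame_001.jpg"], "Describe this video in detail.")
    print(len(msgs[0]["content"]))
```

The resulting messages can be passed to `processor.apply_chat_template` exactly as in the single-image example.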
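Uses and Limitations
Intended Uses
- Automated content annotation: generate high-quality descriptions and tags for large image collections, for use in content management, search, and recommendation systems.
- Accessibility: describe image content for visually impaired users to help them understand visual information.
- Creative content generation: serve as a source of inspiration by providing image-based text descriptions for art, storytelling, advertising copy, and similar work.
- Digital content analysis: automatically analyze and archive a wide range of image content, including both SFW and NSFW material.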
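Out-of-Scope
- This model must not be used to generate any content that is harmful, illegal, discriminatory, or that violates other people's privacy.
- The model's output should not be treated as absolute fact, and it must not be used for any critical decision that requires high accuracy and reliability (such as medical diagnosis or legal judgments). All important applications require human review.
- When using the model to process NSFW content, users must strictly comply with local laws and regulations and bear the corresponding responsibility.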
Limitations and Bias
- **Hallucination**: Like all large models, this model may "hallucinate", that is, describe details that are not present in the image.
- **Data Bias**: The model's output may reflect social and cultural biases present in its training data (for example, stereotypes about gender, race, or age).
- **NSFW Boundary Judgments**: For ambiguous images near the SFW/NSFW boundary, the model's judgments and descriptions may not match human expectations.
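Training Details
Training Data
This model was fine-tuned on a carefully curated dataset of about 2 million image-text pairs. The dataset was rigorously filtered and cleaned, and mixes:
- High-quality public image-text datasets.
- A private dataset, specially collected and annotated, covering a broad range of SFW and NSFW scenarios.
The data distribution is designed to improve the model's ability to capture detail and generate long text, while ensuring that it generalizes across different types of content.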
Training Procedure
- **Base Model**: Qwen/Qwen3-VL-8B-Instruct
- **Finetuning Strategy**: LoRA fine-tuning
- **Framework**: unsloth
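The card names LoRA as the fine-tuning strategy but does not state the rank, scaling factor, or target modules used. For reference, the standard LoRA formulation (Hu et al., 2021) keeps the pretrained weight frozen and learns only a low-rank update. The values of r and α for this particular model are not given here.

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k} \text{ (frozen)},\;
B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k} \text{ (trained)},\;
r \ll \min(d, k)
```

Because only A and B are trained, the number of updated parameters per adapted matrix drops from d·k to r·(d + k).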
Plans
- Improve the accuracy of descriptions of people's actions
- Improve the accuracy of video descriptions
- Restore Qwen3-VL's general visual capabilities