README_EN.md

Model name: thesby/Qwen3-VL-8B-NSFW-Caption-V4.5

You are welcome to join our Discord for friendly discussion.

Model Description

thesby/Qwen3-VL-8B-NSFW-Caption-V4.5 is a multimodal large model optimized for high-quality image captioning. It is LoRA fine-tuned from Qwen/Qwen3-VL-8B-Instruct and aims to provide strong image understanding and caption generation.

The model was trained on a mixed dataset of about 2 million high-quality image-text pairs, and it performs well across multiple dimensions.

New Features

  1. Fixed the issue where the model refused to describe images when given English prompts.
  2. Added support for captioning short videos.

Key Features

  1. **Ultra-High Quality Captions**: The model accurately captures rich detail such as the main subject, the background environment, the emotions of people, object materials, and lighting. Its captioning ability exceeds gpt4.1-mini and approaches gemini-2.5-flash.
  2. **Support for both SFW & NSFW Content**: Trained on a specialized dataset, the model can recognize and describe both SFW (Safe for Work) and NSFW (Not Safe for Work) images. Whether the input is an everyday scene or adult-oriented content, it generates appropriate, informative descriptions, which greatly broadens the model's range of applications.
  3. **Long-form Detailed Description**: Unlike traditional models that generate short captions, this model excels at describing complex scenes in fine detail. It can produce detailed paragraphs of several hundred words that analyze the narrative structure and underlying meaning of an image, which makes it especially suitable for applications that need in-depth content interpretation.

How to Use

Using vllm gives better performance; see the script get_vlm_caption.py.
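For readers who do not have get_vlm_caption.py at hand, the following is a minimal sketch of one way to query the model through vLLM. It is not the contents of that script. It assumes the model has been started with vLLM's OpenAI-compatible server (for example `vllm serve thesby/Qwen3-VL-8B-NSFW-Caption-V4.5`) listening on `http://localhost:8000/v1`. The helper names `image_to_data_url`, `build_caption_request`, and `request_caption` are hypothetical and exist only for illustration.

```python
import base64
import json
import mimetypes
import urllib.request
from typing import Any, Dict

MODEL = "thesby/Qwen3-VL-8B-NSFW-Caption-V4.5"


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_caption_request(image_url: str, prompt: str, max_tokens: int = 2048) -> Dict[str, Any]:
    """Build an OpenAI-style chat completion payload with one image and one prompt."""
    return {
        "model": MODEL,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


def request_caption(payload: Dict[str, Any], base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload to the server (base_url is an assumption) and return the caption text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    demo = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
    payload = build_caption_request(demo, "Describe this image in detail.")
    print(json.dumps(payload, indent=2))  # call request_caption(payload) once a server is running
```

A local file can be sent by passing `image_to_data_url("photo.jpg")` instead of a URL. Whether a given vLLM version supports this model should be checked against the vLLM documentation.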

With transformers, usage is exactly the same as for Qwen/Qwen3-VL-8B-Instruct. The model was fine-tuned with a maximum image size of 800*800, and following this setting gives better results.

import torch  # only needed for the optional bfloat16 / flash_attention_2 variant below
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor

# default: Load the model on the available device(s)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "thesby/Qwen3-VL-8B-NSFW-Caption-V4.5", dtype="auto", device_map="auto"
)
# We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
# model = Qwen3VLForConditionalGeneration.from_pretrained(
#     "thesby/Qwen3-VL-8B-NSFW-Caption-V4.5",
#     dtype=torch.bfloat16,
#     attn_implementation="flash_attention_2",
#     device_map="auto",
# )
processor = AutoProcessor.from_pretrained("thesby/Qwen3-VL-8B-NSFW-Caption-V4.5")
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            # Prompt in Chinese. English meaning: "Please describe the following image comprehensively
            # and in detail in natural, fluent Chinese. Include all visible elements and their attributes
            # (such as color, size, shape, texture), their spatial relationships, and any notable features
            # or context. Make sure the description is clear and vivid, captures every aspect of the image,
            # and omits no important detail."
            {"type": "text", "text": "请用自然流畅的中文对以下图片进行全面而详细的描述。包括所有可见元素及其属性（如颜色、大小、形状、质地），它们的空间关系，以及任何显著特征或上下文。确保用自然流畅的中文描述清晰、生动，能够捕捉图片的每一个方面，不遗漏任何重要细节。"},
        ],
    }
]
# Preparation for inference
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
)
inputs = inputs.to(model.device)
# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=2048)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
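The example above passes the image at its original resolution. To follow the 800*800 recommendation, images could be downscaled first. The sketch below assumes that "800*800" means each side should be at most 800 pixels (it could also be read as a total pixel budget), and the helper names `fit_within` and `resize_image_file` are hypothetical. Pillow is imported lazily, so the size calculation itself has no third-party dependency.

```python
from typing import Tuple

MAX_SIDE = 800  # assumption: "800*800" is read as an upper bound on each side


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Return (width, height) scaled down to fit a max_side square, keeping the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Never upscale: the scale factor is capped at 1.0.
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_image_file(src_path: str, dst_path: str, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Resize an image on disk with Pillow and return the new size."""
    from PIL import Image  # third-party dependency: pip install pillow

    with Image.open(src_path) as img:
        new_size = fit_within(img.width, img.height, max_side)
        if new_size != img.size:
            img = img.resize(new_size, Image.LANCZOS)
        img.save(dst_path)
    return new_size


if __name__ == "__main__":
    print(fit_within(1600, 1200))
    print(fit_within(640, 480))
```

The resized file can then be referenced from `messages` in place of the demo URL, assuming local paths are accepted in the same way as in the Qwen3-VL examples.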

For video, simply extract frames at fps=1 and put them into messages as images.
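The frame-sampling step can be sketched as follows. This is an illustration under stated assumptions, not the author's pipeline: the functions `frame_indices_at_fps` and `build_frame_messages` are hypothetical helpers, and decoding the selected frames to image files (for example with ffmpeg or OpenCV) is left out. The message layout mirrors the single-image example, with one `image` entry per frame followed by the text prompt.

```python
from typing import Any, Dict, List


def frame_indices_at_fps(total_frames: int, native_fps: float, target_fps: float = 1.0) -> List[int]:
    """Indices of the source frames to keep when sampling a clip at target_fps."""
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        return []
    # Number of source frames per sampled frame; at least 1 so indices never repeat.
    step = max(1.0, native_fps / target_fps)
    indices: List[int] = []
    k = 0
    while True:
        idx = int(round(k * step))
        if idx >= total_frames:
            break
        indices.append(idx)
        k += 1
    return indices


def build_frame_messages(frame_paths: List[str], prompt: str) -> List[Dict[str, Any]]:
    """Build a single user message holding every sampled frame, then the prompt."""
    content: List[Dict[str, Any]] = [{"type": "image", "image": path} for path in frame_paths]
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]


if __name__ == "__main__":
    # A 3-second clip recorded at 30 fps, sampled at 1 fps.
    print(frame_indices_at_fps(90, 30.0))
    # File names are placeholders for frames that were already extracted.
    messages = build_frame_messages(["frame_000.jpg", "frame_030.jpg"], "Describe this short video.")
    print(len(messages[0]["content"]))
```

The resulting `messages` list can be passed to `processor.apply_chat_template` in the same way as in the image example.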

Uses and Limitations

Intended Uses

  • Automated content annotation: generate high-quality descriptions and tags for large image collections, for use in content management, retrieval, and recommendation systems.
  • Accessibility: describe image content for visually impaired users to help them understand visual information.
  • Creative content generation: serve as a source of inspiration by providing image-based text descriptions for art, story writing, advertising copy, and similar work.
  • Digital content analysis: automatically analyze and archive many kinds of image content, including SFW and NSFW.

Out-of-Scope

  • This model should not be used to generate any content that is harmful, illegal, discriminatory, or that violates the privacy of others.
  • The model's output should not be treated as absolute fact, and it must not be used for critical decisions that require high precision and high reliability (such as medical diagnosis or legal judgments). All important applications require human review.
  • Users who process NSFW content with this model must strictly comply with local laws and regulations and bear the corresponding responsibility.

Limitations and Bias

  • **Hallucination**: Like all large models, this model may "hallucinate", that is, generate details that do not exist in the image.
  • **Data Bias**: The model's output may reflect social and cultural biases present in its training data (for example, stereotypes about gender, race, or age).
  • **NSFW Boundary Judgments**: For ambiguous images on the boundary between SFW and NSFW, the model's judgment and description may not match human expectations.

Training Details

Training Data

The model was fine-tuned on a carefully constructed dataset of about 2 million image-text pairs. The dataset was rigorously filtered and cleaned, and it combines:

  • High-quality public image-text datasets.
  • Privately collected and annotated datasets covering a wide range of SFW and NSFW scenes.

The data distribution is designed to improve the model's ability to capture detail and generate long text, while ensuring generalization across different types of content.

Training Procedure

  • **Base Model**: Qwen/Qwen3-VL-8B-Instruct
  • **Finetuning Strategy**: LoRA fine-tuning
  • **Framework**: unsloth

Plans

  • Improve the accuracy of descriptions of human actions.
  • Improve the accuracy of video descriptions.
  • Restore the general visual capabilities of Qwen3-VL.