DeepSeek-R1-Distill-Qwen-32B SFT Mixed Model

🎯 モデル概要

このモデルは、松尾研LLMコンペ2025で開発された推論モデル（reasoningモデル）です。DeepSeek-R1-Distill-Qwen-32Bを基盤として、事後学習（SFT）により推論能力を向上させています。

🏗️ 基盤モデル情報

基盤モデル: DeepSeek-R1-Distill-Qwen-32B
アーキテクチャ: Dense Transformer
パラメータ数: 32B
ライセンス: MIT
ベースURL: DeepSeek-R1-Distill-Qwen-32B

🚀 学習詳細

学習手法

学習タイプ: Supervised Fine-Tuning (SFT)
学習データ: 混合データセット
- Math Hard: 50%
- Math Mid: 30%
- Science: 20%

学習設定

学習環境: H100 GPU (80GB) × 8基
Tensor Parallel Size: 8
最大モデル長: 131,072 tokens
GPU メモリ使用率: 85%
データ型: bfloat16
最大完了トークン: 2000

🔧 推論設定

必須設定

reasoning: true (推論過程の出力を有効化)
max_completion_tokens: 2000
trust_remote_code: true

推論環境

vLLM: 最新版
CUDA: 12.6
cuDNN: 9.6.0
NCCL: 2.24.3

📊 評価結果

評価ベンチマーク

Humanity's Last Exam (HLE): 主要評価指標
Do-Not-Answer: 安全性評価

評価設定

評価モデル: o3-mini-2025-01-31
ワーカー数: 32
最大サンプル数: 50

🚨 重要な注意事項

推論時の必須要件

reasoning=true の設定が必須
thinkタグまたはreasoningタグの出力が必要
運営提供の評価コードとの互換性を保つ

禁止事項

評価コードの改変は禁止
独自の評価プロンプト使用は禁止
外部APIやツールの使用は禁止

📁 ファイル構成

model/
├── config.json          # モデル設定
├── pytorch_model.bin    # モデル重み
├── tokenizer.json      # トークナイザー
└── README.md           # このファイル

🔍 使用方法

基本的な推論

from transformers import AutoModelForCausalLM, AutoTokenizer

# モデルとトークナイザーの読み込み
model = AutoModelForCausalLM.from_pretrained(
    "weblab-llm-competition-2025-bridge/truthowl-model1",
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    "weblab-llm-competition-2025-bridge/truthowl-model1",
    trust_remote_code=True
)

# 推論実行（reasoning有効）
prompt = "Please solve the problem. Include your reasoning process in your answer."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=2000,
    do_sample=True,
    temperature=0.7
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

vLLM使用時

python -m vllm.entrypoints.openai.api_server \
    --model weblab-llm-competition-2025-bridge/truthowl-model1 \
    --tensor-parallel-size 8 \
    --max-model-len 131072 \
    --gpu-memory-utilization 0.85 \
    --trust-remote-code \
    --dtype bfloat16

📋 コンペ情報

コンペ名: 松尾研LLMコンペ2025
チーム名: TruthOwl🦉
提出日: 2025年8月25日
フェーズ: Phase1（予選）

🔗 関連リンク

評価コード: matsuolab/llm_bridge_prod/eval_hle
Do-Not-Answer評価: matsuolab/llm_bridge_prod/eval_dna
実行スクリプト: eval_32b_sft_mixed_timestamped.sh

📞 サポート

技術的な質問や問題がございましたら、以下の方法でお問い合わせください：

GitHub Issues: リポジトリのIssuesセクション
チーム連絡先: コンペ運営を通じて

⚠️ 注意: このモデルは松尾研LLMコンペ2025用に開発されたものです。推論時は必ずreasoning=trueの設定でご利用ください。

Downloads last month: 4

Safetensors

Model size

33B params

Tensor type

F16

Model tree for weblab-llm-competition-2025-bridge/team-truthowl-model

Base model

deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

Finetuned

(83)

this model

Dataset used to train weblab-llm-competition-2025-bridge/team-truthowl-model

Evaluation results

HLE Score on Humanity's Last Exam
self-reported

pending

View on Papers With Code