Qwen3.5-4B-UD-japanese-imatrix developed by dahara1@webbigdata

To my knowledge, among the many Qwen3.5-4B GGUF models, this is the world's best for Japanese: it is specialized for Japanese language proficiency.

Features

In short, it is a powerful small GGUF model built from many small improvements.

Features of this GGUF

Fixes for bugs previously discovered by the community are applied to reduce the rate of malfunctions.
It uses Unsloth's Dynamic Quantization 2.0 format.
The calibration data contains a large proportion of Japanese text.

How to Run

It runs without a GPU, but you need at least 8GB of system memory and at least 3GB of disk space.

Running in a Linux terminal

We use llama.cpp. Several updates for Qwen3.5 support have landed recently, so we recommend using the latest version. (This model was verified to work with version: 8007 (098595411).)

Download the ZIP file for your hardware from llama.cpp and set it up.
There are many variants, so you may be unsure which to pick; ask ChatGPT, Gemini, or Claude to help you choose the right one.

After unzipping the downloaded ZIP file, start the model by entering the following command in a terminal (Terminal, PowerShell, or similar).

The following is an example for a Linux terminal:

First, install the hf command (the Hugging Face command-line tool).

# Download the model
hf download dahara1/Qwen3.5-4B-UD-japanese-imatrix Qwen3.5-4B-UD-Q4_K_XL.gguf --local-dir Qwen3.5-4B-UD-japanese-imatrix
# Download the jinja chat template as well, just in case
hf download dahara1/Qwen3.5-4B-UD-japanese-imatrix chat_template.jinja --local-dir Qwen3.5-4B-UD-japanese-imatrix

./llama-cli \
  -m Qwen3.5-4B-UD-japanese-imatrix/Qwen3.5-4B-UD-Q4_K_XL.gguf \
  --temp 0.6 \
  --top-p 0.8 \
  --top-k 20 \
  --min-p 0.0 \
  --ctx-size 12000 \
  --presence_penalty 1.5 \
  --jinja \
  --chat-template-kwargs '{"enable_thinking":true}' \
  --chat-template-file Qwen3.5-4B-UD-japanese-imatrix/chat_template.jinja \
  -ub 2048 \
  -b 2048
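
For scripted launches, the same invocation can be assembled as an argument list in Python. This is only a minimal sketch: the helper name build_llama_cli_args and its parameters are hypothetical, and it uses only flags that already appear in this document (the optional -ngl flag is the GPU offload option described in the GPU section).

```python
import shlex

MODEL_DIR = "Qwen3.5-4B-UD-japanese-imatrix"


def build_llama_cli_args(ctx_size=12000, enable_thinking=True, gpu_layers=None):
    """Return the llama-cli argument list shown above (hypothetical helper)."""
    thinking = "true" if enable_thinking else "false"
    args = [
        "./llama-cli",
        "-m", f"{MODEL_DIR}/Qwen3.5-4B-UD-Q4_K_XL.gguf",
        "--temp", "0.6",
        "--top-p", "0.8",
        "--top-k", "20",
        "--min-p", "0.0",
        "--ctx-size", str(ctx_size),
        "--presence_penalty", "1.5",
        "--jinja",
        "--chat-template-kwargs", '{"enable_thinking":' + thinking + "}",
        "--chat-template-file", f"{MODEL_DIR}/chat_template.jinja",
        "-ub", "2048",
        "-b", "2048",
    ]
    if gpu_layers is not None:
        # -ngl is the GPU offload flag mentioned in the GPU section
        args += ["-ngl", str(gpu_layers)]
    return args


# Print a copy-pasteable shell command; the list can also be passed to subprocess.run.
print(shlex.join(build_llama_cli_args()))
```

Keeping the arguments in a list avoids shell quoting problems with the JSON value passed to --chat-template-kwargs.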

ctx-size sets the length of text the model can handle. A larger value allows longer multi-turn conversations, but it also increases the amount of memory required.
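
As a rough illustration of why memory grows with ctx-size, the sketch below estimates the size of a key/value cache. It assumes a plain transformer in which every layer keeps 16-bit keys and values for every position. The layer count, head count, and head size are placeholder values chosen for illustration, not the actual Qwen3.5-4B configuration, and real memory use in llama.cpp also includes the weights and compute buffers.

```python
def kv_cache_bytes(ctx_size, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV-cache size: keys and values for every layer and position."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_size * bytes_per_value


# Placeholder architecture values for illustration only
# (NOT the real Qwen3.5-4B configuration).
EXAMPLE = dict(n_layers=32, n_kv_heads=8, head_dim=128)

# 12000 and 24000 are the ctx-size values used in this document's commands.
for ctx in (12000, 24000):
    gib = kv_cache_bytes(ctx, **EXAMPLE) / 2**30
    print(f"ctx-size {ctx}: about {gib:.2f} GiB of KV cache")
```

Under these assumptions the cache grows linearly with ctx-size, so doubling it from 12000 to 24000 doubles this part of the memory footprint.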

For GPU users

With 16GB of GPU memory, the model runs relatively comfortably. Add -ngl 99 to the command above.

Example for Windows with an AMD CPU / iGPU

This example is for a mini PC with an AMD Ryzen 9 7940HS w/ Radeon 780M Graphics, 32GB of system memory, Vulkan already set up, and 8GB allocated to the GPU.
Download the "Windows x64 (Vulkan)" build of llama.cpp from GitHub.
Adding -ngl 99 makes it run faster.

Sample script

If you want to access the model from a script in a client/server setup, refer to the following:

Example server launch command

./llama-server \
  -m Qwen3.5-4B-UD-japanese-imatrix/Qwen3.5-4B-UD-Q4_K_XL.gguf \
  --host 0.0.0.0 \
  --port 8081 \
  --top-p 0.8 \
  --top-k 20 \
  --min-p 0.0 \
  --ctx-size 24000 \
  --presence_penalty 1.5 \
  --chat-template-kwargs '{"enable_thinking":true}' \
  --chat-template-file Qwen3.5-4B-UD-japanese-imatrix/chat_template.jinja \
  --jinja \
  -ub 2048 \
  -b 2048

In your browser, open the address and port of the server running the model, for example http://127.0.0.1:8081/
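
Besides the browser, the server can be queried from a script through its OpenAI-compatible chat endpoint. The following is a minimal sketch using only the Python standard library; the helper names build_chat_payload and ask are hypothetical, the host and port are taken from the server command above, and the model name "qwen3.5" is the one used by the client script in this document.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8081"  # host and port from the server command above


def build_chat_payload(prompt, temperature=0.6):
    """Build the JSON body for a single-turn, non-streaming chat request."""
    return {
        "model": "qwen3.5",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "stream": False,
    }


def ask(prompt, base_url=BASE_URL, timeout=300):
    """POST one prompt to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires the server to be running):
#   print(ask("日本の首都はどこですか？"))
```

The request is built separately from the network call so the payload can be inspected or logged before it is sent.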

Client script sample

This is a demo of a search AI agent that uses tools.

It relies on the Unix commands curl and grep (and wc), so on Windows please use WSL.
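
Before running the demo, it can help to confirm that the external commands it calls are actually on PATH. The check below is a small sketch; the helper name missing_commands is hypothetical, and the command list reflects what the script in this document invokes through subprocess.

```python
import shutil


def missing_commands(commands=("curl", "grep", "wc")):
    """Return the commands from `commands` that are not found on PATH."""
    return [name for name in commands if shutil.which(name) is None]


missing = missing_commands()
if missing:
    print("Missing commands:", ", ".join(missing))
else:
    print("curl, grep and wc are all available.")
```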

Installing the required libraries

pip install openai bs4

import json
import random
import os
import argparse
import subprocess
from datetime import datetime
from openai import OpenAI
from bs4 import BeautifulSoup

# ============================================================
#  Tool-chain demo: an AI research-assistant agent
#  Fetch a Wikipedia article with curl -> extract keywords with grep -> answer
#  Streaming output with <think> display
# ============================================================

client = OpenAI(
    base_url="http://localhost:8081/v1",
    api_key="dummy"
)

# ============================================================
# Color definitions
# ============================================================
class C:
    BOLD      = "\033[1m"
    DIM       = "\033[2m"
    RESET     = "\033[0m"
    CYAN      = "\033[96m"
    YELLOW    = "\033[93m"
    GREEN     = "\033[92m"
    BLUE      = "\033[94m"
    MAGENTA   = "\033[95m"
    RED       = "\033[91m"
    WHITE     = "\033[97m"
    BG_RED    = "\033[41m"
    BG_GREEN  = "\033[42m"
    BG_BLUE   = "\033[44m"
    BG_MAGENTA = "\033[45m"
    BG_CYAN   = "\033[46m"
    GRAY      = "\033[90m"

# ============================================================
# Article URL mapping
# ============================================================
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))

ARTICLE_MAP = {
    "dark_matter": {
        "url": "https://ja.wikipedia.org/wiki/暗黒物質",
        "title": "暗黒物質（ダークマター）",
    },
    "quorum_sensing": {
        "url": "https://ja.wikipedia.org/wiki/クオラムセンシング",
        "title": "クオラムセンシング",
    },
    "cambrian_explosion": {
        "url": "https://ja.wikipedia.org/wiki/カンブリア爆発",
        "title": "カンブリア爆発",
    },
}

URL_TO_KEY = {}
for key, info in ARTICLE_MAP.items():
    URL_TO_KEY[info["url"]] = key
    URL_TO_KEY[key] = key

# ============================================================
# Question templates
# ============================================================
QUESTIONS = [
    {
        "article_key": "dark_matter",
        "question": "ダークマターって宇宙の何パーセントくらいを占めてるの？あと、その存在はどうやって発見されたのか教えてほしい。",
        "thanks": "なるほど、銀河の回転曲線から見つかったんだね。めちゃくちゃ面白い！ありがとう、すごくわかりやすかった！",
    },
    {
        "article_key": "quorum_sensing",
        "question": "クオラムセンシングって何？細菌が会話するってどういうこと？具体的にどんな仕組みで集団行動してるのか知りたい。",
        "thanks": "すごい、細菌にもコミュニケーションの仕組みがあるんだね！イカとの共生の話も面白かった。ありがとう！",
    },
    {
        "article_key": "cambrian_explosion",
        "question": "カンブリア爆発ってよく聞くけど、実際に何が起きたの？全球凍結って何？",
        "thanks": "光スイッチ説とか全球凍結とか、いろんな仮説があるんだね。勉強になったよ、ありがとう！",
    },
]

# ============================================================
# Support-log storage
# ============================================================
SUPPORT_LOG_FILE = os.path.join(SCRIPT_DIR, "support_log.json")


# ============================================================
# Tool implementations
# ============================================================
def execute_curl(url):
    """Fetch a Wikipedia article with curl, convert the HTML to plain text, and save it."""
    article_key = None
    for pattern, key in URL_TO_KEY.items():
        if pattern in url:
            article_key = key
            break

    if not article_key or article_key not in ARTICLE_MAP:
        return json.dumps({"error": f"記事が見つかりません: {url}"}, ensure_ascii=False)

    output_path = os.path.join("/tmp", f"curl_output_{article_key}.txt")

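    # Fetch the HTML from Wikipedia with the real curl binary.
    # -f makes curl exit non-zero on HTTP errors, so the hard-coded
    # status_code of 200 returned below is only reported on success.
    try:
        curl_result = subprocess.run(
            ["curl", "-s", "-f", "-L", "--max-time", "30", url],
            capture_output=True, text=True, timeout=35
        )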
        if curl_result.returncode != 0:
            return json.dumps({"error": f"curl失敗 (returncode={curl_result.returncode})"}, ensure_ascii=False)

        html = curl_result.stdout
        if not html:
            return json.dumps({"error": "curlで空のレスポンスが返されました"}, ensure_ascii=False)

        # Extract plain text from the HTML with BeautifulSoup
        soup = BeautifulSoup(html, "html.parser")

        # Remove elements that are not part of the article text
        for tag in soup.find_all(["script", "style", "nav", "footer", "header", "noscript"]):
            tag.decompose()

        # Get the body region (the Wikipedia article body is inside div#mw-content-text)
        content_div = soup.find("div", {"id": "mw-content-text"})
        if content_div:
            text = content_div.get_text(separator="\n")
        else:
            text = soup.get_text(separator="\n")

        # Collapse runs of consecutive blank lines
        lines = [line.strip() for line in text.splitlines()]
        text = "\n".join(line for i, line in enumerate(lines)
                         if line or (i > 0 and lines[i - 1]))

        # Save to a file
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(text)

        # Get the line count and byte size of the file
        wc_result = subprocess.run(["wc", "-l", "-c", output_path], capture_output=True, text=True, timeout=5)
        stat_line = wc_result.stdout.strip()
    except subprocess.TimeoutExpired:
        return json.dumps({"error": "curlがタイムアウトしました"}, ensure_ascii=False)
    except Exception as e:
        return json.dumps({"error": str(e)}, ensure_ascii=False)

    return json.dumps({
        "url": url,
        "status_code": 200,
        "saved_to": output_path,
        "file_stats": stat_line,
        "message": f"記事を {output_path} に保存しました。execute_grep で必要な情報を検索してください。",
    }, ensure_ascii=False)


def execute_grep(keyword, file_path, context_lines=3):
    """Run the real grep binary on a saved article."""
    # If the given path does not exist, fall back to the first
    # /tmp/curl_output_*.txt file that can be found.
    if not os.path.exists(file_path):
        for f in os.listdir("/tmp"):
            if f.startswith("curl_output_") and f.endswith(".txt"):
                file_path = os.path.join("/tmp", f)
                break

    try:
        # -e keeps a keyword that starts with "-" from being parsed as an option.
        result = subprocess.run(
            ["grep", "-n", f"-C{context_lines}", "-e", keyword, file_path],
            capture_output=True, text=True, timeout=5
        )
        output = result.stdout
        if not output:
            return json.dumps({
                "keyword": keyword,
                "file": file_path,
                "matches": 0,
                "output": f"キーワード '{keyword}' は見つかりませんでした。",
            }, ensure_ascii=False)

        return json.dumps({
            "keyword": keyword,
            "file": file_path,
            "matches": output.count(keyword),
            "output": output,
        }, ensure_ascii=False)
    except Exception as e:
        return json.dumps({"error": str(e)}, ensure_ascii=False)


def register_support_log(user_id, category, summary, resolved):
    """Register a support-log entry."""
    log_entry = {
        "id": f"LOG-{datetime.now().strftime('%Y%m%d%H%M%S')}",
        "timestamp": datetime.now().isoformat(),
        "user_id": user_id,
        "category": category,
        "summary": summary,
        "resolved": resolved,
    }

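    logs = []
    if os.path.exists(SUPPORT_LOG_FILE):
        with open(SUPPORT_LOG_FILE, "r", encoding="utf-8") as f:
            logs = json.load(f)

    logs.append(log_entry)

    # The log contains Japanese text written with ensure_ascii=False,
    # so the encoding is fixed to UTF-8 instead of the locale default.
    with open(SUPPORT_LOG_FILE, "w", encoding="utf-8") as f:
        json.dump(logs, f, ensure_ascii=False, indent=2)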

    return json.dumps({"status": "registered", "log_entry": log_entry}, ensure_ascii=False)


def execute_tool(func_name, args):
    if func_name == "execute_curl":
        return execute_curl(args.get("url", ""))
    elif func_name == "execute_grep":
        return execute_grep(
            args.get("keyword", ""),
            args.get("file_path", ""),
            args.get("context_lines", 3),
        )
    elif func_name == "register_support_log":
        return register_support_log(
            args.get("user_id", "anonymous"),
            args.get("category", ""),
            args.get("summary", ""),
            args.get("resolved", True),
        )
    else:
        return json.dumps({"error": f"Unknown tool: {func_name}"}, ensure_ascii=False)


# ============================================================
# Tool definitions
# ============================================================
tools = [
    {
        "type": "function",
        "function": {
            "name": "execute_curl",
            "description": (
                "指定したURLに対してcurlコマンドを実行し、Webページの内容を取得します。"
                "取得した内容はローカルファイルに保存されます。"
                "レスポンスの saved_to フィールドに保存先ファイルパスが含まれます。"
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string", "description": "取得対象のURL"}
                },
                "required": ["url"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "execute_grep",
            "description": (
                "指定したファイルに対してgrepコマンドを実行し、"
                "キーワードを含む行とその前後の行（デフォルト3行）を取得します。"
                "curlで取得したファイルに対して使い、長い記事から必要な情報を効率的に抽出します。"
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "keyword": {"type": "string", "description": "検索キーワード"},
                    "file_path": {"type": "string", "description": "検索対象のファイルパス（curlのsaved_toの値）"},
                    "context_lines": {"type": "integer", "description": "キーワード前後の表示行数（デフォルト: 3）"},
                },
                "required": ["keyword", "file_path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "register_support_log",
            "description": (
                "ユーザー対応が完了した後に、対応履歴をシステムに登録します。"
                "ユーザーが退出（お礼を言って会話終了）した後に必ず呼び出してください。"
                "対応内容のサマリー、カテゴリ、解決状況を記録します。"
                "同一問い合わせを複数回記録してはいけません"
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "user_id": {"type": "string", "description": "ユーザーID"},
                    "category": {"type": "string", "description": "質問カテゴリ（例: 宇宙物理, 微生物学, 古生物学）"},
                    "summary": {"type": "string", "description": "対応内容のサマリー（何を質問され、どう回答したかの要約）"},
                    "resolved": {"type": "boolean", "description": "解決済みかどうか"},
                },
                "required": ["user_id", "category", "summary", "resolved"]
            }
        }
    },
]


# ============================================================
# Streaming display
# ============================================================
def stream_response(messages, debug=False, silent=False):
    if debug and not silent:
        print(f"\n  {C.YELLOW}⏳ LLM呼び出し中 (streaming)...{C.RESET}")

    try:
        stream = client.chat.completions.create(
            model="qwen3.5",
            messages=messages,
            tools=tools,
            tool_choice="auto",
            temperature=0.6,
            stream=True,
        )
    except Exception as e:
        if not silent:
            print(f"\n  {C.BG_RED}{C.WHITE} ❌ API ERROR {C.RESET}")
            print(f"  {C.RED}{type(e).__name__}: {e}{C.RESET}")
        return None, None, None, "error"

    full_content = ""
    full_reasoning = ""
    tool_calls_map = {}
    finish_reason = None
    in_reasoning = False
    in_content = False

    for chunk in stream:
        delta = chunk.choices[0].delta if chunk.choices else None
        if not delta:
            continue
        if chunk.choices[0].finish_reason:
            finish_reason = chunk.choices[0].finish_reason

        # Display reasoning_content (the thinking part)
        reasoning_text = getattr(delta, "reasoning_content", None)
        if reasoning_text:
            full_reasoning += reasoning_text
            if not silent:
                if not in_reasoning:
                    in_reasoning = True
                    print(f"\n  {C.MAGENTA}{C.DIM}💭 <think>{C.RESET}")
                    print(f"  {C.MAGENTA}{C.DIM}", end="", flush=True)
                print(f"{C.MAGENTA}{C.DIM}{reasoning_text}{C.RESET}", end="", flush=True)
            else:
                in_reasoning = True

        # Display content (the answer part)
        if delta.content:
            text = delta.content
            full_content += text
            if not silent:
                if in_reasoning:
                    in_reasoning = False
                    print(f"{C.RESET}")
                    print(f"  {C.MAGENTA}{C.DIM}💭 </think>{C.RESET}")
                if not in_content:
                    in_content = True
                    print(f"\n  {C.WHITE}💬 ", end="", flush=True)
                print(f"{C.WHITE}{text}{C.RESET}", end="", flush=True)
            else:
                in_reasoning = False
                in_content = True

        if delta.tool_calls:
            for tc_delta in delta.tool_calls:
                idx = tc_delta.index
                if idx not in tool_calls_map:
                    tool_calls_map[idx] = {"id": tc_delta.id or "", "name": "", "arguments": ""}
                if tc_delta.id:
                    tool_calls_map[idx]["id"] = tc_delta.id
                if tc_delta.function:
                    if tc_delta.function.name:
                        tool_calls_map[idx]["name"] = tc_delta.function.name
                    if tc_delta.function.arguments:
                        tool_calls_map[idx]["arguments"] += tc_delta.function.arguments

    if not silent:
        # Close the <think> display when the stream ended with reasoning only (no content arrived)
        if in_reasoning:
            print(f"{C.RESET}")
            print(f"  {C.MAGENTA}{C.DIM}💭 </think>{C.RESET}")
        if in_reasoning or in_content:
            print(f"{C.RESET}")

    tool_calls_list = [tool_calls_map[idx] for idx in sorted(tool_calls_map.keys())]
    return full_content, full_reasoning, tool_calls_list, finish_reason


# ============================================================
# Debug helpers
# ============================================================
def dump_messages_summary(messages):
    print(f"\n  {C.DIM}{'─'*50}{C.RESET}")
    print(f"  {C.DIM}📋 メッセージ履歴: {len(messages)} 件{C.RESET}")
    for i, msg in enumerate(messages):
        role = msg.get("role", "?")
        content = msg.get("content", "")
        content_len = len(content) if isinstance(content, str) else 0
        has_tc = "tool_calls" in msg
        name = msg.get("name", "")
        tool_call_id = msg.get("tool_call_id", "")

        if role == "system":
            print(f"  {C.DIM}  [{i}] system: ({content_len}文字){C.RESET}")
        elif role == "user":
            preview = (content[:40] + "...") if content_len > 40 else content
            print(f"  {C.DIM}  [{i}] user: \"{preview}\" ({content_len}文字){C.RESET}")
        elif role == "assistant":
            tc_info = ""
            if has_tc:
                tc_names = [tc.get("function", {}).get("name", "?") for tc in msg["tool_calls"]]
                tc_info = f" + tool_calls: [{', '.join(tc_names)}]"
            print(f"  {C.DIM}  [{i}] assistant: ({content_len}文字){tc_info}{C.RESET}")
        elif role == "tool":
            tid = tool_call_id[:16] + "..." if len(tool_call_id) > 16 else tool_call_id
            print(f"  {C.DIM}  [{i}] tool({name}): ({content_len}文字) id={tid}{C.RESET}")
    print(f"  {C.DIM}{'─'*50}{C.RESET}")


# ============================================================
# Main loop
# ============================================================
def main():
    parser = argparse.ArgumentParser(description="ツールチェーンデモ")
    parser.add_argument("--debug", action="store_true", help="デバッグ情報を表示")
    args = parser.parse_args()
    debug = args.debug

    scenario = random.choice(QUESTIONS)
    article_info = ARTICLE_MAP[scenario["article_key"]]

    print(f"\n{C.BOLD}{C.CYAN}{'='*62}{C.RESET}")
    print(f"{C.BOLD}{C.CYAN}  🔗 ツールチェーンデモ — AI調べもの代行エージェント{C.RESET}")
    if debug:
        print(f"{C.BOLD}{C.YELLOW}  🔍 デバッグモード ON{C.RESET}")
    print(f"{C.BOLD}{C.CYAN}{'='*62}{C.RESET}")
    print(f"\n  {C.DIM}📚 今回の記事: {article_info['title']}{C.RESET}")
    print(f"  {C.DIM}🔗 URL: {article_info['url']}{C.RESET}\n")

    article_list = "\n".join([
        f"  - {info['title']}: {info['url']}"
        for info in ARTICLE_MAP.values()
    ])

    system_prompt = (
        "あなたはユーザーの質問に答える調べもの代行AIエージェントです。\n"
        "以下の手順で対応してください：\n\n"
        "【手順】\n"
        "1. ユーザーの質問内容を理解し、関連するWikipedia記事のURLを特定する\n"
        "2. execute_curl ツールでWikipedia記事を取得する\n"
        "3. execute_grep ツールで質問に関連するキーワードを検索し、必要な情報を抽出する\n"
        "   - ユーザーの質問に含まれる不明な単語や概念ごとにgrepを実行してください\n"
        "   - 例：質問が「カンブリア爆発」と「全球凍結」に触れていたら、それぞれ別のgrepで検索する\n"
        "   - grepの file_path には、curlの結果に含まれる saved_to の値を使ってください\n"
        "4. 抽出した情報を元に、わかりやすく回答する\n"
        "5. ユーザーがお礼を言って退出したら、register_support_log ツールで対応履歴を1回だけ登録する\n\n"
        "【重要なルール】\n"
        "- 必ず execute_curl → execute_grep（複数回）→ 回答 の順で進めてください\n"
        "- grepは質問に含まれるトピックごとに実行し、幅広く情報を収集してください\n"
        "- 回答はユーザーにわかりやすい日本語で、根拠となる情報を含めてください\n"
        "- register_support_log は対応全体のサマリーを1回だけ登録してください。複数回呼ばないでください\n"
        "- ツールは並列呼び出しが可能です。同時に実行できるものはまとめてください\n\n"
        "【利用可能なWikipedia記事】\n"
        f"{article_list}\n"
    )

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": scenario["question"]},
    ]

    print(f"  {C.BLUE}👤 ユーザー:{C.RESET}")
    print(f"  {C.BLUE}  「{scenario['question']}」{C.RESET}\n")

    step = 0
    max_steps = 15
    user_thanked = False

    while step < max_steps:
        step += 1
        print(f"\n{C.BOLD}{C.WHITE}{C.BG_BLUE} STEP {step} {C.RESET}")

        if debug:
            dump_messages_summary(messages)

        full_content, full_reasoning, tool_calls_list, finish_reason = stream_response(messages, debug)

        if debug:
            print(f"\n  {C.DIM}📝 RAW full_content:{C.RESET}")
            print(f"  {C.DIM}{repr(full_content)}{C.RESET}")
            print(f"  {C.DIM}📝 RAW full_reasoning:{C.RESET}")
            print(f"  {C.DIM}{repr(full_reasoning)}{C.RESET}")

        if full_content is None:
            break

        if debug:
            print(f"\n  {C.CYAN}  finish_reason: {C.BOLD}{finish_reason}{C.RESET}")

        # No tool calls -> plain text response
        if not tool_calls_list:
            if not user_thanked:
                user_thanked = True
                print(f"\n  {C.BLUE}👤 ユーザー:{C.RESET}")
                print(f"  {C.BLUE}  「{scenario['thanks']}」{C.RESET}")
                messages.append({"role": "assistant", "content": full_content or ""})
                messages.append({"role": "user", "content": scenario["thanks"]})
                continue
            else:
                # The reply to the thanks is already displayed. Register the log in the background and finish
                has_logged = any(
                    m.get("name") == "register_support_log"
                    for m in messages if m.get("role") == "tool"
                )
                if not has_logged:
                    messages.append({"role": "assistant", "content": full_content or ""})
                    messages.append({
                        "role": "user",
                        "content": "(システム通知: ユーザーが退出しました。対応履歴を register_support_log で登録してください。テキスト応答は不要です。)"
                    })
                    # Call the LLM once more to register the log (display suppressed)
                    _, _, bg_tool_calls, _ = stream_response(messages, debug=False, silent=True)
                    if bg_tool_calls:
                        for tc in bg_tool_calls:
                            func_name = tc["name"]
                            try:
                                tc_args = json.loads(tc["arguments"])
                            except json.JSONDecodeError:
                                tc_args = {}
                            if func_name == "register_support_log":
                                print(f"\n  {C.YELLOW}📝 register_support_log{C.RESET}")
                                for k, v in tc_args.items():
                                    val_str = str(v)[:60]
                                    print(f"  {C.CYAN}   {k}: {val_str}{C.RESET}")
                                result = execute_tool(func_name, tc_args)
                                result_obj = json.loads(result)
                                log = result_obj.get("log_entry", {})
                                print(f"  {C.GREEN}   ✅ 登録完了: {log.get('id', '?')}{C.RESET}")
                print(f"\n  {C.GREEN}✅ 全タスク完了{C.RESET}")
                break

        # Append the assistant message to the history
        assistant_msg = {"role": "assistant", "content": full_content or ""}
        assistant_msg["tool_calls"] = [
            {
                "id": tc["id"],
                "type": "function",
                "function": {"name": tc["name"], "arguments": tc["arguments"]},
            }
            for tc in tool_calls_list
        ]
        messages.append(assistant_msg)

        # Execute the tools
        print(f"\n  {C.YELLOW}{'─'*50}{C.RESET}")
        print(f"  {C.YELLOW}⚡ ツール実行: {len(tool_calls_list)}件{C.RESET}")

        support_log_done = False  # prevents duplicate registration within one step

        for tc in tool_calls_list:
            func_name = tc["name"]
            try:
                tc_args = json.loads(tc["arguments"])
            except json.JSONDecodeError:
                tc_args = {}

            # Guard against duplicate execution of register_support_log
            if func_name == "register_support_log":
                # Skip if it already ran in this step or in an earlier one
                already_logged = support_log_done or any(
                    m.get("name") == "register_support_log"
                    for m in messages if m.get("role") == "tool"
                )
                if already_logged:
                    print(f"\n    {C.YELLOW}📝 register_support_log{C.RESET}")
                    print(f"    {C.DIM}   ⏭️  既に登録済みのためスキップ{C.RESET}")
                    # Tell the LLM that the call was skipped
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tc["id"],
                        "name": func_name,
                        "content": json.dumps({"status": "skipped", "reason": "対応履歴は既に登録済みです"}, ensure_ascii=False),
                    })
                    continue

            # Display the command being executed
            if func_name == "execute_curl":
                url = tc_args.get("url", "")
                print(f"\n    {C.YELLOW}🌐 curl{C.RESET}")
                print(f"    {C.GRAY}$ curl -s \"{url}\" -o /tmp/curl_output_*.txt{C.RESET}")
            elif func_name == "execute_grep":
                kw = tc_args.get("keyword", "")
                fp = tc_args.get("file_path", "")
                ctx = tc_args.get("context_lines", 3)
                print(f"\n    {C.YELLOW}🔍 grep{C.RESET}")
                print(f"    {C.GRAY}$ grep -n -C{ctx} \"{kw}\" {fp}{C.RESET}")
            elif func_name == "register_support_log":
                print(f"\n    {C.YELLOW}📝 register_support_log{C.RESET}")
                for k, v in tc_args.items():
                    val_str = str(v)[:60]
                    print(f"    {C.CYAN}   {k}: {val_str}{C.RESET}")

            result = execute_tool(func_name, tc_args)
            result_obj = json.loads(result)

            # Display the result
            if func_name == "execute_curl":
                if "saved_to" in result_obj:
                    print(f"    {C.GREEN}   ✅ 取得成功{C.RESET}")
                    print(f"    {C.GREEN}   💾 保存先: {result_obj['saved_to']}{C.RESET}")
                    print(f"    {C.GREEN}   📊 {result_obj.get('file_stats', '')}{C.RESET}")
                else:
                    print(f"    {C.RED}   ❌ {result_obj.get('error', '不明なエラー')}{C.RESET}")

            elif func_name == "execute_grep":
                matches = result_obj.get("matches", 0)
                output = result_obj.get("output", "")
                print(f"    {C.GREEN}   ✅ {matches}件マッチ{C.RESET}")
                lines = output.split("\n")
                for line in lines[:10]:
                    print(f"    {C.DIM}   {line}{C.RESET}")
                if len(lines) > 10:
                    print(f"    {C.DIM}   ... (他 {len(lines)-10} 行){C.RESET}")

            elif func_name == "register_support_log":
                log = result_obj.get("log_entry", {})
                print(f"    {C.GREEN}   ✅ 登録完了: {log.get('id', '?')}{C.RESET}")
                support_log_done = True

            messages.append({
                "role": "tool",
                "tool_call_id": tc["id"],
                "name": func_name,
                "content": result,
            })

        print(f"  {C.YELLOW}{'─'*50}{C.RESET}")

    # Summary
    print(f"\n{C.DIM}{'='*62}{C.RESET}")
    print(f"{C.DIM}最終メッセージ数: {len(messages)}{C.RESET}")
    print(f"{C.DIM}最終ステップ: {step}{C.RESET}")

    if os.path.exists(SUPPORT_LOG_FILE):
        with open(SUPPORT_LOG_FILE, "r", encoding="utf-8") as f:
            logs = json.load(f)
        print(f"{C.DIM}対応履歴: {len(logs)}件 登録済み{C.RESET}")
        for log in logs:
            summary = log['summary'][:50] + "..." if len(log['summary']) > 50 else log['summary']
            print(f"{C.DIM}  📋 [{log['id']}] {log['category']} - {summary}{C.RESET}")

    print(f"{C.DIM}{'='*62}{C.RESET}\n")


if __name__ == "__main__":
    main()
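
The part of stream_response that is easiest to get wrong is reassembling tool calls, because a streamed call arrives as fragments that share an index and whose arguments string is split across chunks. The sketch below isolates that logic so it can be tried without a server. The helper names accumulate_tool_calls and fake_delta are hypothetical, and the fragments are hand-made stand-ins built with SimpleNamespace, not real API output.

```python
import json
from types import SimpleNamespace


def accumulate_tool_calls(deltas):
    """Merge streamed tool-call fragments into complete calls, ordered by index."""
    calls = {}
    for d in deltas:
        call = calls.setdefault(d.index, {"id": "", "name": "", "arguments": ""})
        if d.id:
            call["id"] = d.id
        if d.function and d.function.name:
            call["name"] = d.function.name
        if d.function and d.function.arguments:
            # Argument JSON arrives in pieces, so it is concatenated
            call["arguments"] += d.function.arguments
    return [calls[i] for i in sorted(calls)]


def fake_delta(index, id=None, name=None, arguments=None):
    """Build a stand-in for one streamed tool-call fragment (test data only)."""
    return SimpleNamespace(
        index=index, id=id,
        function=SimpleNamespace(name=name, arguments=arguments),
    )


deltas = [
    fake_delta(0, id="call_1", name="execute_grep", arguments='{"keyword": "暗黒'),
    fake_delta(0, arguments='物質", "file_path": "/tmp/curl_output_dark_matter.txt"}'),
]
calls = accumulate_tool_calls(deltas)
parsed = json.loads(calls[0]["arguments"])
print(calls[0]["name"], parsed)
```

The arguments are parsed with json.loads only after the stream has finished, since an individual fragment is usually not valid JSON on its own.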

Benchmark results

Japanese instruction-following performance, measured with shisa-ai/M-IFEval, is shown below.

Both Unsloth and bartowski are world-renowned for their quantized models, so this time I took on their models as a challenge.
Please note that if you mainly use English, their models are likely to perform better.

Model Name	Strict Prompt	Strict Inst	Loose Prompt	Loose Inst
Unsloth-Q4_K_XL	0.6221	0.6681	0.6512	0.6947
bartowski_Q4_K_L	0.5523	0.6239	0.5930	0.6637
Qwen3.5-4B-UD-japanese-imatrix-Q4_K_XL	0.6395	0.7124	0.6686	0.7345
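
To make the gaps in the table easier to read, the following sketch computes the per-column difference between this model and each baseline. The scores are copied from the table above; the helper name score_gaps is hypothetical.

```python
# Scores copied from the table above, in column order.
COLUMNS = ["Strict Prompt", "Strict Inst", "Loose Prompt", "Loose Inst"]
SCORES = {
    "Unsloth-Q4_K_XL": [0.6221, 0.6681, 0.6512, 0.6947],
    "bartowski_Q4_K_L": [0.5523, 0.6239, 0.5930, 0.6637],
    "Qwen3.5-4B-UD-japanese-imatrix-Q4_K_XL": [0.6395, 0.7124, 0.6686, 0.7345],
}


def score_gaps(model, baseline):
    """Return per-column score differences (model minus baseline), rounded to 4 places."""
    return {
        col: round(m - b, 4)
        for col, m, b in zip(COLUMNS, SCORES[model], SCORES[baseline])
    }


ours = "Qwen3.5-4B-UD-japanese-imatrix-Q4_K_XL"
for baseline in ("Unsloth-Q4_K_XL", "bartowski_Q4_K_L"):
    print(baseline, score_gaps(ours, baseline))
```

On these numbers the lead over Unsloth-Q4_K_XL ranges from 0.0174 to 0.0443 and the lead over bartowski_Q4_K_L from 0.0708 to 0.0885, depending on the column.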

Update

2026/04/09: Fixed the prompt template to address a cache reuse issue.

Acknowledgments

Qwen
Unsloth
bartowski
llama.cpp
Thank you to all AI researchers and practitioners.

Developer

Developed by dahara1@Webbigdata

Model tree for dahara1/Qwen3.5-4B-UD-japanese-imatrix

Base model: Qwen/Qwen3.5-4B-Base
Finetuned: Qwen/Qwen3.5-4B
Finetuned: unsloth/Qwen3.5-4B
Quantized: this model