Mistral-Small-3.1-24B-Base (text-only version)

Model Description

This repository contains the language model weights of mistralai/Mistral-Small-3.1-24B-Base-2503, extracted to function as a standalone text-generation model (MistralForCausalLM).

The original Mistral-Small-3.1-24B-Base-2503 is a Vision Language Model (VLM). This version has had its vision components removed, resulting in a text-only model that can be treated similarly to mistralai/Mistral-Small-24B-Base-2501.

This model serves as a convenient starting point for anyone looking to fine-tune the language capabilities of Mistral-Small-3.1 without the overhead of the vision components.

Usage

You can use this model directly with the AutoModelForCausalLM class from the transformers library.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "kawaimasa/mistral-small-3.1-base-no-vison"

# Load the tokenizer and the model in bfloat16, sharded across available devices
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "The capital of Japan is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding by default; pass do_sample=True and sampling parameters to vary outputs
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
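
Because the checkpoint is a standard MistralForCausalLM, it also drops into the usual causal-LM fine-tuning stacks. As a minimal sketch, a LoRA setup with the peft library could look like the following (the rank, alpha, and target modules are illustrative assumptions, not tested recommendations):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
import torch

model = AutoModelForCausalLM.from_pretrained(
    "kawaimasa/mistral-small-3.1-base-no-vison",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Attach low-rank adapters to the attention projections; the base weights stay frozen
lora_config = LoraConfig(
    r=16,                # illustrative rank
    lora_alpha=32,       # illustrative scaling
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

From here the wrapped model can be handed to any standard causal-LM training loop.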

How This Model Was Created

This LLM was extracted from the original VLM using the script below. This method cleanly separates the language components (language_model and lm_head) and reconstructs them into a standard MistralForCausalLM model.

The same process can be applied to the instruct-tuned version, mistralai/Mistral-Small-3.1-24B-Instruct-2503, to create a pure LLM for chat and instruction-following tasks.
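
If you run the conversion on the instruct checkpoint, the result can be prompted through the tokenizer's chat template. A minimal sketch, assuming the conversion was saved to a hypothetical local directory ./mistral-small-3.1-instruct-no-vision and that the saved tokenizer ships a chat template:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path = "./mistral-small-3.1-instruct-no-vision"  # hypothetical output of the script below
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map="auto")

# Format the conversation with the tokenizer's chat template before generating
messages = [{"role": "user", "content": "Name three uses of a 24B text-only model."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs.to(model.device), max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))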

The conversion script (comments translated from the original Japanese):
# VLM_to_LLM_converter.py
from transformers import (
    Mistral3ForConditionalGeneration,
    MistralForCausalLM,
    AutoTokenizer
)
import torch
import os

# --- Configuration ---
source_vlm_name = "mistralai/Mistral-Small-3.1-24B-Base-2503"
output_dir = "./mistral-small-3.1-base-no-vision"
target_dtype = torch.bfloat16

# --- Main Script ---
os.makedirs(output_dir, exist_ok=True)
print(f"Starting conversion: {source_vlm_name}")
print(f"Output directory: {output_dir}")
print(f"Target dtype: {target_dtype}")
print("==========================================================")

# 1. Load the VLM and extract language components
print(f"Loading VLM model from {source_vlm_name}...")
vlm_model = Mistral3ForConditionalGeneration.from_pretrained(
    source_vlm_name,
    torch_dtype=target_dtype,
    low_cpu_mem_usage=True
)

# Get the language model, lm_head, and config
language_model_part = vlm_model.model.language_model
lm_head_part = vlm_model.lm_head
config = language_model_part.config

print("✅ Extracted `language_model` and `lm_head`.")
del vlm_model
print("🗑️ VLM model released from memory.")

# 2. Create a new LLM and transplant the components
print("\nCreating a new MistralForCausalLM instance and transplanting components...")
new_llm = MistralForCausalLM(config)

# Transplant the language model body
new_llm.model = language_model_part
print("   - Transplanted `language_model`.")

# Transplant the language model head
new_llm.lm_head = lm_head_part
print("   - Transplanted `lm_head`.")

# Ensure the final model is in the target dtype
new_llm = new_llm.to(target_dtype)
print(f"✅ Model assembly complete (dtype: {new_llm.dtype})")

# 3. Save the final LLM and tokenizer
print(f"\nSaving the final model to {output_dir}...")
new_llm.save_pretrained(output_dir)

tokenizer = AutoTokenizer.from_pretrained(source_vlm_name)
tokenizer.save_pretrained(output_dir)
print("✅ Model and tokenizer saved.")

print("\n==========================================================")
print("🎉 Process completed successfully! 🎉")
print(f"Pure LLM version of the model is saved in {output_dir}")
print("==========================================================")

