Models
The base classes PreTrainedModel, TFPreTrainedModel, and FlaxPreTrainedModel implement the common methods for loading and saving a model, either from a local file or directory or from a pretrained model configuration provided by the library (downloaded from Hugging Face’s AWS S3 repository).
PreTrainedModel and TFPreTrainedModel also implement a few methods that are common to all models:
- Resize the input token embeddings when new tokens are added to the vocabulary.
- Prune the attention heads of the model.
Other methods common to each model are defined in the following classes:
- ModuleUtilsMixin (for PyTorch models)
- TFModuleUtilsMixin (for TensorFlow models)
- GenerationMixin (for text generation with PyTorch models)
- FlaxGenerationMixin (for text generation with Flax/JAX models)
PreTrainedModel
Base class for all models.
PreTrainedModel takes care of storing the configuration of the models and handles methods for loading, downloading and saving models as well as a few methods common to all models to:
- resize the input embeddings
Class attributes (overridden by derived classes):
- config_class (PreTrainedConfig) — A subclass of PreTrainedConfig to use as configuration class for this model architecture.
- base_model_prefix (str) — A string indicating the attribute associated to the base model in derived classes of the same architecture adding modules on top of the base model.
- main_input_name (str) — The name of the principal input to the model (often input_ids for NLP models, pixel_values for vision models and input_values for speech models).
- can_record_outputs (dict):
push_to_hub
( repo_id: str use_temp_dir: bool | None = None commit_message: str | None = None private: bool | None = None token: bool | str | None = None max_shard_size: int | str | None = '5GB' create_pr: bool = False safe_serialization: bool = True revision: str | None = None commit_description: str | None = None tags: list[str] | None = None **deprecated_kwargs )
Parameters
- repo_id (str) — The name of the repository you want to push your model to. It should contain your organization name when pushing to a given organization.
- use_temp_dir (bool, optional) — Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Will default to True if there is no directory named like repo_id, False otherwise.
- commit_message (str, optional) — Message to commit while pushing. Will default to "Upload model".
- private (bool, optional) — Whether to make the repo private. If None (default), the repo will be public unless the organization’s default is private. This value is ignored if the repo already exists.
- token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running hf auth login (stored in ~/.huggingface). Will default to True if repo_url is not specified.
- max_shard_size (int or str, optional, defaults to "5GB") — Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, needs to be digits followed by a unit (like "5MB"). We default it to "5GB" so that users can easily load models on free-tier Google Colab instances without any CPU OOM issues.
- create_pr (bool, optional, defaults to False) — Whether or not to create a PR with the uploaded files or directly commit.
- safe_serialization (bool, optional, defaults to True) — Whether or not to convert the model weights in safetensors format for safer serialization.
- revision (str, optional) — Branch to push the uploaded files to.
- commit_description (str, optional) — The description of the commit that will be created.
- tags (list[str], optional) — List of tags to push on the Hub.
Upload the model file to the 🤗 Model Hub.
Examples:
from transformers import AutoModel
model = AutoModel.from_pretrained("google-bert/bert-base-cased")
# Push the model to your namespace with the name "my-finetuned-bert".
model.push_to_hub("my-finetuned-bert")
# Push the model to an organization with the name "my-finetuned-bert".
model.push_to_hub("huggingface/my-finetuned-bert")
add_model_tags
( tags: typing.Union[list[str], str] )
Add custom tags into the model that gets pushed to the Hugging Face Hub. Will not overwrite existing tags in the model.
can_generate
( ) → bool
Returns
bool
Whether this model can generate sequences with .generate().
Returns whether this model can generate sequences with .generate() from the GenerationMixin.
Under the hood, on classes where this function returns True, some generation-specific changes are triggered:
for instance, the model instance will have a populated generation_config attribute.
Potentially dequantize the model in case it has been quantized by a quantization method that supports dequantization.
Removes the _require_grads_hook.
Enables the gradients for the input embeddings. This is useful for fine-tuning adapter weights while keeping the model weights fixed.
from_pretrained
( pretrained_model_name_or_path: typing.Union[str, os.PathLike, NoneType] *model_args config: typing.Union[transformers.configuration_utils.PreTrainedConfig, str, os.PathLike, NoneType] = None cache_dir: typing.Union[str, os.PathLike, NoneType] = None ignore_mismatched_sizes: bool = False force_download: bool = False local_files_only: bool = False token: typing.Union[str, bool, NoneType] = None revision: str = 'main' use_safetensors: typing.Optional[bool] = None weights_only: bool = True **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike, optional) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - None if you are both providing the configuration and state dictionary (resp. with keyword arguments config and state_dict).
- model_args (sequence of positional arguments, optional) — All remaining positional arguments will be passed to the underlying model’s __init__ method.
- config (Union[PreTrainedConfig, str, os.PathLike], optional) — Configuration for the model to use instead of an automatically loaded configuration. Can be either:
  - an instance of a class derived from PreTrainedConfig,
  - a string or path valid as input to from_pretrained().
  Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (Union[str, os.PathLike], optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- ignore_mismatched_sizes (bool, optional, defaults to False) — Whether or not to raise an error if some of the weights from the checkpoint do not have the same size as the weights of the model (if for instance, you are instantiating a model with 10 labels from a checkpoint with 3 labels).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
- token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running hf auth login (stored in ~/.huggingface).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. To test a pull request you made on the Hub, you can pass revision="refs/pr/<pr_number>".
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), "flash_attention_2" (using Dao-AILab/flash-attention), or "flash_attention_3" (using Dao-AILab/flash-attention/hopper). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation. Also accepts HF kernel references in the form <namespace>/<repo_name>[@<revision>][:<kernel_name>], where:
  - <namespace> and <repo_name> are any non-"/" and non-":" sequences.
  - "@<revision>" is optional (branch, tag, or commit-ish), e.g. "@main", "@v1.2.0", "@abc123".
  - ":<kernel_name>" is optional and selects a function inside the kernel repo.
  - Both options can appear together and in this order only: @revision first, then :kernel_name.
  - A leading "<wrapper>|" prefix (e.g., "flash|…") is intentionally allowed because the code strips it before loading; "|" is not excluded in the character classes here.
  Examples that match: "org/model", "org/model@main", "org/model:custom_kernel", "org/model@<revision>:custom_kernel".
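The kernel reference format described for attn_implementation can be checked with a regular expression. The following is a minimal sketch, not the pattern the library uses: KERNEL_REF, parse_kernel_ref, and the group names are hypothetical, and the character classes are an assumption derived from the rules listed above (the sketch additionally excludes "@" from the name parts so that the revision can be split off).

```python
import re

# Hypothetical pattern: namespace/repo_name, then an optional "@revision",
# then an optional ":kernel_name", in that order only.
KERNEL_REF = re.compile(
    r"^(?P<namespace>[^/:@]+)/(?P<repo_name>[^/:@]+)"
    r"(?:@(?P<revision>[^:]+))?"
    r"(?::(?P<kernel_name>[^:@]+))?$"
)


def parse_kernel_ref(ref):
    """Return the named parts of a kernel reference, or None if it does not match."""
    # Drop an optional leading "wrapper|" prefix such as "flash|".
    ref = ref.split("|", 1)[-1]
    match = KERNEL_REF.match(ref)
    return match.groupdict() if match else None


print(parse_kernel_ref("org/model@main:custom_kernel"))
print(parse_kernel_ref("not-a-reference"))
```

Putting the revision group before the kernel-name group is what enforces the "@revision first, then :kernel_name" ordering rule: a string such as "org/model:custom_kernel@main" does not match.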
Parameters for big model inference
- dtype (str or torch.dtype, optional) — Override the default torch_dtype and load the model under a specific dtype. The different options are:
  - torch.float16 or torch.bfloat16 or torch.float: load in a specified dtype, ignoring the model’s config.dtype if one exists. If not specified, the model will get loaded in torch.float (fp32).
  - "auto": A dtype or torch_dtype entry in the config.json file of the model will be attempted to be used. If this entry isn’t found, the dtype of the first floating-point weight in the checkpoint is used instead. This will load the model using the dtype it was saved in at the end of the training. It can’t be used as an indicator of how the model was trained, since it could be trained in one of the half-precision dtypes but saved in fp32.
  - A string that is a valid torch.dtype. E.g. "float32" loads the model in torch.float32, "float16" loads in torch.float16, etc.
  For some models the dtype they were trained in is unknown. You may try to check the model’s paper or reach out to the authors and ask them to add this information to the model’s card and to insert the dtype or torch_dtype entry in config.json on the Hub.
- device_map (str or dict[str, Union[int, str, torch.device]] or int or torch.device, optional) — A map that specifies where each submodule should go. It doesn’t need to be refined to each parameter/buffer name; once a given module name is inside, every submodule of it will be sent to the same device. If we only pass the device (e.g., "cpu", "cuda:1", "mps", or a GPU ordinal rank like 1) on which the model will be allocated, the device map will map the entire model to this device. Passing device_map = 0 means put the whole model on GPU 0. To have Accelerate compute the most optimized device_map automatically, set device_map="auto". For more information about each option see designing a device map.
- max_memory (Dict, optional) — A dictionary mapping device identifiers to maximum memory, if using device_map. Will default to the maximum memory available for each GPU and the available CPU RAM if unset.
- tp_plan (Optional[Union[dict, str]], optional) — A torch tensor parallel plan, see here. Use tp_plan="auto" to use the predefined plan based on the model. If it’s a dict, then it should match between module names and desired layout. Note that if you use it, you should launch your script accordingly with torchrun [args] script.py. This will be much faster than using a device_map, but has limitations.
- tp_size (str, optional) — A torch tensor parallel degree. If not provided would default to world size.
- device_mesh (torch.distributed.DeviceMesh, optional) — A torch device mesh. If not provided would default to world size. Used only for tensor parallel for now. If provided, it has to contain a dimension named "tp" in case it’s > 1 dimensional; this dimension will be used for tensor parallelism.
- offload_folder (str or os.PathLike, optional) — If the device_map contains any value "disk", the folder where we will offload weights.
- offload_buffers (bool, optional) — Whether or not to offload the buffers with the model parameters.
- quantization_config (Union[QuantizationConfigMixin, Dict], optional) — A dictionary of configuration parameters or a QuantizationConfigMixin object for quantization (e.g., bitsandbytes, gptq).
- subfolder (str, optional, defaults to "") — In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can specify the folder name here.
- variant (str, optional) — If specified, load weights from the variant filename, e.g. pytorch_model.<variant>.bin.
- use_safetensors (bool, optional, defaults to None) — Whether or not to use safetensors checkpoints. Defaults to None. If not specified and safetensors is not installed, it will be set to False.
- weights_only (bool, optional, defaults to True) — Indicates whether unpickler should be restricted to loading only tensors, primitive types, dictionaries and any types added via torch.serialization.add_safe_globals(). When set to False, we can load wrapper tensor subclass weights.
- key_mapping (dict[str, str], optional) — A potential mapping of the weight names if using a model on the Hub which is compatible to a Transformers architecture, but was not converted accordingly.
- kwargs (remaining dictionary of keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
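The second kwargs case (no configuration supplied) amounts to splitting the keyword arguments into configuration overrides and leftovers for the model constructor. Below is a minimal sketch of that splitting, under the assumption that a configuration attribute can be detected with hasattr; ToyConfig and split_kwargs are hypothetical names and not part of Transformers.

```python
class ToyConfig:
    """Hypothetical stand-in for a configuration class (not part of Transformers)."""

    def __init__(self, hidden_size=8, output_attentions=False):
        self.hidden_size = hidden_size
        self.output_attentions = output_attentions


def split_kwargs(config, kwargs):
    """Apply kwargs that name config attributes; return the rest for the model's __init__."""
    remaining = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)  # overrides the loaded configuration value
        else:
            remaining[key] = value  # would be forwarded to the model constructor
    return config, remaining


config, model_kwargs = split_kwargs(ToyConfig(), {"output_attentions": True, "my_flag": 1})
print(config.output_attentions, model_kwargs)  # True {'my_flag': 1}
```

When a configuration object is passed explicitly, no such splitting happens and every keyword argument goes straight to the model constructor.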
Instantiate a pretrained pytorch model from a pre-trained model configuration.
The model is set in evaluation mode by default using model.eval() (Dropout modules are deactivated). To train
the model, you should first set it back in training mode with model.train().
The warning Weights from XXX not initialized from pretrained model means that the weights of XXX do not come pretrained with the rest of the model. It is up to you to train those weights with a downstream fine-tuning task.
The warning Weights from XXX not used in YYY means that the layer XXX is not used by YYY, therefore those weights are discarded.
Activate the special “offline-mode” to use this method in a firewalled environment.
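The lookup order described for dtype="auto" in the parameter list can be sketched in a few lines. This is an illustration only, using plain strings instead of torch dtypes: resolve_auto_dtype is a hypothetical helper, the configuration is assumed to be a plain dict read from config.json, and the final fp32 fallback is an assumption of this sketch.

```python
def resolve_auto_dtype(config, checkpoint_dtypes):
    """Pick a dtype name following the "auto" rules described above.

    config: dict loaded from config.json (hypothetical plain-dict form).
    checkpoint_dtypes: dtype names of the checkpoint weights, in file order.
    """
    # 1. Prefer an explicit entry in the configuration.
    for key in ("dtype", "torch_dtype"):
        if config.get(key) is not None:
            return config[key]
    # 2. Otherwise use the first floating-point weight in the checkpoint.
    floating = {"float16", "bfloat16", "float32", "float64"}
    for name in checkpoint_dtypes:
        if name in floating:
            return name
    # 3. Fallback assumed for this sketch: full precision.
    return "float32"


print(resolve_auto_dtype({"torch_dtype": "bfloat16"}, ["float32"]))  # bfloat16
print(resolve_auto_dtype({}, ["int64", "float16", "float32"]))       # float16
```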
Examples:
>>> from transformers import BertConfig, BertModel
>>> # Download model and configuration from huggingface.co and cache.
>>> model = BertModel.from_pretrained("google-bert/bert-base-uncased")
>>> # Model was saved using *save_pretrained('./test/saved_model/')* (for example purposes, not runnable).
>>> model = BertModel.from_pretrained("./test/saved_model/")
>>> # Update configuration during loading.
>>> model = BertModel.from_pretrained("google-bert/bert-base-uncased", output_attentions=True)
>>> assert model.config.output_attentions == True
get_compiled_call
( compile_config: typing.Optional[transformers.generation.configuration_utils.CompileConfig] )
Return a torch.compile'd version of self.__call__. This is useful to dynamically choose between
non-compiled/compiled forward during inference, especially to switch between prefill (where we don’t
want to use compiled version to avoid recomputing the graph with new shapes) and iterative decoding
(where we want the speed-ups of compiled version with static shapes).
Best-effort lookup of the decoder module.
Order of attempts (covers ~85 % of current usages):
- self.decoder
- self.model (many wrappers store the decoder here)
- self.model.get_decoder() (nested wrappers)
- fallback: raise for the few exotic models that need a bespoke rule
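The lookup order above can be sketched with plain Python objects. This is one reading of the list, not the library's code: Decoder, Wrapper, and find_decoder are hypothetical names, and the choice to prefer the inner model's own get_decoder() when it exists is an assumption about how the second and third steps interact.

```python
class Decoder:
    """Hypothetical placeholder for a decoder module."""


class Wrapper:
    """Hypothetical model that stores its decoder under `.model`."""

    def __init__(self):
        self.model = Decoder()


def find_decoder(obj):
    """Try the lookup order listed above and raise if nothing matches."""
    if hasattr(obj, "decoder"):
        return obj.decoder
    inner = getattr(obj, "model", None)
    if inner is not None:
        # Nested wrappers expose their own lookup on the inner model.
        if hasattr(inner, "get_decoder"):
            return inner.get_decoder()
        return inner
    raise AttributeError(
        f"{type(obj).__name__} has no decoder; a model-specific rule is needed"
    )


wrapper = Wrapper()
print(find_decoder(wrapper) is wrapper.model)  # True
```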
get_memory_footprint
( return_buffers = True )
Parameters
- return_buffers (bool, optional, defaults to True) — Whether to return the size of the buffer tensors in the computation of the memory footprint. Buffers are tensors that do not require gradients and are not registered as parameters, e.g. mean and std in batch norm layers. Please see: https://discuss.pytorch.org/t/what-pytorch-means-by-buffers/120266/2
Get the memory footprint of a model. This will return the memory footprint of the current model in bytes. Useful to benchmark the memory footprint of the current model and design some tests. Solution inspired from the PyTorch discussions: https://discuss.pytorch.org/t/gpu-memory-that-model-uses/56822/2
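The byte count amounts to summing, over every parameter (and optionally every buffer), the number of elements times the size of one element. The sketch below shows that arithmetic on plain (shape, dtype) pairs rather than real tensors; memory_footprint and BYTES_PER_ELEMENT here are hypothetical stand-ins, not the library method.

```python
from math import prod

# Bytes per element for a few common dtypes.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def memory_footprint(parameters, buffers, return_buffers=True):
    """Sum element_count * element_size over (shape, dtype) pairs, in bytes."""
    tensors = list(parameters) + (list(buffers) if return_buffers else [])
    return sum(prod(shape) * BYTES_PER_ELEMENT[dtype] for shape, dtype in tensors)


params = [((1000, 64), "float16"), ((64,), "float16")]
bufs = [((64,), "float32")]
print(memory_footprint(params, bufs))                        # 128384
print(memory_footprint(params, bufs, return_buffers=False))  # 128128
```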
Return the parameter or buffer given by target if it exists, otherwise throw an error. This combines
get_parameter() and get_buffer() in a single handy function. If the target is an _extra_state attribute,
it will return the extra state provided by the module. Note that it only works if target is a leaf of the model.
Deactivates gradient checkpointing for the current model.
gradient_checkpointing_enable
( gradient_checkpointing_kwargs = None )
Activates gradient checkpointing for the current model.
We pass the __call__ method of the modules instead of forward because __call__ attaches all the hooks of
the module. https://discuss.pytorch.org/t/any-different-between-model-input-and-model-forward-input/3690/2
Maybe initializes weights. If using a custom PreTrainedModel, you need to implement any
initialization logic in _init_weights.
This is equivalent to calling self.apply(self._initialize_weights), but correctly handles composite models.
This function dynamically dispatches the correct init_weights function to the modules as we advance in the
module graph along the recursion. It can handle an arbitrary number of sub-models. Without it, every composite
model would have to recurse a second time on all sub-models explicitly in the outer-most _init_weights, which
is extremely error prone and inefficient.
Note that the torch.no_grad() decorator is very important as well, as most of our _init_weights do not use
torch.nn.init functions (which are all no-grad by default), but simply do in-place ops such as
`module.weight.data.zero_()`.
A method executed at the end of each Transformer model initialization, to execute code that needs the model’s modules properly initialized (such as weight initialization).
This is also used when the user is running distributed code. We add hooks to the modules here, according to the model’s tp_plan!
register_for_auto_class
( auto_class = 'AutoModel' )
Register this class with a given auto class. This should only be used for custom models as the ones in the library are already mapped with an auto class.
resize_token_embeddings
( new_num_tokens: typing.Optional[int] = None pad_to_multiple_of: typing.Optional[int] = None mean_resizing: bool = True ) → torch.nn.Embedding
Parameters
- new_num_tokens (int, optional) — The new number of tokens in the embedding matrix. Increasing the size will add newly initialized vectors at the end. Reducing the size will remove vectors from the end. If not provided or None, just returns a pointer to the input tokens torch.nn.Embedding module of the model without doing anything.
- pad_to_multiple_of (int, optional) — If set, will pad the embedding matrix to a multiple of the provided value. If new_num_tokens is set to None, will just pad the embedding to a multiple of pad_to_multiple_of. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.5 (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
- mean_resizing (bool) — Whether to initialize the added embeddings from a multivariate normal distribution that has the old embeddings’ mean and covariance, or to initialize them with a normal distribution that has a mean of zero and a std equal to config.initializer_range. Setting mean_resizing to True is useful when increasing the size of the embeddings of causal language models, where the generated tokens’ probabilities won’t be affected by the added embeddings, because initializing the new embeddings with the old embeddings’ mean will reduce the KL divergence between the next-token probability before and after adding the new embeddings. Refer to this article for more information: https://nlp.stanford.edu/~johnhew/vocab-expansion.html
Returns
torch.nn.Embedding
Pointer to the input tokens Embeddings Module of the model.
Resizes input token embeddings matrix of the model if new_num_tokens != config.vocab_size.
Takes care of tying weights embeddings afterwards if the model class has a tie_weights() method.
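The two size-related behaviors of the resize method, rounding up to a multiple and growing or shrinking from the end, can be illustrated on a plain list of rows. This is a simplified sketch: padded_size and resize_rows are hypothetical helpers, and new rows are set to the plain mean of the old rows, which leaves out the covariance-based sampling that mean_resizing describes.

```python
import math


def padded_size(new_num_tokens, pad_to_multiple_of=None):
    """Round the requested vocabulary size up to the next multiple."""
    if pad_to_multiple_of is None:
        return new_num_tokens
    return math.ceil(new_num_tokens / pad_to_multiple_of) * pad_to_multiple_of


def resize_rows(rows, new_size):
    """Truncate from the end, or append rows set to the mean of the old rows."""
    if new_size <= len(rows):
        return rows[:new_size]
    dim = len(rows[0])
    mean = [sum(row[d] for row in rows) / len(rows) for d in range(dim)]
    return rows + [list(mean) for _ in range(new_size - len(rows))]


print(padded_size(50260, 64))                     # 50304
print(resize_rows([[1.0, 2.0], [3.0, 4.0]], 3))   # [[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]]
```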
save_pretrained
( save_directory: typing.Union[str, os.PathLike] is_main_process: bool = True state_dict: typing.Optional[dict] = None save_function: Callable = torch.save push_to_hub: bool = False max_shard_size: typing.Union[int, str] = '5GB' safe_serialization: bool = True variant: typing.Optional[str] = None token: typing.Union[str, bool, NoneType] = None save_peft_format: bool = True **kwargs )
Parameters
- save_directory (str or os.PathLike) — Directory to which to save. Will be created if it doesn’t exist.
- is_main_process (bool, optional, defaults to True) — Whether the process calling this is the main process or not. Useful in distributed training (e.g., on TPUs) when this function needs to be called on all processes. In this case, set is_main_process=True only on the main process to avoid race conditions.
- state_dict (nested dictionary of torch.Tensor) — The state dictionary of the model to save. Will default to self.state_dict(), but can be used to only save parts of the model or if special precautions need to be taken when recovering the state dictionary of a model (like when using model parallelism).
- save_function (Callable) — The function to use to save the state dictionary. Useful in distributed training (e.g., on TPUs) when one needs to replace torch.save by another method.
- push_to_hub (bool, optional, defaults to False) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (will default to the name of save_directory in your namespace).
- max_shard_size (int or str, optional, defaults to "5GB") — The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, needs to be digits followed by a unit (like "5MB"). We default it to 5GB in order for models to be able to run easily on free-tier Google Colab instances without CPU OOM issues. If a single weight of the model is bigger than max_shard_size, it will be in its own checkpoint shard, which will be bigger than max_shard_size.
- safe_serialization (bool, optional, defaults to True) — Whether to save the model using safetensors or the traditional PyTorch way (that uses pickle).
- variant (str, optional) — If specified, weights are saved in the format pytorch_model.<variant>.bin.
- token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running hf auth login (stored in ~/.huggingface).
- save_peft_format (bool, optional, defaults to True) — For backward compatibility with the PEFT library, in case adapter weights are attached to the model, all keys of the state dict of adapters need to be prefixed with base_model.model. Advanced users can disable this behavior by setting save_peft_format to False.
- kwargs (dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.
Save a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained() class method.
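The max_shard_size rule, including the case of a single weight larger than the limit, can be illustrated with a greedy grouping over weight sizes. This is a sketch under stated assumptions rather than the library's sharding code: to_bytes and plan_shards are hypothetical helpers, and the decimal unit table (1 GB = 10**9 bytes) is an assumption of this example.

```python
import re

# Assumed decimal units for this sketch.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9}


def to_bytes(size):
    """Convert an int or a string such as "5GB" into a number of bytes."""
    if isinstance(size, int):
        return size
    match = re.fullmatch(r"(\d+)\s*(KB|MB|GB)", size.upper())
    if match is None:
        raise ValueError(f"Unrecognized size: {size!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def plan_shards(weight_sizes, max_shard_size):
    """Greedily group weights (name -> bytes) into shards within the limit.

    A single weight larger than the limit ends up alone in an oversized shard.
    """
    limit = to_bytes(max_shard_size)
    shards, current, current_size = [], [], 0
    for name, size in weight_sizes.items():
        # Start a new shard when adding this weight would exceed the limit.
        if current and current_size + size > limit:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        shards.append(current)
    return shards


sizes = {"embed": 600, "layer0": 300, "layer1": 300, "head": 1500}
print(plan_shards(sizes, 1000))  # [['embed', 'layer0'], ['layer1'], ['head']]
```

In the example, "head" exceeds the 1000-byte limit on its own, so it is placed in a shard by itself that is larger than the limit.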
Set the requested attn_implementation for this model.
Symmetric setter. Mirrors the lookup logic used in get_decoder.
If set in the config, tie the weights between the input embeddings and the output embeddings, and the encoder and decoder.
Recursively (for all submodels) tie all the weights of the model.
upcast_modules_in_fp32
( hf_quantizer: transformers.quantizers.base.HfQuantizer | None dtype: dtype )
Upcast modules defined in _keep_in_fp32_modules and _keep_in_fp32_modules_strict in fp32, if
dtype is different than fp32.
Shows a one-time warning if the input_ids appear to contain padding and no attention mask was given.
Custom models should also include a _supports_assign_param_buffer attribute, which determines whether superfast init can be applied to that particular model.
If test_save_and_load_from_pretrained fails, check whether your model needs _supports_assign_param_buffer.
If it does, set it to False.
ModuleUtilsMixin
A few utilities for torch.nn.Modules, to be used as a mixin.
Add a memory hook before and after each sub-module forward pass to record increase in memory consumption.
Increase in memory consumption is stored in a mem_rss_diff attribute for each module and can be reset to zero
with model.reset_memory_hooks_state().
estimate_tokens
( input_dict: dict ) → int
Helper function to estimate the total number of tokens from the model inputs.
floating_point_ops
( input_dict: dict exclude_embeddings: bool = True ) → int
Parameters
- batch_size (int) — The batch size for the forward pass.
- sequence_length (int) — The number of tokens in each line of the batch.
- exclude_embeddings (bool, optional, defaults to True) — Whether or not to count embedding and softmax operations.
Returns
int
The number of floating-point operations.
Get number of (optionally, non-embeddings) floating-point operations for the forward and backward passes of a
batch with this transformer model. Default approximation neglects the quadratic dependency on the number of
tokens (valid if sequence_length << 12 * d_model) as laid out in this
paper section 2.1. Should be overridden for transformers with parameter
re-use e.g. Albert or Universal Transformers, or if doing long-range modeling with very high sequence lengths.
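When the quadratic attention term is neglected, a commonly used estimate is about 6 floating-point operations per parameter per token for a combined forward and backward pass. The sketch below assumes that rule of thumb; the factor of 6 and the helper name estimate_flops are assumptions of this example and are not stated in the text above.

```python
def estimate_flops(num_tokens, num_parameters):
    """Approximate forward + backward FLOPs as 6 * tokens * parameters."""
    return 6 * num_tokens * num_parameters


# A batch of 8 sequences of 512 tokens through a 110M-parameter model.
tokens = 8 * 512
print(estimate_flops(tokens, 110_000_000))  # 2703360000000
```

Because the estimate is linear in the token count, it undercounts for very long sequences, which is the case the description says should be handled by overriding the method.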
get_extended_attention_mask
( attention_mask: Tensor input_shape: tuple device: typing.Optional[torch.device] = None dtype: typing.Optional[torch.dtype] = None )
Makes broadcastable attention and causal masks so that future and masked tokens are ignored.
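For a decoder-style model, combining the two masks means that position i may attend to position j only if j is not in the future and j is not padding. The sketch below shows that rule with nested lists of booleans; the library works on tensors, and combined_mask is a hypothetical helper used only for illustration.

```python
def combined_mask(attention_mask):
    """Return allowed[i][j]: may query position i attend to key position j?

    attention_mask: list of 0/1 flags, 1 for real tokens and 0 for padding.
    """
    n = len(attention_mask)
    return [
        [j <= i and attention_mask[j] == 1 for j in range(n)]
        for i in range(n)
    ]


# Three positions, the last one is padding.
for row in combined_mask([1, 1, 0]):
    print(row)
```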
invert_attention_mask
( encoder_attention_mask: Tensor ) → torch.Tensor
Invert an attention mask (e.g., switches 0. and 1.).
num_parameters
( only_trainable: bool = False exclude_embeddings: bool = False ) → int
Get number of (optionally, trainable or non-embeddings) parameters in the module.
Reset the mem_rss_diff attribute of each module (see add_memory_hooks()).
Pushing to the Hub
A Mixin containing the functionality to push a model or tokenizer to the hub.
push_to_hub
( repo_id: str use_temp_dir: bool | None = None commit_message: str | None = None private: bool | None = None token: bool | str | None = None max_shard_size: int | str | None = '5GB' create_pr: bool = False safe_serialization: bool = True revision: str | None = None commit_description: str | None = None tags: list[str] | None = None **deprecated_kwargs )
Parameters
- repo_id (str) — The name of the repository you want to push your {object} to. It should contain your organization name when pushing to a given organization.
- use_temp_dir (bool, optional) — Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Will default to True if there is no directory named like repo_id, False otherwise.
- commit_message (str, optional) — Message to commit while pushing. Will default to "Upload {object}".
- private (bool, optional) — Whether to make the repo private. If None (default), the repo will be public unless the organization’s default is private. This value is ignored if the repo already exists.
- token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running hf auth login (stored in ~/.huggingface). Will default to True if repo_url is not specified.
- max_shard_size (int or str, optional, defaults to "5GB") — Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, needs to be digits followed by a unit (like "5MB"). We default it to "5GB" so that users can easily load models on free-tier Google Colab instances without any CPU OOM issues.
- create_pr (bool, optional, defaults to False) — Whether or not to create a PR with the uploaded files or directly commit.
- safe_serialization (bool, optional, defaults to True) — Whether or not to convert the model weights in safetensors format for safer serialization.
- revision (str, optional) — Branch to push the uploaded files to.
- commit_description (str, optional) — The description of the commit that will be created.
- tags (list[str], optional) — List of tags to push on the Hub.
Upload the {object_files} to the 🤗 Model Hub.
Examples:
from transformers import {object_class}
{object} = {object_class}.from_pretrained("google-bert/bert-base-cased")
# Push the {object} to your namespace with the name "my-finetuned-bert".
{object}.push_to_hub("my-finetuned-bert")
# Push the {object} to an organization with the name "my-finetuned-bert".
{object}.push_to_hub("huggingface/my-finetuned-bert")