Feature Extractor
A feature extractor is in charge of preparing input features for a multi-modal model. This includes feature extraction from sequences, e.g., pre-processing audio files into log-Mel spectrogram features; feature extraction from images, e.g., cropping image files; but also padding, normalization, and conversion to NumPy, PyTorch, and TensorFlow tensors.
FeatureExtractionMixin
-
class transformers.feature_extraction_utils.FeatureExtractionMixin(**kwargs)
This is a feature extraction mixin used to provide saving/loading functionality for sequential and image feature extractors.
-
classmethod from_pretrained(pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs) → SequenceFeatureExtractor
Instantiate a type of FeatureExtractionMixin from a feature extractor, e.g. a derived class of SequenceFeatureExtractor.
- Parameters
pretrained_model_name_or_path (str or os.PathLike) – This can be either:
- a string, the model id of a pretrained feature_extractor hosted inside a model repo on huggingface.co. Valid model ids can be located at the root level, like bert-base-uncased, or namespaced under a user or organization name, like dbmdz/bert-base-german-cased.
- a path to a directory containing a feature extractor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
- a path or URL to a saved feature extractor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
cache_dir (str or os.PathLike, optional) – Path to a directory in which a downloaded pretrained model feature extractor should be cached if the standard cache should not be used.
force_download (bool, optional, defaults to False) – Whether or not to force the (re-)download of the feature extractor files and override the cached versions if they exist.
resume_download (bool, optional, defaults to False) – Whether or not to resume the download of an incompletely received file if one exists, instead of starting the download from scratch.
proxies (Dict[str, str], optional) – A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
use_auth_token (str or bool, optional) – The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running transformers-cli login (stored in huggingface).
revision (str, optional, defaults to "main") – The specific model version to use. It can be a branch name, a tag name, or a commit id: since a git-based system is used for storing models and other artifacts on huggingface.co, revision can be any identifier allowed by git.
return_unused_kwargs (bool, optional, defaults to False) – If False, this function returns just the final feature extractor object. If True, it returns a Tuple(feature_extractor, unused_kwargs), where unused_kwargs is a dictionary of the key/value pairs whose keys are not feature extractor attributes, i.e., the part of kwargs that has not been used to update feature_extractor and is otherwise ignored.
kwargs (Dict[str, Any], optional) – The values in kwargs of any keys which are feature extractor attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are not feature extractor attributes is controlled by the return_unused_kwargs keyword parameter.
Note
Passing use_auth_token=True is required when you want to use a private model.
- Returns
A feature extractor of type FeatureExtractionMixin.
Examples:
# We can't instantiate the base classes `FeatureExtractionMixin` or `SequenceFeatureExtractor` directly,
# so these examples use a derived class: `Wav2Vec2FeatureExtractor`.
from transformers import Wav2Vec2FeatureExtractor

# Download the feature extractor config from huggingface.co and cache it.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('facebook/wav2vec2-base-960h')
# E.g. the feature extractor (or model) was saved using `save_pretrained('./test/saved_model/')`.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('./test/saved_model/')
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('./test/saved_model/preprocessor_config.json')
# Keyword arguments that are feature extractor attributes override the loaded values.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('facebook/wav2vec2-base-960h', return_attention_mask=False, foo=False)
assert feature_extractor.return_attention_mask is False
# Unknown keyword arguments (here `foo`) can be retrieved with `return_unused_kwargs=True`.
feature_extractor, unused_kwargs = Wav2Vec2FeatureExtractor.from_pretrained('facebook/wav2vec2-base-960h', return_attention_mask=False, foo=False, return_unused_kwargs=True)
assert feature_extractor.return_attention_mask is False
assert unused_kwargs == {'foo': False}
-
save_pretrained(save_directory: Union[str, os.PathLike])
Save a feature_extractor object to the directory save_directory, so that it can be reloaded using the from_pretrained() class method.
- Parameters
save_directory (str or os.PathLike) – Directory where the feature extractor JSON file will be saved (will be created if it does not exist).
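The save/load round trip and the keyword-argument handling described above can be sketched with the standard library alone. This is a hypothetical toy, not the library's implementation: `ToyFeatureExtractor` and its attributes are assumptions, and the only details borrowed from the text are the `preprocessor_config.json` file name, the override of known attributes by `kwargs`, and the `return_unused_kwargs` behavior.

```python
import json
import os
import tempfile

CONFIG_NAME = "preprocessor_config.json"  # file name used in the examples above


class ToyFeatureExtractor:
    """Hypothetical stand-in for a FeatureExtractionMixin subclass (not the library code)."""

    def __init__(self, feature_size=1, sampling_rate=16000, padding_value=0.0,
                 return_attention_mask=True):
        self.feature_size = feature_size
        self.sampling_rate = sampling_rate
        self.padding_value = padding_value
        self.return_attention_mask = return_attention_mask

    def save_pretrained(self, save_directory):
        # Create the directory if it does not exist, then dump the attributes as JSON.
        os.makedirs(save_directory, exist_ok=True)
        path = os.path.join(save_directory, CONFIG_NAME)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(vars(self), f, indent=2, sort_keys=True)
        return path

    @classmethod
    def from_pretrained(cls, directory, return_unused_kwargs=False, **kwargs):
        with open(os.path.join(directory, CONFIG_NAME), encoding="utf-8") as f:
            extractor = cls(**json.load(f))
        unused = {}
        for key, value in kwargs.items():
            if hasattr(extractor, key):
                setattr(extractor, key, value)  # known attribute: override the loaded value
            else:
                unused[key] = value  # unknown key: collected here, otherwise ignored
        return (extractor, unused) if return_unused_kwargs else extractor


with tempfile.TemporaryDirectory() as tmp:
    ToyFeatureExtractor(sampling_rate=8000).save_pretrained(tmp)
    reloaded, unused_kwargs = ToyFeatureExtractor.from_pretrained(
        tmp, return_attention_mask=False, foo=False, return_unused_kwargs=True)
    print(reloaded.sampling_rate, reloaded.return_attention_mask, unused_kwargs)
    # prints: 8000 False {'foo': False}
```

Storing the configuration as plain JSON keeps the saved file human-readable; in this sketch only JSON-serializable attributes can round-trip.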
SequenceFeatureExtractor
-
class transformers.SequenceFeatureExtractor(feature_size: int, sampling_rate: int, padding_value: float, **kwargs)
This is a general feature extraction class for speech recognition.
- Parameters
feature_size (int) – The feature dimension of the extracted features.
sampling_rate (int) – The sampling rate at which the audio files should be digitized, expressed in hertz (Hz).
padding_value (float) – The value that is used to fill the padding values / vectors.
-
pad(processed_features: Union[transformers.feature_extraction_utils.BatchFeature, List[transformers.feature_extraction_utils.BatchFeature], Dict[str, transformers.feature_extraction_utils.BatchFeature], Dict[str, List[transformers.feature_extraction_utils.BatchFeature]], List[Dict[str, transformers.feature_extraction_utils.BatchFeature]]], padding: Union[bool, str, transformers.file_utils.PaddingStrategy] = True, max_length: Optional[int] = None, pad_to_multiple_of: Optional[int] = None, return_attention_mask: Optional[bool] = None, return_tensors: Optional[Union[str, transformers.file_utils.TensorType]] = None) → transformers.feature_extraction_utils.BatchFeature
Pad input values / input vectors or a batch of input values / input vectors up to a predefined length or to the max sequence length in the batch.
Padding side (left/right) and padding values are defined at the feature extractor level (with self.padding_side and self.padding_value).
Note
If the processed_features passed are dictionaries of NumPy arrays, PyTorch tensors or TensorFlow tensors, the result will use the same type unless you provide a different tensor type with return_tensors. In the case of PyTorch tensors, however, you will lose the specific device of your tensors.
- Parameters
processed_features (BatchFeature, list of BatchFeature, Dict[str, List[float]], Dict[str, List[List[float]]] or List[Dict[str, List[float]]]) – Processed inputs. Can represent one input (BatchFeature or Dict[str, List[float]]) or a batch of input values / vectors (list of BatchFeature, Dict[str, List[List[float]]] or List[Dict[str, List[float]]]), so you can use this method during preprocessing as well as in a PyTorch DataLoader collate function.
Instead of List[float] you can have tensors (NumPy arrays, PyTorch tensors or TensorFlow tensors); see the note above for the return type.
padding (bool, str or PaddingStrategy, optional, defaults to True) – Select a strategy to pad the returned sequences (according to the feature extractor's padding side and padding value) among:
- True or 'longest' (default): Pad to the longest sequence in the batch (or no padding if only a single sequence is provided).
- 'max_length': Pad to a maximum length specified with the argument max_length, or to the maximum acceptable input length for the model if that argument is not provided.
- False or 'do_not_pad': No padding (i.e., can output a batch with sequences of different lengths).
max_length (int, optional) – Maximum length of the returned list and, optionally, the padding length (see above).
pad_to_multiple_of (int, optional) – If set, will pad the sequence to a multiple of the provided value.
This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.0 (Volta), or on TPUs, which benefit from having sequence lengths be a multiple of 128.
return_attention_mask (bool, optional) – Whether to return the attention mask. If left to the default, will return the attention mask according to the specific feature extractor's default.
return_tensors (str or TensorType, optional) – If set, will return tensors instead of lists of Python numbers. Acceptable values are:
- 'tf': Return TensorFlow tf.constant objects.
- 'pt': Return PyTorch torch.Tensor objects.
- 'np': Return NumPy np.ndarray objects.
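The padding strategies above can be illustrated with a minimal pure-Python sketch that right-pads a batch of 1-D sequences and builds the matching attention mask. It works under simplifying assumptions (lists of floats only, right-side padding, `max_length` required for `'max_length'`), and `pad_batch` is a hypothetical helper, not the library's `pad()` implementation.

```python
import math
from typing import Dict, List, Optional, Union


def pad_batch(
    sequences: List[List[float]],
    padding: Union[bool, str] = True,
    max_length: Optional[int] = None,
    pad_to_multiple_of: Optional[int] = None,
    padding_value: float = 0.0,
) -> Dict[str, List[List]]:
    """Hypothetical helper mimicking the padding strategies described above."""
    if padding is True or padding == "longest":
        target = max(len(s) for s in sequences)
    elif padding == "max_length":
        # Simplification: the model's maximum input length is not known here.
        if max_length is None:
            raise ValueError("this sketch requires max_length with padding='max_length'")
        target = max_length
    else:  # False or "do_not_pad"
        target = None

    if target is not None and pad_to_multiple_of is not None:
        # Round the target length up to the next multiple of pad_to_multiple_of.
        target = math.ceil(target / pad_to_multiple_of) * pad_to_multiple_of

    input_values, attention_mask = [], []
    for seq in sequences:
        n_pad = 0 if target is None else max(target - len(seq), 0)
        input_values.append(list(seq) + [padding_value] * n_pad)
        # 1 marks real values, 0 marks padding.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_values": input_values, "attention_mask": attention_mask}


batch = [[0.1, 0.2, 0.3], [0.4]]
print(pad_batch(batch)["input_values"])
# prints: [[0.1, 0.2, 0.3], [0.4, 0.0, 0.0]]
print(pad_batch(batch, pad_to_multiple_of=4)["attention_mask"])
# prints: [[1, 1, 1, 0], [1, 0, 0, 0]]
```

Returning the attention mask alongside the padded values is what lets a model ignore the filler positions; with `padding=False` the sketch leaves the ragged lengths untouched.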
BatchFeature
-
class transformers.BatchFeature(data: Optional[Dict[str, Any]] = None, tensor_type: Optional[Union[str, transformers.file_utils.TensorType]] = None)
Holds the output of the pad() and feature extractor specific __call__ methods.
This class is derived from a Python dictionary and can be used as a dictionary.
- Parameters
data (dict) – Dictionary of lists/arrays/tensors returned by the __call__/pad methods ('input_values', 'attention_mask', etc.).
tensor_type (Union[None, str, TensorType], optional) – You can give a tensor_type here to convert the lists of numbers into PyTorch/TensorFlow/NumPy tensors at initialization.
-
convert_to_tensors(tensor_type: Optional[Union[str, transformers.file_utils.TensorType]] = None)
Convert the inner content to tensors.
- Parameters
tensor_type (str or TensorType, optional) – The type of tensors to use. If str, should be one of the values of the enum TensorType. If None, no modification is done.
-
to(device: Union[str, torch.device]) → BatchFeature
Send all values to device by calling v.to(device) (PyTorch only).
- Parameters
device (str or torch.device) – The device to put the tensors on.
- Returns
The same instance after modification.
- Return type
BatchFeature
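Because `BatchFeature` derives from a dictionary, the contract of `to()` (call `v.to(device)` on the values and return the same instance) can be sketched without PyTorch. `ToyBatch` and `FakeTensor` below are hypothetical classes written for this illustration; skipping values that lack a `.to()` method is an assumption of the sketch, not documented library behavior.

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor that only records its device."""

    def __init__(self, device: str = "cpu"):
        self.device = device

    def to(self, device: str) -> "FakeTensor":
        # Like torch.Tensor.to, return an object placed on the target device.
        return FakeTensor(device)


class ToyBatch(dict):
    """Hypothetical dict subclass sketching the documented BatchFeature.to() contract."""

    def to(self, device: str) -> "ToyBatch":
        # Snapshot the items so values can be replaced while looping.
        for key, value in list(self.items()):
            # Assumption of this sketch: values without a .to() method are left as-is.
            if hasattr(value, "to"):
                self[key] = value.to(device)
        return self  # the same instance, after modification


batch = ToyBatch(input_values=FakeTensor(), lengths=[3, 1])
moved = batch.to("cuda:0")
print(moved is batch, batch["input_values"].device, batch["lengths"])
# prints: True cuda:0 [3, 1]
```

Returning `self` allows chaining such as `features = extractor_output.to(device)` while keeping the usual dictionary access.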
ImageFeatureExtractionMixin
-
class transformers.image_utils.ImageFeatureExtractionMixin
Mixin that contains utilities for preparing image features.
-
center_crop(image, size)
Crops image to the given size using a center crop. Note that if the image is too small to be cropped to the given size, it will be padded (so the returned result has the requested size).
- Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) – The image to crop.
size (int or Tuple[int, int]) – The size to which to crop the image.
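The crop-or-pad behavior can be illustrated on a single-channel image stored as a nested list of rows. `center_crop_2d` is a hypothetical helper written for this sketch; reading a tuple `size` as (height, width), filling with zeros, and rounding odd offsets toward the top-left are assumptions, not necessarily what the library does.

```python
from typing import List, Tuple, Union


def center_crop_2d(image: List[List[int]], size: Union[int, Tuple[int, int]],
                   fill: int = 0) -> List[List[int]]:
    """Center-crop a 2-D image (list of rows) to `size`, padding with `fill` if too small.

    In this sketch `size` is (height, width), or a single int for a square crop.
    """
    out_h, out_w = (size, size) if isinstance(size, int) else size
    in_h, in_w = len(image), len(image[0])
    # Top-left corner of the crop window in input coordinates; negative
    # values mean the window extends past the image and must be padded.
    top = (in_h - out_h) // 2
    left = (in_w - out_w) // 2
    result = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            y, x = top + r, left + c
            inside = 0 <= y < in_h and 0 <= x < in_w
            row.append(image[y][x] if inside else fill)
        result.append(row)
    return result


img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
print(center_crop_2d(img, 2))    # prints: [[6, 7], [10, 11]]
print(center_crop_2d([[1]], 3))  # prints: [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

Treating cropping and padding as one window placement means a single code path handles both the "too large" and "too small" cases.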
-
normalize(image, mean, std)
Normalizes image with mean and std. Note that this will trigger a conversion of image to a NumPy array if it's a PIL Image.
- Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) – The image to normalize.
mean (List[float] or np.ndarray or torch.Tensor) – The mean (per channel) to use for normalization.
std (List[float] or np.ndarray or torch.Tensor) – The standard deviation (per channel) to use for normalization.
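Per-channel normalization computes (pixel - mean[c]) / std[c] for every pixel of channel c. The sketch below applies this to a channel-first nested list; `normalize_channels` is a hypothetical helper, the channel-first layout is an assumption of the sketch, and the mean/std values in the usage are placeholders chosen only so the results are easy to check.

```python
from typing import List, Sequence

Image = List[List[List[float]]]  # channels x height x width


def normalize_channels(image: Image, mean: Sequence[float], std: Sequence[float]) -> Image:
    """Return (pixel - mean[c]) / std[c] for each channel c of a channel-first image."""
    if not (len(image) == len(mean) == len(std)):
        raise ValueError("need one mean and one std value per channel")
    return [
        [[(px - m) / s for px in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]


# Two 1x3 channels whose values are already rescaled to [0, 1].
img = [[[0.0, 0.5, 1.0]], [[1.0, 1.0, 0.0]]]
print(normalize_channels(img, mean=[0.5, 0.5], std=[0.5, 0.25]))
# prints: [[[-1.0, 0.0, 1.0]], [[2.0, 2.0, -2.0]]]
```

With mean 0.5 and std 0.5, values in [0, 1] map onto [-1, 1], which shows why per-channel statistics are supplied as lists rather than a single scalar.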
-
resize(image, size, resample=2)
Resizes image. Note that this will trigger a conversion of image to a PIL Image.
- Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) – The image to resize.
size (int or Tuple[int, int]) – The size to use for resizing the image.
resample (int, optional, defaults to PIL.Image.BILINEAR) – The filter to use for resampling.
-
to_numpy_array(image, rescale=None, channel_first=True)
Converts image to a NumPy array. Optionally rescales it and puts the channel dimension as the first dimension.
- Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) – The image to convert to a NumPy array.
rescale (bool, optional) – Whether or not to apply the scaling factor (to make pixel values floats between 0. and 1.). Will default to True if the image is a PIL Image or an array/tensor of integers, False otherwise.
channel_first (bool, optional, defaults to True) – Whether or not to permute the dimensions of the image to put the channel dimension first.
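The two optional steps of to_numpy_array can be sketched on nested lists. The first rescales 8-bit integer pixels to floats in [0, 1]. The second moves the channel axis from last (height x width x channels) to first. `to_channel_first` is a hypothetical helper; the library itself operates on NumPy arrays, and dividing by 255 is an assumption about how the scaling factor is applied.

```python
from typing import List, Union

Number = Union[int, float]
Nested3D = List[List[List[Number]]]


def to_channel_first(image: Nested3D, rescale: bool = True) -> Nested3D:
    """Permute an HWC image to CHW, optionally mapping 0..255 integers to 0.0..1.0."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])

    def convert(value: Number) -> Number:
        # Rescaling divides by 255 so that 8-bit values land in [0.0, 1.0].
        return value / 255 if rescale else value

    return [
        [[convert(image[y][x][c]) for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# A 1x2 RGB image: one red pixel and one white pixel.
pixels = [[[255, 0, 0], [255, 255, 255]]]
chw = to_channel_first(pixels)
print(chw)  # prints: [[[1.0, 1.0]], [[0.0, 1.0]], [[0.0, 1.0]]]
```

Putting the channel axis first matches the channels x height x width layout that many vision models expect as input.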
-
to_pil_image(image, rescale=None)
Converts image to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if needed.
- Parameters
image (PIL.Image.Image or numpy.ndarray or torch.Tensor) – The image to convert to the PIL Image format.
rescale (bool, optional) – Whether or not to apply the scaling factor (to make pixel values integers between 0 and 255). Will default to True if the image type is a floating type, False otherwise.
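The reverse direction can be sketched the same way: float pixels in [0, 1] are scaled back to integers in 0..255, and the channel axis is moved back to the last position, the layout PIL works with. `to_channel_last` is a hypothetical helper. Rounding to the nearest integer and clipping to 0..255 are assumptions of this sketch, since the library's exact conversion may differ. The step that builds the actual PIL Image is left out so the sketch runs without Pillow.

```python
from typing import List

CHW = List[List[List[float]]]  # channels x height x width
HWC = List[List[List[int]]]    # height x width x channels


def to_channel_last(image: CHW, rescale: bool = True) -> HWC:
    """Permute a CHW float image back to HWC 8-bit values."""
    channels, height, width = len(image), len(image[0]), len(image[0][0])

    def convert(value: float) -> int:
        # Assumption of this sketch: scale by 255, round, then clip to 0..255.
        scaled = value * 255 if rescale else value
        return min(max(round(scaled), 0), 255)

    return [
        [[convert(image[c][y][x]) for c in range(channels)] for x in range(width)]
        for y in range(height)
    ]


chw = [[[1.0, 1.0]], [[0.0, 1.0]], [[0.0, 1.0]]]
print(to_channel_last(chw))  # prints: [[[255, 0, 0], [255, 255, 255]]]
```

Clipping guards against float values slightly outside [0, 1] (for example, after normalization is undone) producing invalid 8-bit pixels.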
-