In this lesson we take a first look at creating and using a transformer model in Hugging Face. We will use the TFAutoModel class in TensorFlow and AutoModel in PyTorch.
TFAutoModel and its related classes are wrappers around the wide variety of models available in the transformers library.
First we will instantiate a BERT model, creating it from a configuration object.
To run the original Hugging Face notebook for TensorFlow in Colab, go to the Hugging Face notebook. For PyTorch, go here.
from transformers import BertConfig, TFBertModel
# Instantiate the configuration
config = BertConfig()
# Instantiate the model from the configuration
model = TFBertModel(config)
# By default the model is randomly initialized
print(config)
BertConfig { "attention_probs_dropout_prob": 0.1, "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.8.1", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
The model is now ready to be trained for a specific task. However, this requires time, possibly money, and a large amount of data. For these reasons, the sensible approach nowadays is to start from a pretrained model.
from transformers import TFBertModel
model = TFBertModel.from_pretrained('bert-base-cased')
Here the use of TFBertModel is equivalent to TFAutoModel. For the details of the pretrained model, check its model card. This model was initialized with the weights of the checkpoint currently available at the source (Hugging Face). It can already be used for inference on the tasks it was trained for, in this case predicting masked words or sentences. For a different task, you can do fine-tuning, which retrains the model in very few steps.
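As a sketch of that equivalence, the same checkpoint can also be loaded through the auto classes, which select the concrete architecture (TFBertModel here) from the checkpoint's configuration:
from transformers import TFAutoModel
model = TFAutoModel.from_pretrained('bert-base-cased')   # TensorFlow
# from transformers import AutoModel
# model = AutoModel.from_pretrained('bert-base-cased')   # PyTorch equivalent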
The weights were downloaded and placed in a cache folder, which by default is ~/.cache/huggingface/transformers. This cache location can be changed with the HF_HOME environment variable. The full list of checkpoints available for BERT can be found here.
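A minimal sketch of two ways to control the cache location (the path used here is hypothetical):
import os
os.environ['HF_HOME'] = '/data/hf_cache'   # hypothetical path; set it before transformers downloads anything
# or per call, through the cache_dir argument of from_pretrained
model = TFBertModel.from_pretrained('bert-base-cased', cache_dir='/data/hf_cache')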
To save a model we use the save_pretrained method, which is analogous to the from_pretrained method.
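For example (the target directory is arbitrary):
model.save_pretrained('directory_on_my_computer')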
Two files are stored on disk:
The config.json file contains the configuration of the pretrained model in JSON format. The tf_model.h5 file is known as the state dictionary and contains all of the model's weights for this checkpoint. The h5 format corresponds to HDF5 files. The two files go hand in hand.
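The saved directory can later be passed back to from_pretrained to restore the model from the local files instead of downloading it again:
model = TFBertModel.from_pretrained('directory_on_my_computer')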
Consider the following set of sentences.
sequences = [
'Hello',
'Cool',
'Nice!'
]
The tokenizer converts these sequences into indices of the corresponding vocabulary.
from transformers import AutoTokenizer
checkpoint = 'bert-base-cased'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
encoded_secuences = tokenizer(sequences, padding=True, truncation=True, return_tensors='tf') # return_tensors='pt' for PyTorch tensors
print(encoded_secuences.input_ids.numpy())
[[  101  8667   102     0]
 [  101 13297   102     0]
 [  101  8835   106   102]]
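The zeros at the end of the first two rows are padding. The tokenizer output also carries an attention_mask that marks which positions are real tokens (1) and which are padding (0):
print(encoded_secuences.attention_mask.numpy())
# [[1 1 1 0]
#  [1 1 1 0]
#  [1 1 1 1]]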
encoded_secuences = encoded_secuences.input_ids.numpy()
This is a list of encoded sequences. If we are going to start our work from these encoded sequences, we convert them into tensors.
import tensorflow as tf
model_inputs = tf.constant(encoded_secuences)
output = model(model_inputs)
print(output)
TFBaseModelOutputWithPooling(last_hidden_state=<tf.Tensor: shape=(3, 4, 768), dtype=float32, numpy=
array([[[-0.90261275,  0.46482232, -1.0680807 , ..., -0.699818  ,  0.0272803 ,  0.8792781 ],
        [-0.2942962 ,  0.43228415, -0.28740913, ...,  0.3589468 , -1.4553845 , -0.7646338 ],
        [ 0.6877541 ,  0.7636266 , -0.20870464, ...,  0.9895973 , -0.38200825,  0.54974   ],
        [ 0.07113194,  0.6081523 , -0.17413221, ...,  0.5636239 , -0.80637443,  0.35920945]],

       [[-0.9534947 ,  0.51484615, -0.96658885, ..., -0.759168  ,  0.02149527,  0.84670496],
        [ 0.20863365,  1.0134907 , -0.2225053 , ...,  0.81137526, -0.75574476, -0.21008678],
        [ 0.6778561 ,  0.77220726,  0.02414589, ...,  1.0647261 , -0.39467815,  0.5474687 ],
        [ 0.05586064,  0.6965945 , -0.03238491, ...,  0.5867801 , -0.73757684,  0.2639028 ]],

       [[-0.3660255 ,  0.6517344 , -1.5038463 , ..., -0.7160782 ,  0.1564751 ,  0.63766223],
        [ 0.00262259, -0.05316934, -0.38248605, ..., -0.36637494, -0.73271465, -0.23060991],
        [ 2.1378229 ,  0.26028013,  0.1468118 , ...,  0.69814235, -2.1301665 , -0.93348044],
        [-0.2108147 ,  0.9443397 , -0.9976817 , ...,  0.7214987 ,  0.40497112,  0.87027246]]],
      dtype=float32)>, pooler_output=<tf.Tensor: shape=(3, 768), dtype=float32, numpy=
array([[ 0.30539322,  0.86028224,  0.6725495 , ...,  0.6405281 ,  0.17553997, -0.12844808],
       [ 0.15537432,  0.8595787 ,  0.66877705, ...,  0.6579369 ,  0.23296279, -0.21999657],
       [ 0.17398041,  0.8622835 ,  0.6576183 , ...,  0.60278434,  0.21288712, -0.03698413]],
      dtype=float32)>, hidden_states=None, attentions=None)
help(output)
Help on TFBaseModelOutputWithPooling in module transformers.modeling_tf_outputs object:

class TFBaseModelOutputWithPooling(transformers.file_utils.ModelOutput)
 |  TFBaseModelOutputWithPooling(last_hidden_state: tensorflow.python.framework.ops.Tensor = None, pooler_output: tensorflow.python.framework.ops.Tensor = None, hidden_states: Union[Tuple[tensorflow.python.framework.ops.Tensor], NoneType] = None, attentions: Union[Tuple[tensorflow.python.framework.ops.Tensor], NoneType] = None) -> None
 |
 |  Base class for model's outputs that also contains a pooling of the last hidden states.
 |
 |  Args:
 |      last_hidden_state (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
 |          Sequence of hidden-states at the output of the last layer of the model.
 |      pooler_output (:obj:`tf.Tensor` of shape :obj:`(batch_size, hidden_size)`):
 |          Last layer hidden-state of the first token of the sequence (classification token) further processed by a
 |          Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence
 |          prediction (classification) objective during pretraining.
 |
 |          This output is usually *not* a good summary of the semantic content of the input, you're often better with
 |          averaging or pooling the sequence of hidden-states for the whole input sequence.
 |      hidden_states (:obj:`tuple(tf.Tensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
 |          Tuple of :obj:`tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of
 |          shape :obj:`(batch_size, sequence_length, hidden_size)`.
 |
 |          Hidden-states of the model at the output of each layer plus the initial embedding outputs.
 |      attentions (:obj:`tuple(tf.Tensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
 |          Tuple of :obj:`tf.Tensor` (one for each layer) of shape :obj:`(batch_size, num_heads, sequence_length,
 |          sequence_length)`.
 |
 |          Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
 |          heads.
 |
 |  Method resolution order:
 |      TFBaseModelOutputWithPooling
 |      transformers.file_utils.ModelOutput
 |      collections.OrderedDict
 |      builtins.dict
 |      builtins.object
 |
 |  Methods inherited from transformers.file_utils.ModelOutput:
 |
 |  to_tuple(self) -> Tuple[Any]
 |      Convert self to a tuple containing all the attributes/keys that are not ``None``.
 ...
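Since the output behaves both as a dataclass and as a dictionary, its fields can be read either by attribute or by key; for example:
print(output.last_hidden_state.shape)   # (3, 4, 768): batch size, sequence length, hidden size
print(output['pooler_output'].shape)    # (3, 768)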