modeling
Modeling classes for the XLNet model.
class XLNetModel(vocab_size, mem_len=None, reuse_len=None, d_model=768, same_length=False, attn_type='bi', bi_data=False, clamp_len=-1, n_layer=12, dropout=0.1, classifier_dropout=0.1, n_head=12, d_head=64, layer_norm_eps=1e-12, d_inner=3072, ff_activation='gelu', initializer_range=0.02)
Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel
The bare XLNet Model transformer outputting raw hidden-states without any specific head on top.
This model inherits from PretrainedModel. Check the superclass documentation for the generic methods the library implements for all its models.
This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.
- Parameters
vocab_size (int) – Vocabulary size of the XLNet model. Defines the number of different tokens that can be represented by the input_ids passed when calling XLNetModel.
mem_len (int or None, optional) – The number of tokens to cache. The key/value pairs that have already been pre-computed in a previous forward pass won’t be re-computed. Defaults to None.
reuse_len (int or None, optional) – The number of tokens in the current batch to be cached and reused in the future. Defaults to None.
d_model (int, optional) – Dimensionality of the encoder layers and the pooler layer. Defaults to 768.
same_length (bool, optional) – Whether or not to use the same attention length for each token. Defaults to False.
attn_type (str, optional) – The attention type used by the model. Set "bi" for XLNet, "uni" for Transformer-XL. Defaults to "bi".
bi_data (bool, optional) – Whether or not to use the bidirectional input pipeline. Usually set to True during pretraining and False during fine-tuning. Defaults to False.
clamp_len (int, optional) – Clamp all relative distances larger than clamp_len. Setting this attribute to -1 means no clamping. Defaults to -1.
n_layer (int, optional) – Number of hidden layers in the Transformer encoder. Defaults to 12.
dropout (float, optional) – The dropout probability for all fully connected layers in the embeddings and encoder. Defaults to 0.1.
classifier_dropout (float, optional) – The dropout probability for all fully connected layers in the pooler. Defaults to 0.1.
n_head (int, optional) – Number of attention heads for each attention layer in the Transformer encoder. Defaults to 12.
d_head (int, optional) – Dimensionality of each attention head. Defaults to 64.
layer_norm_eps (float, optional) – The epsilon used by the layer normalization layers. Defaults to 1e-12.
d_inner (int, optional) – Dimensionality of the “intermediate” (often named feed-forward) layer in the Transformer encoder. Defaults to 3072.
ff_activation (str, optional) – The non-linear activation function in the feed-forward layer. "gelu", "relu", "silu" and "gelu_new" are supported. Defaults to "gelu".
initializer_range (float, optional) – The standard deviation of the truncated_normal_initializer for initializing all weight matrices. Defaults to 0.02.
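As a rough illustration of how mem_len, reuse_len and clamp_len behave, the sketch below applies the caching and clamping rules described above to plain Python lists. The helpers update_memory and clamp_distance are hypothetical names, not part of PaddleNLP, and the logic is an assumption modeled on the parameter descriptions; the library's actual code may differ.

```python
from typing import List, Optional


def update_memory(prev_mem: Optional[List[int]], curr_out: List[int],
                  mem_len: Optional[int],
                  reuse_len: Optional[int] = None) -> Optional[List[int]]:
    """Return the positions kept as memory for the next forward pass.

    Hypothetical helper: each list item stands in for one token's hidden state.
    """
    if mem_len is None or mem_len == 0:
        return None  # caching disabled
    if reuse_len is not None and reuse_len > 0:
        # Only the first reuse_len tokens of the current segment are reusable.
        curr_out = curr_out[:reuse_len]
    combined = curr_out if prev_mem is None else prev_mem + curr_out
    return combined[-mem_len:]  # keep the most recent mem_len positions


def clamp_distance(distance: int, clamp_len: int = -1) -> int:
    """Clamp a relative distance to [-clamp_len, clamp_len].

    A non-positive clamp_len is treated as "no clamping" (assumption).
    """
    if clamp_len <= 0:
        return distance
    return max(-clamp_len, min(clamp_len, distance))


# Two consecutive segments with a cache of three tokens.
mem = update_memory(None, [1, 2, 3, 4], mem_len=3)
mem = update_memory(mem, [5, 6], mem_len=3)
print(mem)
print(clamp_distance(10, clamp_len=4), clamp_distance(10, clamp_len=-1))
```

In the model itself the cached items are per-layer hidden-state tensors rather than integers; the list form only makes the truncation rule easy to follow.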
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False)
The XLNetModel forward method, overrides the __call__() special method.
- Parameters
input_ids (Tensor) – Indices of input sequence tokens in the vocabulary. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
token_type_ids (Tensor, optional) – Segment token indices to indicate first and second portions of the inputs. Indices can either be 0 or 1:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, which means we don’t add segment embeddings.
attention_mask (Tensor, optional) – Mask to avoid performing attention on padding token indices with values being either 0 or 1:
1 for tokens that are not masked,
0 for tokens that are masked.
Its data type should be float32 and it has a shape of [batch_size, sequence_length]. Defaults to None.
mems (List[Tensor], optional) – Contains pre-computed hidden-states. Can be used to speed up sequential decoding. It is a list of length n_layer whose Tensors have a data type of float32. use_mems_train or use_mems_eval has to be set to True to make use of mems. Defaults to None, which means mems are not used.
perm_mask (Tensor, optional) – Mask to indicate the attention pattern for each input token with values being either 0 or 1:
if perm_mask[k, i, j] = 0, i attends to j in batch k;
if perm_mask[k, i, j] = 1, i does not attend to j in batch k.
Only used during pretraining (to define factorization order) or for sequential decoding (generation). Its data type should be float32 and it has a shape of [batch_size, sequence_length, sequence_length]. Defaults to None, in which case each token attends to all the others (full bidirectional attention).
target_mapping (Tensor, optional) – Mask to indicate the output tokens to use with values being either 0 or 1. If target_mapping[k, i, j] = 1, the i-th prediction in batch k is on the j-th token. Only used during pretraining for partial prediction or for sequential decoding (generation). Its data type should be float32 and it has a shape of [batch_size, num_predict, sequence_length]. Defaults to None.
input_mask (Tensor, optional) – Mask to avoid performing attention on padding token indices. Negative of attention_mask, i.e. with 0 for real tokens and 1 for padding. Mask values can either be 0 or 1:
1 for tokens that are masked,
0 for tokens that are not masked.
You can only use one of input_mask and attention_mask. Its data type should be float32 and it has a shape of [batch_size, sequence_length]. Defaults to None.
head_mask (Tensor, optional) – Mask to nullify selected heads of the self-attention modules. Mask values can either be 0 or 1:
1 indicates the head is not masked,
0 indicates the head is masked.
Its data type should be float32 and it has a shape of [num_heads] or [num_layers, num_heads]. Defaults to None, which means we keep all heads.
inputs_embeds (Tensor, optional) – An embedded representation tensor which is an alternative to input_ids. You should specify only one of them to avoid contradiction. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size]. Defaults to None, which means only input_ids is specified.
use_mems_train (bool, optional) – Whether or not to use the recurrent memory mechanism during training. Defaults to False, which means the recurrent memory mechanism is not used in training mode.
use_mems_eval (bool, optional) – Whether or not to use the recurrent memory mechanism during evaluation. Defaults to False, which means the recurrent memory mechanism is not used in evaluation mode.
output_attentions (bool, optional) – Whether or not to return the attention tensors of all attention layers. Defaults to False, which means the attention tensors are not returned.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. Defaults to False, which means the hidden states are not returned.
return_dict (bool, optional) – Whether or not to format the output as a dict. Defaults to False, and the default output is a tuple.
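The mask conventions above can be sketched with nested Python lists. The example builds an input_mask from an attention_mask, and a perm_mask and target_mapping for the common case of predicting only the last token. All three helper names are hypothetical; in real use the results would be converted with paddle.to_tensor(..., dtype="float32") before being passed to forward.

```python
from typing import List

Matrix = List[List[float]]


def attention_to_input_mask(attention_mask: Matrix) -> Matrix:
    """Invert a [batch_size, sequence_length] attention_mask (1 = real token)
    into an input_mask (1 = padding)."""
    return [[1.0 - value for value in row] for row in attention_mask]


def last_token_perm_mask(batch_size: int, seq_len: int) -> List[Matrix]:
    """Build a perm_mask of shape [batch_size, seq_len, seq_len] in which
    no token i attends to the last position j = seq_len - 1."""
    return [[[1.0 if j == seq_len - 1 else 0.0 for j in range(seq_len)]
             for _ in range(seq_len)]
            for _ in range(batch_size)]


def last_token_target_mapping(batch_size: int, seq_len: int) -> List[Matrix]:
    """Build a target_mapping of shape [batch_size, 1, seq_len]:
    the single prediction (num_predict = 1) is on the last token."""
    return [[[1.0 if j == seq_len - 1 else 0.0 for j in range(seq_len)]]
            for _ in range(batch_size)]


attention_mask = [[1.0, 1.0, 1.0, 0.0]]  # one sequence, last position is padding
print(attention_to_input_mask(attention_mask))
print(last_token_perm_mask(batch_size=1, seq_len=3)[0])
print(last_token_target_mapping(batch_size=1, seq_len=3)[0])
```

With this target_mapping, num_predict is 1, so the output tensor described in the Returns section would have a shape of [batch_size, 1, hidden_size].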
- Returns
A tuple of the form (output, new_mems, hidden_states, attentions) or a dict of the form {"last_hidden_state": output, "mems": new_mems, "hidden_states": hidden_states, "attentions": attentions}.
With the fields:
- output (Tensor): Sequence of hidden-states at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, num_predict, hidden_size]. num_predict corresponds to target_mapping.shape[1]. If target_mapping is None, then num_predict corresponds to sequence_length.
- mems (List[Tensor]): A Tensor list of length n_layer containing pre-computed hidden-states.
- hidden_states (List[Tensor], optional): A Tensor list containing hidden-states of the model at the output of each layer plus the initial embedding outputs. Each Tensor has a data type of float32 and has a shape of [batch_size, sequence_length, hidden_size].
- attentions (List[Tensor], optional): A Tensor list containing attention weights after the attention softmax, used to compute the weighted average in the self-attention heads. Each Tensor (one for each layer) has a data type of float32 and has a shape of [batch_size, num_heads, sequence_length, sequence_length].
- Return type
A tuple or a dict
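Example
    import paddle
    from paddlenlp.transformers.xlnet.modeling import XLNetModel
    from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

    # Load the pretrained tokenizer and model weights.
    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetModel.from_pretrained('xlnet-base-cased')

    # Tokenize one sentence; wrapping each field in a list adds the batch
    # dimension required by the documented [batch_size, sequence_length] shape.
    inputs = tokenizer("Hey, Paddle-paddle is awesome !")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

    # The default output is a tuple whose first element is the last-layer hidden states.
    outputs = model(**inputs)
    last_hidden_states = outputs[0]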
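Because forward returns either a tuple or a dict depending on return_dict, calling code may want a single accessor that handles both. The following is a minimal sketch using stand-in values instead of tensors. The helper get_field is hypothetical, and it assumes the tuple keeps the documented field order with no earlier field omitted.

```python
from typing import Any, Dict, Sequence, Union

# Field order of the tuple form, as listed in the Returns section.
FIELD_ORDER = ("last_hidden_state", "mems", "hidden_states", "attentions")


def get_field(outputs: Union[Sequence[Any], Dict[str, Any]], name: str) -> Any:
    """Fetch a named field from either output form; return None if it is absent."""
    if isinstance(outputs, dict):
        return outputs.get(name)
    index = FIELD_ORDER.index(name)  # raises ValueError for unknown names
    return outputs[index] if index < len(outputs) else None


# Stand-in outputs: strings replace the real tensors.
tuple_outputs = ("hidden", ["mem_layer_0"])
dict_outputs = {"last_hidden_state": "hidden", "mems": ["mem_layer_0"]}

print(get_field(tuple_outputs, "last_hidden_state"))
print(get_field(dict_outputs, "mems"))
print(get_field(tuple_outputs, "attentions"))
```

If the tuple form drops unset optional fields in a way that shifts later positions, the positional lookup would no longer be reliable, and passing return_dict=True is the safer choice.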
class XLNetPretrainedModel(name_scope=None, dtype='float32')
Bases: paddlenlp.transformers.model_utils.PretrainedModel
An abstract class for pretrained XLNet models. It provides the XLNet-related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.
base_model_class
class XLNetForSequenceClassification(xlnet, num_classes=2)
Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel
XLNet Model with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for GLUE tasks.
- Parameters
xlnet (XLNetModel) – An instance of XLNetModel.
num_classes (int, optional) – The number of classes. Defaults to 2.
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False)
The XLNetForSequenceClassification forward method, overrides the __call__() special method.
- Parameters
input_ids (Tensor) – See XLNetModel.
token_type_ids (Tensor, optional) – See XLNetModel.
attention_mask (Tensor, optional) – See XLNetModel.
mems (List[Tensor], optional) – See XLNetModel.
perm_mask (Tensor, optional) – See XLNetModel.
target_mapping (Tensor, optional) – See XLNetModel.
input_mask (Tensor, optional) – See XLNetModel.
head_mask (Tensor, optional) – See XLNetModel.
inputs_embeds (Tensor, optional) – See XLNetModel.
use_mems_train (bool, optional) – See XLNetModel.
use_mems_eval (bool, optional) – See XLNetModel.
output_attentions (bool, optional) – See XLNetModel.
output_hidden_states (bool, optional) – See XLNetModel.
return_dict (bool, optional) – See XLNetModel.
- Returns
A tuple of the form (output, new_mems, hidden_states, attentions) or a dict of the form {"last_hidden_state": output, "mems": new_mems, "hidden_states": hidden_states, "attentions": attentions}.
With the fields:
- output (Tensor): Classification scores before SoftMax (also called logits). Its data type should be float32 and it has a shape of [batch_size, num_classes].
- mems (List[Tensor]): See XLNetModel.
- hidden_states (List[Tensor], optional): See XLNetModel.
- attentions (List[Tensor], optional): See XLNetModel.
- Return type
A tuple or a dict
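Example
    import paddle
    from paddlenlp.transformers.xlnet.modeling import XLNetForSequenceClassification
    from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

    # Load the pretrained tokenizer and the model with a classification head.
    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased')

    # Tokenize one sentence; wrapping each field in a list adds the batch
    # dimension required by the documented [batch_size, sequence_length] shape.
    inputs = tokenizer("Hey, Paddle-paddle is awesome !")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

    # The first element of the output tuple holds the logits.
    outputs = model(**inputs)
    logits = outputs[0]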
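Since the returned scores are logits (values before SoftMax), a typical next step is to turn them into probabilities and pick the most likely class. Here is a framework-free sketch on nested lists shaped like [batch_size, num_classes]. The helper names softmax and predict are hypothetical, and with Paddle tensors one would more likely call paddle.nn.functional.softmax(logits, axis=-1).

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max to avoid overflow
    total = sum(exps)
    return [e / total for e in exps]


def predict(batch_logits: List[List[float]]) -> List[Tuple[int, float]]:
    """Return (class index, probability) for each row of a [batch_size, num_classes] list."""
    results = []
    for row in batch_logits:
        probs = softmax(row)
        label = max(range(len(probs)), key=probs.__getitem__)
        results.append((label, probs[label]))
    return results


# Made-up logits for a batch of two sentences and num_classes=2.
example_logits = [[2.0, -1.0], [0.3, 1.2]]
for label, prob in predict(example_logits):
    print(label, round(prob, 3))
```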
class XLNetForTokenClassification(xlnet, num_classes=2)
Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel
XLNet Model with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.
- Parameters
xlnet (XLNetModel) – An instance of XLNetModel.
num_classes (int, optional) – The number of classes. Defaults to 2.
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False)
The XLNetForTokenClassification forward method, overrides the __call__() special method.
- Parameters
input_ids (Tensor) – See XLNetModel.
token_type_ids (Tensor, optional) – See XLNetModel.
attention_mask (Tensor, optional) – See XLNetModel.
mems (List[Tensor], optional) – See XLNetModel.
perm_mask (Tensor, optional) – See XLNetModel.
target_mapping (Tensor, optional) – See XLNetModel.
input_mask (Tensor, optional) – See XLNetModel.
head_mask (Tensor, optional) – See XLNetModel.
inputs_embeds (Tensor, optional) – See XLNetModel.
use_mems_train (bool, optional) – See XLNetModel.
use_mems_eval (bool, optional) – See XLNetModel.
output_attentions (bool, optional) – See XLNetModel.
output_hidden_states (bool, optional) – See XLNetModel.
return_dict (bool, optional) – See XLNetModel.
- Returns
A tuple of the form (output, new_mems, hidden_states, attentions) or a dict of the form {"last_hidden_state": output, "mems": new_mems, "hidden_states": hidden_states, "attentions": attentions}.
With the fields:
- output (Tensor): Classification scores before SoftMax (also called logits). Its data type should be float32 and it has a shape of [batch_size, sequence_length, num_classes].
- mems (List[Tensor]): See XLNetModel.
- hidden_states (List[Tensor], optional): See XLNetModel.
- attentions (List[Tensor], optional): See XLNetModel.
- Return type
A tuple or a dict
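Example
    import paddle
    from paddlenlp.transformers.xlnet.modeling import XLNetForTokenClassification
    from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

    # Load the pretrained tokenizer and the model with a token classification head.
    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetForTokenClassification.from_pretrained('xlnet-base-cased')

    # Tokenize one sentence; wrapping each field in a list adds the batch
    # dimension required by the documented [batch_size, sequence_length] shape.
    inputs = tokenizer("Hey, Paddle-paddle is awesome !")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

    # The first element of the output tuple holds the per-token logits.
    outputs = model(**inputs)
    logits = outputs[0]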
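For token classification the logits carry one score vector per token, so decoding usually means taking the arg-max along the last axis and skipping padding positions. The sketch below does this on nested lists shaped like [batch_size, sequence_length, num_classes]. The helper decode_token_labels and the label names in id2label are hypothetical and chosen only for illustration.

```python
from typing import List


def decode_token_labels(logits: List[List[List[float]]],
                        attention_mask: List[List[int]],
                        id2label: List[str]) -> List[List[str]]:
    """Map per-token logits to label names, ignoring padded positions.

    logits has shape [batch_size, sequence_length, num_classes];
    attention_mask has shape [batch_size, sequence_length] with 1 = real token.
    """
    decoded = []
    for seq_logits, seq_mask in zip(logits, attention_mask):
        labels = []
        for scores, keep in zip(seq_logits, seq_mask):
            if not keep:
                continue  # padding position: no label is produced
            best = max(range(len(scores)), key=scores.__getitem__)
            labels.append(id2label[best])
        decoded.append(labels)
    return decoded


# Made-up logits for one sequence of three positions (the last is padding), num_classes=2.
example_logits = [[[2.0, 0.1], [0.3, 1.5], [0.0, 0.0]]]
example_mask = [[1, 1, 0]]
id2label = ["O", "ENT"]  # hypothetical label set
print(decode_token_labels(example_logits, example_mask, id2label))
```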