modeling

Modeling classes for the XLNet model.

class XLNetModel(vocab_size, mem_len=None, reuse_len=None, d_model=768, same_length=False, attn_type='bi', bi_data=False, clamp_len=-1, n_layer=12, dropout=0.1, classifier_dropout=0.1, n_head=12, d_head=64, layer_norm_eps=1e-12, d_inner=3072, ff_activation='gelu', initializer_range=0.02)

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

The bare XLNet Model transformer outputting raw hidden-states without any specific head on top.

This model inherits from PretrainedModel. Check the superclass documentation for the generic methods the library implements for all its models.

This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.
Parameters:
- vocab_size (int) – Vocabulary size of the XLNet model. Defines the number of different tokens that can be represented by the inputs_ids passed when calling XLNetModel.
- mem_len (int or None, optional) – The number of tokens to cache. The key/value pairs that have already been pre-computed in a previous forward pass won't be re-computed. Defaults to None.
- reuse_len (int or None, optional) – The number of tokens in the current batch to be cached and reused in the future. Defaults to None.
- d_model (int, optional) – Dimensionality of the encoder layers and the pooler layer. Defaults to 768.
- same_length (bool, optional) – Whether or not to use the same attention length for each token. Defaults to False.
- attn_type (str, optional) – The attention type used by the model. Set "bi" for XLNet, "uni" for Transformer-XL. Defaults to "bi".
- bi_data (bool, optional) – Whether or not to use a bidirectional input pipeline. Usually set to True during pretraining and False during fine-tuning. Defaults to False.
- clamp_len (int, optional) – Clamp all relative distances larger than clamp_len. Setting this attribute to -1 means no clamping. Defaults to -1.
- n_layer (int, optional) – Number of hidden layers in the Transformer encoder. Defaults to 12.
- dropout (float, optional) – The dropout probability for all fully connected layers in the embeddings and encoder. Defaults to 0.1.
- classifier_dropout (float, optional) – The dropout probability for all fully connected layers in the pooler. Defaults to 0.1.
- n_head (int, optional) – Number of attention heads for each attention layer in the Transformer encoder. Defaults to 12.
- d_head (int, optional) – Dimensionality of each attention head. Defaults to 64.
- layer_norm_eps (float, optional) – The epsilon used by the layer normalization layers. Defaults to 1e-12.
- d_inner (int, optional) – Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder. Defaults to 3072.
- ff_activation (str, optional) – The non-linear activation function in the feed-forward layer. "gelu", "relu", "silu" and "gelu_new" are supported. Defaults to "gelu".
- initializer_range (float, optional) – The standard deviation of the truncated_normal_initializer for initializing all weight matrices. Defaults to 0.02.
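The constructor arguments above fully determine the architecture, so a randomly initialized model can be built directly, without from_pretrained. A minimal sketch; the hyperparameter values below are illustrative, not a published configuration:

```python
import paddle
from paddlenlp.transformers.xlnet.modeling import XLNetModel

# Build a small, randomly initialized XLNet from the documented
# constructor arguments. Only vocab_size is required; everything
# else falls back to the defaults listed above. All values here
# are illustrative.
model = XLNetModel(
    vocab_size=32000,
    d_model=256,     # hidden size (default 768)
    n_layer=4,       # encoder layers (default 12)
    n_head=4,        # attention heads (default 12)
    d_head=64,       # per-head dimensionality
    d_inner=1024,    # feed-forward inner size (default 3072)
)

input_ids = paddle.randint(low=0, high=32000, shape=[1, 8], dtype='int64')
outputs = model(input_ids)
last_hidden_state = outputs[0]  # shape [1, 8, 256]
```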
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False)

The XLNetModel forward method overrides the __call__() special method.
Parameters:
- input_ids (Tensor) – Indices of input sequence tokens in the vocabulary. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
- token_type_ids (Tensor, optional) – Segment token indices to indicate first and second portions of the inputs. Indices can be either 0 or 1:
  - 0 corresponds to a sentence A token,
  - 1 corresponds to a sentence B token.
  Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, which means no segment embeddings are added.
- attention_mask (Tensor, optional) – Mask to avoid performing attention on padding token indices, with values being either 0 or 1:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
  Its data type should be float32 and it has a shape of [batch_size, sequence_length]. Defaults to None.
- mems (List[Tensor], optional) – Contains pre-computed hidden-states. Can be used to speed up sequential decoding. It's a list (of length n_layers) of Tensors with data type float32. use_mems has to be set to True to make use of mems. Defaults to None, in which case mems are not used.
- perm_mask (Tensor, optional) – Mask to indicate the attention pattern for each input token, with values being either 0 or 1:
  - if perm_mask[k, i, j] = 0, i attends to j in batch k;
  - if perm_mask[k, i, j] = 1, i does not attend to j in batch k.
  Only used during pretraining (to define the factorization order) or for sequential decoding (generation); see the sketch after the example below. Its data type should be float32 and it has a shape of [batch_size, sequence_length, sequence_length]. Defaults to None, in which case each token attends to all the others (full bidirectional attention).
- target_mapping (Tensor, optional) – Mask to indicate the output tokens to use, with values being either 0 or 1. If target_mapping[k, i, j] = 1, the i-th prediction in batch k is on the j-th token. Only used during pretraining for partial prediction or for sequential decoding (generation). Its data type should be float32 and it has a shape of [batch_size, num_predict, sequence_length]. Defaults to None.
- input_mask (Tensor, optional) – Mask to avoid performing attention on padding token indices. The negative of attention_mask, i.e. with 0 for real tokens and 1 for padding. Mask values can be either 0 or 1:
  - 1 for tokens that are masked,
  - 0 for tokens that are not masked.
  You should only use one of input_mask and attention_mask. Its data type should be float32 and it has a shape of [batch_size, sequence_length]. Defaults to None.
- head_mask (Tensor, optional) – Mask to nullify selected heads of the self-attention modules. Mask values can be either 0 or 1:
  - 1 indicates the head is not masked,
  - 0 indicates the head is masked.
  Its data type should be float32 and it has a shape of [num_heads] or [num_layers, num_heads]. Defaults to None, which means all heads are kept.
- inputs_embeds (Tensor, optional) – An embedded representation tensor, an alternative to input_ids. You should only specify one of the two to avoid contradiction. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size]. Defaults to None, which means only input_ids is specified.
- use_mems_train (bool, optional) – Whether or not to use the recurrent memory mechanism during training. Defaults to False, i.e. the recurrent memory mechanism is not used in training mode.
- use_mems_eval (bool, optional) – Whether or not to use the recurrent memory mechanism during evaluation. Defaults to False, i.e. the recurrent memory mechanism is not used in evaluation mode.
- output_attentions (bool, optional) – Whether or not to return the attention tensors of all attention layers. Defaults to False, i.e. the attention tensors are not returned.
- output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. Defaults to False, i.e. the hidden states are not returned.
- return_dict (bool, optional) – Whether or not to format the output as a dict. Defaults to False, in which case the output is a tuple.
Returns:
A tuple (output, new_mems, hidden_states, attentions) or a dict {"last_hidden_state": output, "mems": new_mems, "hidden_states": hidden_states, "attentions": attentions}, with the following fields:

- output (Tensor): Sequence of hidden-states at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, num_predict, hidden_size]. num_predict corresponds to target_mapping.shape[1]; if target_mapping is None, num_predict equals sequence_length.
- mems (List[Tensor]): A list of length n_layers containing pre-computed hidden-states.
- hidden_states (List[Tensor], optional): A list containing the hidden-states of the model at the output of each layer plus the initial embedding outputs. Each Tensor has a data type of float32 and a shape of [batch_size, sequence_length, hidden_size].
- attentions (List[Tensor], optional): A list containing the attention weights after the attention softmax, used to compute the weighted average in the self-attention heads. Each Tensor (one for each layer) has a data type of float32 and a shape of [batch_size, num_heads, sequence_length, sequence_length].

Return type: A tuple or a dict.

Example:
```python
import paddle
from paddlenlp.transformers.xlnet.modeling import XLNetModel
from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetModel.from_pretrained('xlnet-base-cased')

inputs = tokenizer("Hey, Paddle-paddle is awesome !")
# Wrap each value in a list to add the batch dimension the model expects.
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

outputs = model(**inputs)
last_hidden_states = outputs[0]
```
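For pretraining-style partial prediction, perm_mask and target_mapping can be built directly from the shapes documented above. A minimal sketch; the masking pattern (predicting only the last token) is illustrative:

```python
import paddle
from paddlenlp.transformers.xlnet.modeling import XLNetModel

model = XLNetModel.from_pretrained('xlnet-base-cased')
batch_size, seq_len = 1, 8
input_ids = paddle.randint(low=0, high=32000, shape=[batch_size, seq_len], dtype='int64')

# perm_mask[k, i, j] = 1 means token i may NOT attend to token j in batch k.
# Block every token from seeing the last position, so the last token is
# predicted without access to itself.
perm_mask = paddle.zeros([batch_size, seq_len, seq_len], dtype='float32')
perm_mask[:, :, -1] = 1.0

# target_mapping[k, i, j] = 1 means the i-th prediction in batch k targets
# the j-th token. One prediction (num_predict = 1) on the last token:
target_mapping = paddle.zeros([batch_size, 1, seq_len], dtype='float32')
target_mapping[:, 0, -1] = 1.0

outputs = model(input_ids, perm_mask=perm_mask, target_mapping=target_mapping)
# outputs[0] has shape [batch_size, num_predict, hidden_size] = [1, 1, 768]
```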
class XLNetPretrainedModel(name_scope=None, dtype='float32')

Bases: paddlenlp.transformers.model_utils.PretrainedModel

An abstract class for pretrained XLNet models. It provides the XLNet related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration, and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

base_model_class
    alias of XLNetModel
class XLNetForSequenceClassification(xlnet, num_classes=2)

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for GLUE tasks.

Parameters:
- xlnet (XLNetModel) – An instance of XLNetModel.
- num_classes (int, optional) – The number of classes. Defaults to 2.
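Because the constructor takes an XLNetModel instance, a head with a non-default number of classes can be attached to a pretrained backbone. A minimal sketch; the 3-class setting is illustrative:

```python
from paddlenlp.transformers.xlnet.modeling import XLNetModel, XLNetForSequenceClassification

# Attach a 3-way classification head to a pretrained backbone.
backbone = XLNetModel.from_pretrained('xlnet-base-cased')
model = XLNetForSequenceClassification(backbone, num_classes=3)
```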
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False)

The XLNetForSequenceClassification forward method overrides the __call__() special method.
Parameters:
- input_ids (Tensor) – See XLNetModel.
- token_type_ids (Tensor, optional) – See XLNetModel.
- attention_mask (Tensor, optional) – See XLNetModel.
- mems (List[Tensor], optional) – See XLNetModel.
- perm_mask (Tensor, optional) – See XLNetModel.
- target_mapping (Tensor, optional) – See XLNetModel.
- input_mask (Tensor, optional) – See XLNetModel.
- head_mask (Tensor, optional) – See XLNetModel.
- inputs_embeds (Tensor, optional) – See XLNetModel.
- use_mems_train (bool, optional) – See XLNetModel.
- use_mems_eval (bool, optional) – See XLNetModel.
- output_attentions (bool, optional) – See XLNetModel.
- output_hidden_states (bool, optional) – See XLNetModel.
- return_dict (bool, optional) – See XLNetModel.
Returns:
A tuple (output, new_mems, hidden_states, attentions) or a dict {"last_hidden_state": output, "mems": new_mems, "hidden_states": hidden_states, "attentions": attentions}, with the following fields:

- output (Tensor): Classification scores before SoftMax (also called logits). Its data type should be float32 and it has a shape of [batch_size, num_classes].
- mems (List[Tensor]): See XLNetModel.
- hidden_states (List[Tensor], optional): See XLNetModel.
- attentions (List[Tensor], optional): See XLNetModel.

Return type: A tuple or a dict.

Example:
```python
import paddle
from paddlenlp.transformers.xlnet.modeling import XLNetForSequenceClassification
from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased')

inputs = tokenizer("Hey, Paddle-paddle is awesome !")
# Wrap each value in a list to add the batch dimension the model expects.
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

outputs = model(**inputs)
logits = outputs[0]
```
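Continuing from the example above, the logits can be turned into probabilities and a predicted class index in the usual way; this is a generic sketch, not part of the API:

```python
import paddle
import paddle.nn.functional as F

probs = F.softmax(logits, axis=-1)     # [batch_size, num_classes]
preds = paddle.argmax(probs, axis=-1)  # predicted class index per example
print(preds.numpy())
```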
class XLNetForTokenClassification(xlnet, num_classes=2)

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.

Parameters:
- xlnet (XLNetModel) – An instance of XLNetModel.
- num_classes (int, optional) – The number of classes. Defaults to 2.
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False)

The XLNetForTokenClassification forward method overrides the __call__() special method.
Parameters:
- input_ids (Tensor) – See XLNetModel.
- token_type_ids (Tensor, optional) – See XLNetModel.
- attention_mask (Tensor, optional) – See XLNetModel.
- mems (List[Tensor], optional) – See XLNetModel.
- perm_mask (Tensor, optional) – See XLNetModel.
- target_mapping (Tensor, optional) – See XLNetModel.
- input_mask (Tensor, optional) – See XLNetModel.
- head_mask (Tensor, optional) – See XLNetModel.
- inputs_embeds (Tensor, optional) – See XLNetModel.
- use_mems_train (bool, optional) – See XLNetModel.
- use_mems_eval (bool, optional) – See XLNetModel.
- output_attentions (bool, optional) – See XLNetModel.
- output_hidden_states (bool, optional) – See XLNetModel.
- return_dict (bool, optional) – See XLNetModel.
Returns:
A tuple (output, new_mems, hidden_states, attentions) or a dict {"last_hidden_state": output, "mems": new_mems, "hidden_states": hidden_states, "attentions": attentions}, with the following fields:

- output (Tensor): Classification scores before SoftMax (also called logits). Its data type should be float32 and it has a shape of [batch_size, sequence_length, num_classes].
- mems (List[Tensor]): See XLNetModel.
- hidden_states (List[Tensor], optional): See XLNetModel.
- attentions (List[Tensor], optional): See XLNetModel.

Return type: A tuple or a dict.

Example:
```python
import paddle
from paddlenlp.transformers.xlnet.modeling import XLNetForTokenClassification
from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetForTokenClassification.from_pretrained('xlnet-base-cased')

inputs = tokenizer("Hey, Paddle-paddle is awesome !")
# Wrap each value in a list to add the batch dimension the model expects.
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

outputs = model(**inputs)
logits = outputs[0]
```
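Continuing from the example above, per-token class predictions follow by taking the argmax over the class dimension; a generic sketch, with the token-level inspection purely illustrative:

```python
import paddle

# logits has shape [batch_size, sequence_length, num_classes]
pred_ids = paddle.argmax(logits, axis=-1)  # [batch_size, sequence_length]

# Inspect predictions alongside the tokens they belong to.
tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0].numpy().tolist())
for token, label_id in zip(tokens, pred_ids[0].numpy().tolist()):
    print(token, label_id)
```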