generation_utils

class GenerationMixin

Bases: object

This class implements the interface for generation tasks. It is used as the base class of paddlenlp.transformers.PretrainedModel.
generate(input_ids=None, max_length=20, min_length=0, decode_strategy='greedy_search', temperature=1.0, top_k=0, top_p=1.0, num_beams=1, length_penalty=1.0, early_stopping=False, bos_token_id=None, eos_token_id=None, pad_token_id=None, num_return_sequences=1, use_cache=True, **model_kwargs)

The interface for generation tasks. This method generates sequences with a specified decoding strategy. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search".
- Parameters
  - input_ids (Tensor, optional) – The input sequence ids for generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Defaults to None, in which case it is initialized as a Tensor with shape [1, 1], filled with the value of bos_token_id.
  - max_length (int, optional) – The maximum length of the sequence to be generated. Defaults to 20.
  - min_length (int, optional) – The minimum length of the sequence to be generated. Defaults to 0.
  - decode_strategy (str, optional) – The decoding strategy in generation. One of "greedy_search", "sampling" and "beam_search". Defaults to "greedy_search".
  - temperature (float, optional) – The value used to modulate the next-token probabilities in the "sampling" strategy. Defaults to 1.0, which means no effect.
  - top_k (int, optional) – The number of highest-probability tokens to keep for top-k filtering in the "sampling" strategy. Defaults to 0, which means no effect.
  - top_p (float, optional) – The cumulative probability for top-p filtering in the "sampling" strategy. The value should satisfy \(0 < top\_p \leq 1\). Defaults to 1.0, which means no effect.
  - num_beams (int, optional) – The number of beams in the "beam_search" strategy. Defaults to 1.
  - length_penalty (float, optional) – The exponential penalty applied to the sequence length in the "beam_search" strategy. If \(length\_penalty < 1.0\), the model generates shorter sequences; if \(length\_penalty > 1.0\), longer sequences. Defaults to 1.0, which means no penalty.
  - early_stopping (bool, optional) – Whether to stop searching in the "beam_search" strategy when at least num_beams sentences per batch are finished. Defaults to False.
  - bos_token_id (int, optional) – The id of the bos_token. Defaults to None.
  - eos_token_id (int, optional) – The id of the eos_token. Defaults to None.
  - pad_token_id (int, optional) – The id of the pad_token. Defaults to None.
  - num_return_sequences (int, optional) – The number of returned sequences for each sequence in the batch. Defaults to 1.
  - use_cache (bool, optional) – Whether to use the model cache to speed up decoding. Defaults to True.
  - model_kwargs (dict) – Additional kwargs passed to the model.
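To make the sampling-related parameters concrete, here is a minimal sketch of how temperature, top_k and top_p reshape a next-token distribution under the "sampling" strategy. This follows the widely used convention for these filters and is an illustration only, not paddlenlp's internal implementation; the function name filter_probs is hypothetical.

```python
import math

def filter_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Illustrative sketch (not paddlenlp internals) of how temperature,
    top_k and top_p reshape a next-token distribution under the
    "sampling" decode_strategy."""
    # Temperature: divide logits before softmax; < 1.0 sharpens the
    # distribution, > 1.0 flattens it, 1.0 leaves it unchanged.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exp = [math.exp(l - m) for l in scaled]
    total = sum(exp)
    probs = [e / total for e in exp]

    keep = set(range(len(probs)))

    # top_k: keep only the k highest-probability tokens.
    if top_k > 0:
        ranked = sorted(keep, key=lambda i: probs[i], reverse=True)
        keep &= set(ranked[:top_k])

    # top_p: keep the smallest set of tokens whose cumulative
    # probability reaches top_p (always at least one token).
    if top_p < 1.0:
        ranked = sorted(keep, key=lambda i: probs[i], reverse=True)
        cumulative, nucleus = 0.0, []
        for i in ranked:
            nucleus.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        keep &= set(nucleus)

    # Renormalize over the surviving tokens.
    masked = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(masked)
    return [p / total for p in masked]
```

For example, filter_probs([2.0, 1.0, 0.5, -1.0], top_k=2) zeroes out all but the two most likely tokens before renormalizing, which is why top_k=0 and top_p=1.0 both mean "no effect".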
- Returns
  - tuple: A tuple (ids, scores), where each element is a Tensor.
    With the fields:
    - ids (Tensor): The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as that of input_ids.
    - scores (Tensor): The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, the same as the parameters in the model.
- Return type
  tuple (Tensor)
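Under "beam_search", the returned scores are length-normalized, which is where length_penalty enters. The sketch below assumes the common convention score = sum_logprob / length ** length_penalty; paddlenlp's exact normalization may differ in detail.

```python
# Hedged sketch of how length_penalty re-ranks beam hypotheses,
# assuming the common score = sum_logprob / length ** length_penalty
# convention (not necessarily paddlenlp's exact formula).
def beam_score(sum_logprob, length, length_penalty=1.0):
    # sum_logprob is a sum of log-probabilities (negative), so dividing
    # by a larger length ** length_penalty makes the score less
    # negative, i.e. better, for longer hypotheses.
    return sum_logprob / (length ** length_penalty)

# With the default penalty, the longer hypothesis with the better
# per-token average wins:
short = beam_score(-4.0, length=4)   # average -1.0 per token
long_ = beam_score(-7.0, length=10)  # average -0.7 per token
assert long_ > short

# length_penalty < 1.0 weakens length normalization, so the shorter
# hypothesis is preferred instead:
assert beam_score(-4.0, 4, length_penalty=0.5) > beam_score(-7.0, 10, length_penalty=0.5)
```

This is why the parameter description says values below 1.0 push the model toward shorter sequences and values above 1.0 toward longer ones.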
Example
import paddle
from paddlenlp.transformers import (
    UnifiedTransformerLMHeadModel,
    UnifiedTransformerTokenizer
)

paddle.seed(2)
model_name_or_path = 'unified_transformer-12L-cn-luge'
model = UnifiedTransformerLMHeadModel.from_pretrained(model_name_or_path)
tokenizer = UnifiedTransformerTokenizer.from_pretrained(model_name_or_path)

history = "早上好,今天空气质量不错。"
inputs = tokenizer.dialogue_encode(history,
    task_type='chitchat',
    add_start_token_as_response=True,
    return_tensors=True)

# Generate the sequence by using "greedy_search" strategy
ids, scores = model.generate(
    input_ids=inputs['input_ids'],
    token_type_ids=inputs['token_type_ids'],
    position_ids=inputs['position_ids'],
    attention_mask=inputs['attention_mask'],
    decode_strategy="greedy_search")
print(ids.shape, scores.shape)
# [1, 3] [1, 1]
sequence_ids = ids.numpy().tolist()[0]
sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
response = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
print(response)
# 是的

# Generate 2 sequences by using "sampling" strategy (top_k=5)
ids, scores = model.generate(
    input_ids=inputs['input_ids'],
    token_type_ids=inputs['token_type_ids'],
    position_ids=inputs['position_ids'],
    attention_mask=inputs['attention_mask'],
    decode_strategy="sampling",
    top_k=5,
    num_return_sequences=2)
print(ids.shape, scores.shape)
# [2, 7] [2, 1]
response = []
for sequence_ids in ids.numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['天气好,心情也好', '你也是']

# Generate 2 sequences by using "beam_search" strategy (num_beams=5)
ids, scores = model.generate(
    input_ids=inputs['input_ids'],
    token_type_ids=inputs['token_type_ids'],
    position_ids=inputs['position_ids'],
    attention_mask=inputs['attention_mask'],
    decode_strategy="beam_search",
    num_beams=5,
    num_return_sequences=2)
print(ids.shape, scores.shape)
# [2, 3] [2, 1]
response = []
for sequence_ids in ids.numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['是的', '嗯嗯']