token_embedding

class TokenEmbedding(embedding_name='w2v.baidu_encyclopedia.target.word-word.dim300', unknown_token='[UNK]', unknown_token_vector=None, extended_vocab_path=None, trainable=True, keep_extended_vocab_only=False)

Bases: paddle.nn.layer.common.Embedding
A TokenEmbedding can load a pre-trained embedding model provided by PaddleNLP by specifying the embedding name. Furthermore, a TokenEmbedding can load an extended vocabulary by specifying extended_vocab_path.

Parameters
- embedding_name (str, optional, default to 'w2v.baidu_encyclopedia.target.word-word.dim300'): The pre-trained embedding model name. Use paddlenlp.embeddings.list_embedding_name() to show which embedding models are provided.
- unknown_token (str, optional, default to '[UNK]'): The token used for out-of-vocabulary words.
- unknown_token_vector (list, optional, default to None): The vector used to initialize the unknown token. If None, the unknown token's vector is initialized from a normal distribution.
- extended_vocab_path (str, optional, default to None): The file path of the extended vocabulary.
- trainable (bool, optional, default to True): Whether the embedding weight can be trained.
- keep_extended_vocab_only (bool, optional, default to False): Whether to keep the extended vocabulary only; effective only if extended_vocab_path is provided.
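The parameters above wire together a vocabulary, an unknown token, and an embedding matrix. A minimal pure-Python sketch of that wiring, assuming the behaviour described above (this is an illustrative toy, not the PaddleNLP implementation; all names here are hypothetical):

```python
import random

class MiniTokenEmbedding:
    """Toy embedding table with an unknown-token fallback."""

    def __init__(self, vocab, dim=4, unknown_token="[UNK]",
                 unknown_token_vector=None):
        self.unknown_token = unknown_token
        # Reserve index 0 for the unknown token, then number the vocabulary.
        self.word_to_idx = {unknown_token: 0}
        for word in vocab:
            self.word_to_idx.setdefault(word, len(self.word_to_idx))
        # One vector per index, initialized from a normal distribution.
        self.weight = [
            [random.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(len(self.word_to_idx))
        ]
        # If a vector for the unknown token is supplied, use it instead.
        if unknown_token_vector is not None:
            self.weight[0] = list(unknown_token_vector)

    def get_idx_from_word(self, word):
        # Out-of-vocabulary words fall back to the unknown token's index.
        return self.word_to_idx.get(word, self.word_to_idx[self.unknown_token])

emb = MiniTokenEmbedding(["hello", "world"], dim=3,
                         unknown_token_vector=[0.0, 0.0, 0.0])
print(emb.get_idx_from_word("hello"))   # 1 (a known word)
print(emb.get_idx_from_word("missing")) # 0 (falls back to [UNK])
print(emb.weight[0])                    # [0.0, 0.0, 0.0] (the supplied vector)
```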
set_trainable(trainable)

Set whether the embedding weight can be trained.

Parameters
- trainable (bool, required): Whether the embedding weight can be trained.
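In Paddle, freezing a parameter is expressed through its stop_gradient flag. A toy sketch of that pattern, assuming set_trainable simply toggles the flag on the weight (illustrative only; the class and attribute layout here are hypothetical):

```python
class ToyEmbedding:
    """Toy stand-in: the weight carries a stop_gradient flag, Paddle-style."""

    class _Weight:
        def __init__(self):
            self.stop_gradient = False

    def __init__(self):
        self.weight = self._Weight()

    def set_trainable(self, trainable):
        # Trainable means gradients flow into the weight, so
        # stop_gradient is simply the negation of trainable.
        self.weight.stop_gradient = not trainable

emb = ToyEmbedding()
emb.set_trainable(False)
print(emb.weight.stop_gradient)  # True: the weight is frozen
```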
search(words)

Get the vectors of the specified words.

Parameters
- words (list, str, or int, required): The words to be searched.

Returns
- word_vector (numpy.array): The vectors of the specified words.
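Since search accepts a single word, an index, or a list of either, the lookup has to dispatch on the input type. A pure-Python sketch of that dispatch, assuming ints are treated as row indices and unknown strings fall back to the unknown token (illustrative only, not PaddleNLP code):

```python
def search(word_to_idx, weight, words, unk="[UNK]"):
    """Return vectors for a str, an int index, or a list of words/indices."""
    if isinstance(words, (str, int)):
        words = [words]
    vectors = []
    for w in words:
        # An int is treated as a row index; a str is looked up in the vocab,
        # falling back to the unknown token when missing.
        idx = w if isinstance(w, int) else word_to_idx.get(w, word_to_idx[unk])
        vectors.append(weight[idx])
    return vectors

word_to_idx = {"[UNK]": 0, "hello": 1, "world": 2}
weight = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
print(search(word_to_idx, weight, "hello"))          # [[1.0, 1.0]]
print(search(word_to_idx, weight, ["world", "oov"])) # [[2.0, 2.0], [0.0, 0.0]]
print(search(word_to_idx, weight, 2))                # [[2.0, 2.0]]
```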
get_idx_list_from_words(words)

Get the index list of the specified words by searching the word_to_idx dict.
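The lookup described above is a straightforward mapping through the word_to_idx dict. A minimal sketch, assuming out-of-vocabulary words map to the unknown token's index (illustrative only; the standalone function signature is hypothetical):

```python
def get_idx_list_from_words(word_to_idx, words, unk="[UNK]"):
    """Map a word or list of words to indices via the word_to_idx dict."""
    if isinstance(words, str):
        words = [words]
    # Unknown words fall back to the unknown token's index.
    return [word_to_idx.get(w, word_to_idx[unk]) for w in words]

word_to_idx = {"[UNK]": 0, "hello": 1, "world": 2}
print(get_idx_list_from_words(word_to_idx, ["hello", "oov", "world"]))  # [1, 0, 2]
```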