token_embedding
-
class TokenEmbedding(embedding_name='w2v.baidu_encyclopedia.target.word-word.dim300', unknown_token='[UNK]', unknown_token_vector=None, extended_vocab_path=None, trainable=True, keep_extended_vocab_only=False)
Bases: paddle.nn.layer.common.Embedding
A TokenEmbedding can load a pre-trained embedding model provided by paddlenlp by specifying the embedding name. It can also load an extended vocabulary by specifying extended_vocab_path.
Parameters
    embedding_name (str, optional, defaults to w2v.baidu_encyclopedia.target.word-word.dim300): The pre-trained embedding model name. Use paddlenlp.embeddings.list_embedding_name() to list the embedding models that are already provided.
    unknown_token (str, optional, defaults to [UNK]): The token used to represent unknown words.
    unknown_token_vector (list, optional, defaults to None): The vector used to initialize the unknown token. If None, the vector is initialized from a normal distribution.
    extended_vocab_path (str, optional, defaults to None): The file path of the extended vocabulary.
    trainable (bool, optional, defaults to True): Whether the embedding weight can be trained.
    keep_extended_vocab_only (bool, optional, defaults to False): Whether to keep only the extended vocabulary. Takes effect only if extended_vocab_path is provided.
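The following is a minimal usage sketch of the constructor described above, not an official example. It uses only the parameters and methods listed on this page. The import path paddlenlp.embeddings is inferred from the list_embedding_name() reference, the helper name load_default_embedding is hypothetical, and the lookup word is assumed to exist in the chosen vocabulary. Building the embedding requires PaddleNLP to be installed and may download the pre-trained weights.

```python
def load_default_embedding():
    """Build a frozen TokenEmbedding and look up two tokens (illustrative helper)."""
    # Imported lazily so the snippet can be loaded without PaddleNLP installed.
    # Assumption: TokenEmbedding is importable from paddlenlp.embeddings.
    from paddlenlp.embeddings import TokenEmbedding

    embedding = TokenEmbedding(
        embedding_name="w2v.baidu_encyclopedia.target.word-word.dim300",
        unknown_token="[UNK]",
        trainable=False,  # keep the pre-trained weights fixed
    )
    # Assumption: "中国" is present in this embedding's vocabulary.
    vectors = embedding.search(["中国", "[UNK]"])
    return embedding, vectors
```

Passing trainable=False here is a design choice for feature extraction; leave it at the default True when the embedding should be fine-tuned with the rest of a model.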
-
set_trainable(trainable)
Set whether the embedding weight can be trained.
Parameters
    trainable (bool, required): Whether the embedding weight can be trained.
-
search(words)
Get the vectors of the specified words.
Parameters
    words (list or str or int, required): The words to look up.
Returns
    word_vector (numpy.array): The vectors of the specified words.
-
get_idx_list_from_words(words)
Get the index list of the specified words by looking them up in the word_to_idx dict.
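
To make the lookup behaviour concrete, here is a pure-Python toy stand-in. It is not PaddleNLP's implementation: the class ToyTokenEmbedding, its attributes other than word_to_idx, and the rule that out-of-vocabulary words fall back to the unknown token index are assumptions made for illustration. It returns plain lists rather than a numpy.array.

```python
class ToyTokenEmbedding:
    """Illustrative stand-in for word-to-index and vector lookup."""

    def __init__(self, vocab, vectors, unknown_token="[UNK]"):
        # word_to_idx mirrors the dict named in the docs; the rest is hypothetical.
        self.word_to_idx = {word: i for i, word in enumerate(vocab)}
        self.vectors = vectors
        self.unknown_idx = self.word_to_idx[unknown_token]

    def get_idx_list_from_words(self, words):
        # Accept a single str, a single int index, or a list mixing both.
        if isinstance(words, (str, int)):
            words = [words]
        return [
            # Ints are treated as indices already; unknown strings fall back
            # to the unknown token index (an assumption of this sketch).
            w if isinstance(w, int) else self.word_to_idx.get(w, self.unknown_idx)
            for w in words
        ]

    def search(self, words):
        # Resolve words to indices first, then gather the matching vectors.
        return [self.vectors[i] for i in self.get_idx_list_from_words(words)]


toy = ToyTokenEmbedding(
    vocab=["[UNK]", "cat", "dog"],
    vectors=[[0.0, 0.0], [1.0, 0.5], [0.2, 0.9]],
)
indices = toy.get_idx_list_from_words(["cat", "bird", 2])  # "bird" is out of vocabulary
dog_vectors = toy.search("dog")
```

In this sketch search is built on top of get_idx_list_from_words, so both methods accept the same input types.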