token_embedding

list_embedding_name(): Lists the names of all pretrained embedding models that paddlenlp provides.

class TokenEmbedding(embedding_name='w2v.baidu_encyclopedia.target.word-word.dim300', unknown_token='[UNK]', unknown_token_vector=None, extended_vocab_path=None, trainable=True, keep_extended_vocab_only=False)

Bases: paddle.nn.layer.common.Embedding

A TokenEmbedding loads a pre-trained embedding model provided by paddlenlp, selected by its embedding name. It can also load an extended vocabulary from the file given by extended_vocab_path.

Parameters

embedding_name (str, optional, defaults to w2v.baidu_encyclopedia.target.word-word.dim300): The name of the pre-trained embedding model. Use paddlenlp.embeddings.list_embedding_name() to list the embedding models that are already provided.
unknown_token (str, optional, defaults to [UNK]): The token used to represent unknown words.
unknown_token_vector (list, optional, defaults to None): The vector used to initialize the unknown token. If None, the vector is initialized from a normal distribution.
extended_vocab_path (str, optional, defaults to None): The file path of the extended vocabulary.
trainable (bool, optional, defaults to True): Whether the embedding weights can be trained.
keep_extended_vocab_only (bool, optional, defaults to False): Whether to keep only the extended vocabulary. Takes effect only if extended_vocab_path is provided.
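The parameters above can be combined roughly as in the following sketch. It assumes paddlenlp is installed and that the pretrained vectors can be fetched. The helper name `load_and_query` and the two example words are illustrative choices, not part of the library.

```python
def load_and_query(word_a="中国", word_b="美国"):
    """Load the default pretrained embedding and run a few queries.

    Hypothetical helper for illustration; only the paddlenlp names
    imported below come from the documented API.
    """
    # Imported lazily so that defining this function does not require
    # paddlenlp to be installed.
    from paddlenlp.embeddings import TokenEmbedding, list_embedding_name

    # Names that can be passed as embedding_name.
    available = list_embedding_name()

    # Assumption: the pretrained vectors may be downloaded on first use.
    embedding = TokenEmbedding(
        embedding_name="w2v.baidu_encyclopedia.target.word-word.dim300",
        unknown_token="[UNK]",
        trainable=False,  # keep the pretrained weights frozen
    )

    # Look up the vectors and compare the two words.
    vectors = embedding.search([word_a, word_b])
    similarity = embedding.cosine_sim(word_a, word_b)
    return available, vectors, similarity
```

Setting `trainable=False` here is one possible choice for using the vectors as fixed features. The documented default is `True`.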

set_trainable(trainable): Sets whether the embedding weights can be trained.

Parameters

trainable (bool, required): Whether the embedding weights can be trained.

search(words): Gets the vectors of the specified words.

Parameters

words (list or str or int, required): The words to search for.

Returns: word_vector, the vectors of the specified words.
Return type: numpy.array

get_idx_from_word(word): Gets the index of the specified word by looking it up in the word_to_idx dict.

get_idx_list_from_words(words): Gets the list of indices of the specified words by looking them up in the word_to_idx dict.
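The dictionary lookup behind these two methods can be pictured with the toy stand-in below. It is not PaddleNLP's implementation: the three-word vocabulary is hypothetical, and falling back to the unknown-token index for out-of-vocabulary words is an assumption.

```python
# Toy stand-in for the documented word_to_idx lookup; not PaddleNLP's code.
UNKNOWN_TOKEN = "[UNK]"

# Hypothetical vocabulary: token -> row index in the embedding matrix.
word_to_idx = {"[UNK]": 0, "cat": 1, "dog": 2}


def get_idx_from_word(word):
    """Return the index of `word`, or the unknown-token index if absent."""
    # Assumption: out-of-vocabulary words map to the unknown token.
    return word_to_idx.get(word, word_to_idx[UNKNOWN_TOKEN])


def get_idx_list_from_words(words):
    """Return the indices for a single word or a list of words."""
    if isinstance(words, str):
        words = [words]
    return [get_idx_from_word(w) for w in words]
```

With this vocabulary, `get_idx_list_from_words(["cat", "bird"])` yields the index of `cat` followed by the unknown-token index.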

dot(word_a, word_b): Calculates the scalar product of the vectors of two words.

Parameters

word_a (str, required): The first word string.
word_b (str, required): The second word string.

Returns: The scalar product of the two words.
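As a rough illustration of what the scalar product of two words means, the sketch below computes it over a hypothetical table of three-dimensional vectors. The table and its values are made up for the example, and the function is a plain-Python stand-in rather than the library method.

```python
# Hypothetical word vectors; real embeddings have far more dimensions.
vectors = {
    "king": [1.0, 2.0, 0.0],
    "queen": [1.0, 1.0, 1.0],
}


def dot(word_a, word_b):
    """Sum of element-wise products of the two words' vectors."""
    return sum(a * b for a, b in zip(vectors[word_a], vectors[word_b]))
```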

cosine_sim(word_a, word_b): Calculates the cosine similarity of the vectors of two words.

Parameters

word_a (str, required): The first word string.
word_b (str, required): The second word string.

Returns: The cosine similarity of the two words.
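Cosine similarity is the scalar product of two vectors divided by the product of their lengths. The sketch below shows that formula over a hypothetical table of two-dimensional vectors. The names and values are illustrative only, and this is a stand-in, not the library's code.

```python
import math

# Hypothetical word vectors chosen so the arithmetic is easy to follow.
vectors = {
    "a": [3.0, 4.0],
    "b": [4.0, 3.0],
    "c": [-4.0, 3.0],
}


def cosine_sim(word_a, word_b):
    """Scalar product of the two vectors divided by the product of their norms."""
    vec_a, vec_b = vectors[word_a], vectors[word_b]
    scalar_product = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    return scalar_product / (norm_a * norm_b)
```

Unlike the raw scalar product, the result does not depend on vector length: it lies between -1 and 1, and perpendicular vectors such as `a` and `c` give 0.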