token_embedding

list_embedding_name()[source]

List the names of all pretrained embedding models that paddlenlp provides.

class TokenEmbedding(embedding_name='w2v.baidu_encyclopedia.target.word-word.dim300', unknown_token='[UNK]', unknown_token_vector=None, extended_vocab_path=None, trainable=True, keep_extended_vocab_only=False)[source]

Bases: paddle.nn.layer.common.Embedding

A TokenEmbedding loads a pre-trained embedding model provided by paddlenlp, selected by its embedding name. It can also load an extended vocabulary when extended_vocab_path is specified.

Parameters
  • embedding_name (str, optional, defaults to w2v.baidu_encyclopedia.target.word-word.dim300) – The name of the pre-trained embedding model. Use paddlenlp.embeddings.list_embedding_name() to list the embedding models that are already provided.

  • unknown_token (str, optional, defaults to [UNK]) – The token used to represent unknown words.

  • unknown_token_vector (list, optional, defaults to None) – The vector used to initialize the unknown token. If it is None, the vector is initialized from a normal distribution.

  • extended_vocab_path (str, optional, defaults to None) – The file path of the extended vocabulary.

  • trainable (bool, optional, defaults to True) – Whether the embedding weight can be trained.

  • keep_extended_vocab_only (bool, optional, defaults to False) – Whether to keep only the extended vocabulary. Takes effect only if extended_vocab_path is provided.
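A minimal construction sketch, assuming paddlenlp is installed and the pre-trained weights can be fetched on first use. The helper name build_embedding is hypothetical and exists only to keep the import lazy. The argument values are the defaults from the signature above, apart from trainable.

```python
def build_embedding(trainable=False):
    """Hypothetical helper: construct a TokenEmbedding for the default model."""
    # Imported lazily so this module still loads where paddlenlp is absent.
    from paddlenlp.embeddings import TokenEmbedding

    return TokenEmbedding(
        embedding_name="w2v.baidu_encyclopedia.target.word-word.dim300",
        unknown_token="[UNK]",
        trainable=trainable,  # False asks for the weight not to be trained
    )
```

Calling build_embedding() returns a layer whose search, dot and cosine_sim methods are described below.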

set_trainable(trainable)[source]

Set whether the embedding weight can be trained.

Parameters
  • trainable (bool, required) – Whether the embedding weight can be trained.

search(words)[source]

Get the vectors of the specified words.

Parameters
  • words (list or str or int, required) – The words to look up.

Returns

The vectors of the specified words.

Return type

numpy.array

get_idx_from_word(word)[source]

Get the index of the specified word by looking it up in the word_to_idx dict.

get_idx_list_from_words(words)[source]

Get the list of indices of the specified words by looking them up in the word_to_idx dict.
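The word-to-index lookup behind search and the two index methods can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the vocabulary and weight rows are toy data, the fallback of unseen words to the unknown token's index is assumed, and int inputs are not handled.

```python
# Toy stand-ins for the layer's vocabulary and weight matrix (hypothetical data).
UNK = "[UNK]"
word_to_idx = {UNK: 0, "king": 1, "queen": 2}
weights = [
    [0.0, 0.0, 0.0],  # row for [UNK]
    [0.9, 0.1, 0.3],  # row for "king"
    [0.8, 0.2, 0.4],  # row for "queen"
]


def get_idx_from_word(word):
    # Assumption: out-of-vocabulary words map to the unknown token's index.
    return word_to_idx.get(word, word_to_idx[UNK])


def get_idx_list_from_words(words):
    # Accept either a single word or a list of words.
    if isinstance(words, str):
        words = [words]
    return [get_idx_from_word(w) for w in words]


def search(words):
    # Return one row of the weight matrix per requested word.
    return [weights[i] for i in get_idx_list_from_words(words)]


print(search(["king", "dragon"]))  # [[0.9, 0.1, 0.3], [0.0, 0.0, 0.0]]
```

Here "dragon" is not in the toy vocabulary, so it resolves to the row of the unknown token.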

dot(word_a, word_b)[source]

Calculate the scalar product of the vectors of two words.

Parameters
  • word_a (str, required) – The first word string.

  • word_b (str, required) – The second word string.

Returns

The scalar product of the two words' vectors.
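For reference, the scalar product of two vectors is the sum of their element-wise products. A minimal sketch in plain Python, using hypothetical 3-dimensional vectors in place of real word embeddings:

```python
def dot(vec_a, vec_b):
    # Sum of element-wise products of two equal-length vectors.
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(vec_a, vec_b))


# Hypothetical vectors standing in for two looked-up words.
king = [1.0, 2.0, 3.0]
queen = [4.0, 5.0, 6.0]
print(dot(king, queen))  # 1*4 + 2*5 + 3*6 = 32.0
```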

cosine_sim(word_a, word_b)[source]

Calculate the cosine similarity of the vectors of two words.

Parameters
  • word_a (str, required) – The first word string.

  • word_b (str, required) – The second word string.

Returns

The cosine similarity of the two words' vectors.
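Cosine similarity is the scalar product of two vectors divided by the product of their Euclidean norms, so it depends on direction rather than magnitude. A minimal sketch in plain Python with hypothetical 2-dimensional vectors. It does not guard against zero-length vectors, which would cause a division by zero.

```python
import math


def cosine_sim(vec_a, vec_b):
    # Scalar product divided by the product of the two Euclidean norms.
    product = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    return product / (norm_a * norm_b)


print(cosine_sim([3.0, 4.0], [4.0, 3.0]))  # 24 / (5 * 5) = 0.96
```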