yahoo_answer_100k

class YahooAnswer100K(lazy=None, name=None, **config)

Bases: paddlenlp.datasets.dataset.DatasetBuilder

The data comes from https://arxiv.org/pdf/1702.08139.pdf, which samples 100k documents from the original Yahoo Answer data; the vocabulary size is 200k.

class META_INFO(file, md5)

Bases: tuple

property file

Alias for field number 0

property md5

Alias for field number 1
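
The signature, the tuple base, and the positional aliases above match what `collections.namedtuple` generates. A minimal sketch under that assumption follows. The file name and payload are hypothetical placeholders, not the dataset's real values, and `md5_matches` is an illustrative helper rather than part of the library; it assumes the `md5` field holds a checksum of the file's contents.

```python
import hashlib
from collections import namedtuple

# Assumed reconstruction: a two-field named tuple, as the documented
# "Alias for field number N" properties suggest.
META_INFO = namedtuple("META_INFO", ("file", "md5"))

# Hypothetical payload and file name, used only for illustration.
payload = b"example document text\n"
info = META_INFO("example.train.txt", hashlib.md5(payload).hexdigest())

# Each field is reachable by name or by position.
assert info.file == info[0]
assert info.md5 == info[1]


def md5_matches(data: bytes, meta: META_INFO) -> bool:
    """Return True when the MD5 digest of `data` equals the recorded checksum."""
    return hashlib.md5(data).hexdigest() == meta.md5
```

With these placeholder values, `md5_matches(payload, info)` holds for the unchanged payload and fails for altered bytes such as `b"other"`.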

get_vocab()

Returns the path to the dataset's vocab file, if one is specified.