sampler
-
class SamplerHelper(dataset, iterable=None)
Bases: object
SamplerHelper helps construct an iterable sampler for use with DataLoader. It wraps a dataset and uses its __getitem__ method.
Every SamplerHelper subclass has to provide an __iter__() method, which provides a way to iterate over the indices of dataset elements, and a __len__() method that returns the length of the returned iterators. It can also be used as a batch iterator instead of an index iterator: when iterator is initialized with an iterable dataset, it yields samples rather than indices.
Note: The __len__() method isn't strictly required by DataLoader, but it is expected in any calculation involving the length of a DataLoader.
- Parameters
dataset (Dataset) – Input dataset for SamplerHelper.
iterable (collections.Iterable|callable, optional) – Iterator of dataset. Default: None.
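The contract described above (an `__iter__()` that yields dataset indices plus a `__len__()`) can be illustrated with a small stand-in class. This is a minimal sketch, not the library's implementation: `ToyIndexSampler` and its attribute names are hypothetical, and the fallback to `range(len(dataset))` when `iterable` is None is an assumption.

```python
class ToyIndexSampler:
    """Hypothetical stand-in that yields the indices of a wrapped dataset."""

    def __init__(self, dataset, iterable=None):
        self.data_source = dataset
        # Assumption: with no iterable given, iterate over 0..len(dataset)-1.
        self._iterable = iterable if iterable is not None else range(len(dataset))

    def __iter__(self):
        # A callable is invoked to obtain a fresh iterable on every pass.
        if callable(self._iterable):
            return iter(self._iterable())
        return iter(self._iterable)

    def __len__(self):
        return len(self.data_source)


dataset = ["a", "bb", "ccc"]
sampler = ToyIndexSampler(dataset)
indices = list(sampler)                  # indices into the dataset
samples = [dataset[i] for i in indices]  # looked up through __getitem__
```

The sampler itself only produces indices; the samples are fetched from the dataset through `__getitem__`, which is the division of labor the description above relies on.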
-
property length
Returns the length of the SamplerHelper.
-
apply(fn)
Applies a transformation to the sampler. Transformations include shuffle, sort, fit and shard.
- Parameters
fn (callable) – Transformation to be performed. It returns the transformed iterable (and data_source).
- Returns
A new transformed object.
- Return type
SamplerHelper
-
shuffle(buffer_size=-1, seed=None)
Shuffles the dataset according to the given buffer size and random seed.
- Parameters
buffer_size (int, optional) – Buffer size for the shuffle. If buffer_size < 0 or greater than the length of the dataset, buffer_size is set to the length of the dataset. Default: -1.
seed (int, optional) – Seed for the random number generator. Default: None.
- Returns
SamplerHelper
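The effect of buffer_size can be sketched in plain Python. This is a hedged illustration rather than the library's code: `buffered_shuffle` is a hypothetical helper, and it assumes that shuffling happens independently inside consecutive chunks of buffer_size indices.

```python
import random


def buffered_shuffle(indices, buffer_size=-1, seed=None):
    """Shuffle indices within consecutive buffers (illustrative sketch)."""
    indices = list(indices)
    # As documented: a negative or oversized buffer covers the whole dataset.
    if buffer_size < 0 or buffer_size > len(indices):
        buffer_size = len(indices)
    # Assumption: an empty input or a zero-sized buffer means nothing to shuffle.
    if not indices or buffer_size == 0:
        return indices
    rng = random.Random(seed)  # local generator, so the global state is untouched
    shuffled = []
    for start in range(0, len(indices), buffer_size):
        chunk = indices[start:start + buffer_size]
        rng.shuffle(chunk)
        shuffled.extend(chunk)
    return shuffled


result = buffered_shuffle(range(10), buffer_size=4, seed=0)
```

Under this assumption a small buffer keeps each index close to its original position, while the default of -1 shuffles the whole dataset at once.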
-
sort(cmp=None, key=None, reverse=False, buffer_size=-1)
Sorts samples according to the given callable cmp or key.
- Parameters
cmp (callable, optional) – The comparison function. Default: None.
key (callable, optional) – Returns the element to be compared. Default: None.
reverse (bool, optional) – If True, sort in descending order; if False, sort in ascending order. Default: False.
buffer_size (int, optional) – Buffer size for the sort. If buffer_size < 0 or greater than the length of the data, buffer_size is set to the length of the data. Default: -1.
- Returns
SamplerHelper
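A buffered sort can be sketched along the same lines. The helper `buffered_sort` below is hypothetical, and its `cmp` and `key` callables take plain elements; the callables the library expects may have a different signature, so treat this only as an illustration of how buffer_size limits the sorting window.

```python
from functools import cmp_to_key


def buffered_sort(items, cmp=None, key=None, reverse=False, buffer_size=-1):
    """Sort items within consecutive buffers (illustrative sketch)."""
    items = list(items)
    # As documented: a negative or oversized buffer covers all of the data.
    if buffer_size < 0 or buffer_size > len(items):
        buffer_size = len(items)
    if not items or buffer_size == 0:
        return items
    # Assumption: cmp takes precedence over key when both are given.
    sort_key = cmp_to_key(cmp) if cmp is not None else key
    result = []
    for start in range(0, len(items), buffer_size):
        chunk = items[start:start + buffer_size]
        result.extend(sorted(chunk, key=sort_key, reverse=reverse))
    return result


lengths = [5, 2, 9, 1, 7, 3]  # e.g. sequence lengths of six samples
by_length = lambda i: lengths[i]
windowed = buffered_sort(range(6), key=by_length, buffer_size=3)  # [1, 0, 2, 3, 5, 4]
full = buffered_sort(range(6), key=by_length)                     # [3, 1, 5, 0, 4, 2]
```

Sorting by length inside a window groups samples of similar size without ordering the whole dataset, which is a common reason to combine sort with batching.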
-
batch(batch_size, drop_last=False, batch_size_fn=None, key=None)
Produces a BatchSampler.
- Parameters
batch_size (int) – Batch size.
drop_last (bool, optional) – Whether to drop the last mini-batch. Default: False.
batch_size_fn (callable, optional) – It accepts four arguments: the index of the data source, the length of the mini-batch, the size of the mini-batch so far, and the data source. It returns the size of the mini-batch so far. In fact, the returned value can be anything and is used as the argument size_so_far in key. If None, it returns the length of the mini-batch. Default: None.
key (callable, optional) – It accepts the size of the mini-batch so far and the length of the mini-batch, and returns the value to be compared with batch_size. If None, only the size of the mini-batch so far is compared with batch_size. Default: None.
- Returns
SamplerHelper
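How batch_size_fn and key interact is easier to see in code. The generator below is a sketch under stated assumptions, not the library's implementation: `batch_indices` is a hypothetical name, and the handling of a batch that overshoots batch_size (emit everything but the last index, then start a new batch with it) is an assumption.

```python
def batch_indices(indices, data_source, batch_size, drop_last=False,
                  batch_size_fn=None, key=None):
    """Group indices into mini-batches (illustrative sketch)."""
    if batch_size_fn is None:
        # Documented default: the size so far is the length of the mini-batch.
        batch_size_fn = lambda idx, count, size_so_far, data_source: count
    if key is None:
        # Documented default: compare only the size so far with batch_size.
        key = lambda size_so_far, count: size_so_far

    minibatch, size_so_far = [], 0
    for idx in indices:
        minibatch.append(idx)
        size_so_far = batch_size_fn(idx, len(minibatch), size_so_far, data_source)
        measure = key(size_so_far, len(minibatch))
        if measure == batch_size:
            yield minibatch
            minibatch, size_so_far = [], 0
        elif measure > batch_size:
            if len(minibatch) == 1:
                raise ValueError("a single sample exceeds batch_size")
            # Assumption: the overflowing index opens the next mini-batch.
            yield minibatch[:-1]
            minibatch = [idx]
            size_so_far = batch_size_fn(idx, 1, 0, data_source)
    if minibatch and not drop_last:
        yield minibatch


# Fixed-size batches of two indices.
fixed = list(batch_indices(range(5), None, batch_size=2))

# Budget of six characters per batch, measured by a custom batch_size_fn.
texts = ["aaaa", "bbb", "cc", "d", "eeeee"]
count_chars = lambda idx, count, size_so_far, data: size_so_far + len(data[idx])
budgeted = list(batch_indices(range(5), texts, batch_size=6,
                              batch_size_fn=count_chars))
```

With the defaults this reduces to fixed-size batching; a custom batch_size_fn turns batch_size into a budget such as a total token or character count.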
-
shard(num_replicas=None, rank=None)
Slices the data so that each process in multi-GPU training receives its own portion.
- Parameters
num_replicas (int, optional) – The number of training processes, which is also the number of GPU cards used in training. Default: None.
rank (int, optional) – The index of the current training process. Equal to the value of the environment variable PADDLE_TRAINER_ID. Default: None.
- Returns
SamplerHelper
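One plausible way to picture sharding is a strided slice over the index stream, where each process keeps every num_replicas-th index starting at its rank. The sketch below assumes that scheme; `shard_indices` is a hypothetical helper, the fallback of num_replicas to 1 is an assumption, and the library may additionally pad shards to equal length, which this sketch does not do.

```python
import os


def shard_indices(indices, num_replicas=None, rank=None):
    """Keep the indices that belong to one training process (illustrative sketch)."""
    if num_replicas is None:
        num_replicas = 1  # assumption: a single process when nothing is given
    if rank is None:
        # The parameter description ties rank to PADDLE_TRAINER_ID.
        rank = int(os.environ.get("PADDLE_TRAINER_ID", "0"))
    if not 0 <= rank < num_replicas:
        raise ValueError("rank must be in [0, num_replicas)")
    return [idx for pos, idx in enumerate(indices) if pos % num_replicas == rank]


shards = [shard_indices(range(10), num_replicas=4, rank=r) for r in range(4)]
# shards[1] == [1, 5, 9]
```

Because each of the methods above returns a SamplerHelper, the calls can presumably be chained, for example `sampler.shuffle().batch(batch_size=32).shard()`.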