graphdot.model.active_learning.hierarchical_drafter module

class graphdot.model.active_learning.hierarchical_drafter.HierarchicalDrafter(selector, k=2, a=2, leaf_ratio='auto')

Bases: object

Hierarchically select representative samples from a large dataset for which applying a selection algorithm directly would be prohibitively expensive.

Parameters:
  • selector (callable) – A selection algorithm that can pick a given number of samples from a dataset to maximize a certain acquisition function.
  • k (int > 1) – The branching factor of the search hierarchy.
  • a (float in (1, k]) – The multiplier applied to the number of samples that each level needs to generate during hierarchical screening. For example, if n samples are wanted in the end, then the immediate next level should forward at least a * n samples for the last-level drafter to choose from.
  • leaf_ratio (float in (0, 1), or 'auto') – If the ratio between the number of output samples and the number of input samples exceeds this value, stop further division and carry out the selection directly using the given selector.
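
The interplay of k, a, and leaf_ratio can be illustrated with a little arithmetic. The sketch below is not taken from graphdot: the helper names `level_quota` and `is_leaf` are hypothetical, and it assumes that the a * n candidates are split evenly across the k branches.

```python
import math


def level_quota(n, k=2, a=2):
    """Number of samples each of the k branches would forward so that the
    parent level receives at least a * n candidates (assumes an even split)."""
    if not (isinstance(k, int) and k > 1):
        raise ValueError("k must be an integer greater than 1")
    if not (1 < a <= k):
        raise ValueError("a must lie in (1, k]")
    return math.ceil(a * n / k)


def is_leaf(n_out, n_in, leaf_ratio):
    """True once the output/input ratio exceeds leaf_ratio, i.e. when the
    selector would be applied directly instead of dividing further."""
    return n_out / n_in > leaf_ratio


# With n = 10, k = 4 and a = 2, each branch forwards 5 samples,
# so the parent chooses its 10 samples from 20 candidates.
quota = level_quota(n=10, k=4, a=2)
```

Because a is at most k, the per-branch quota never exceeds n, so the subproblems do not grow as the hierarchy deepens.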
__call__(X, n, random_state=None, verbose=False)

Find an n-sample subset of X that attempts to maximize a certain diversity criterion.

Parameters:
  • X (feature matrix or list of objects) – Input dataset.
  • n (int) – The size of the subset to be chosen.
  • random_state (int or np.random.Generator) – The seed to the random number generator (RNG), or the RNG itself. If None, the default RNG in numpy will be used.
Returns:

chosen – A sorted list of indices of the samples that are chosen.

Return type:

list
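
To make the idea of hierarchical drafting concrete, here is a minimal, self-contained sketch of one possible recursion. It is an illustration under stated assumptions, not graphdot's implementation: the functions `draft` and `farthest_point_selector` are hypothetical, the selector is assumed to be called as `selector(points, n)` and to return positions within `points`, the data are plain 1-D numbers, and leaf_ratio is fixed to a number because the behavior of 'auto' is not described here.

```python
import math
import random


def farthest_point_selector(points, n):
    """Hypothetical selector: greedily pick n mutually distant values from a
    1-D list. Returns positions within `points`."""
    chosen = [0]
    while len(chosen) < n:
        best = max(
            (i for i in range(len(points)) if i not in chosen),
            key=lambda i: min(abs(points[i] - points[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen


def draft(X, idx, n, selector, k=2, a=2, leaf_ratio=0.5, rng=None):
    """Pick n indices out of `idx` by recursive screening (illustrative)."""
    if rng is None:
        rng = random.Random()
    if len(idx) <= n:
        # Nothing to choose: every remaining sample is kept.
        return sorted(idx)
    if n / len(idx) > leaf_ratio:
        # Leaf: the problem is small enough for the selector itself.
        picked = selector([X[i] for i in idx], n)
        return sorted(idx[p] for p in picked)
    # Otherwise split randomly into k branches ...
    shuffled = idx[:]
    rng.shuffle(shuffled)
    parts = [shuffled[b::k] for b in range(k)]
    # ... and let every branch forward its share of the a * n candidates.
    quota = math.ceil(a * n / k)
    candidates = []
    for part in parts:
        candidates.extend(
            draft(X, part, quota, selector, k, a, leaf_ratio, rng)
        )
    # The current level makes the final choice among the candidates.
    picked = selector([X[i] for i in candidates], n)
    return sorted(candidates[p] for p in picked)


data_rng = random.Random(0)
X = [data_rng.uniform(0, 100) for _ in range(200)]
chosen = draft(X, list(range(len(X))), n=5,
               selector=farthest_point_selector, rng=random.Random(0))
```

In this sketch the selector only ever sees small subsets, so its cost is bounded by the leaf size rather than by the size of the full dataset. As in the documented return value, `chosen` is a sorted list of indices into X.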