graphdot.model.active_learning package

class graphdot.model.active_learning.HierarchicalDrafter(selector, k=2, a=2, leaf_ratio='auto')[source]

Bases: object

Hierarchically select representative samples from a large dataset for which a direct selection algorithm would be prohibitively expensive.

Parameters:
  • selector (callable) – A selection algorithm that can pick a given number of samples from a dataset to maximize a certain acquisition function.
  • k (int > 1) – The branching factor of the search hierarchy.
  • a (float in (1, k]) – The multiplier to the number of samples that each level needs to forward during hierarchical screening. For example, if n samples are wanted in the end, then the level immediately preceding the final selection should forward at least a * n samples for the last-level drafter to choose from.
  • leaf_ratio (float in (0, 1) or 'auto') – If the ratio between the number of output and input samples is greater than this value, stop further division and carry out the selection directly using the given selector.
__call__(X, n, random_state=None, verbose=False)[source]

Find an n-sample subset of X that attempts to maximize a certain diversity criterion.

Parameters:
  • X (feature matrix or list of objects) – Input dataset.
  • n (int) – The size of the subset to be chosen.
  • random_state (int or np.random.Generator) – The seed for the random number generator (RNG), or the RNG itself. If None, the default RNG in numpy will be used.
Returns:

chosen – A sorted list of indices of the samples that are chosen.

Return type:

list
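
A minimal usage sketch (an illustration, not from the package docs), assuming a NumPy feature matrix and a hand-rolled Gaussian kernel; the kernel callable, its bandwidth, and the dataset below are all placeholders:

    import numpy as np
    from scipy.spatial.distance import cdist
    from graphdot.model.active_learning import (
        DeterminantMaximizer, HierarchicalDrafter
    )

    def rbf_kernel(X, Y=None):
        # Gaussian kernel k(x, y) = exp(-|x - y|^2 / 2) on all pairs;
        # supports both kernel(X) and kernel(X, Y) calling conventions.
        Y = X if Y is None else Y
        return np.exp(-0.5 * cdist(X, Y, 'sqeuclidean'))

    X = np.random.randn(10000, 4)                # a large feature matrix
    drafter = HierarchicalDrafter(
        DeterminantMaximizer(rbf_kernel),        # per-level selection rule
        k=2,                                     # binary split hierarchy
        a=2                                      # each level forwards 2*n samples
    )
    chosen = drafter(X, n=100, random_state=0)   # sorted indices into X
    subset = X[chosen]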

class graphdot.model.active_learning.DeterminantMaximizer(kernel, kernel_options=None)[source]

Bases: object

Select a subset of a dataset such that the determinant of the kernel matrix of the selected samples is as large as possible. In other words, the objective here is to ensure that the samples are as linearly independent as possible in a reproducing kernel Hilbert space (RKHS).

Parameters:
  • kernel (callable or 'precomputed') – A symmetric positive semidefinite function implemented via the __call__ semantics. Alternatively, if the value is 'precomputed', a square kernel matrix will be expected as an argument to __call__.
  • kernel_options (dict) – Additional arguments to be passed into the kernel.
__call__(X, n)[source]

Find an n-sample subset of X that attempts to maximize the diversity and return the indices of the samples.

Parameters:
  • X (feature matrix or list of objects) – Input dataset.
  • n (int) – Number of samples to be chosen.
Returns:

chosen – Indices of the samples that are chosen.

Return type:

list
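
A usage sketch of the 'precomputed' path (illustrative; the RBF kernel and dataset are placeholders): the full square kernel matrix is built once and passed to __call__ in place of the dataset:

    import numpy as np
    from scipy.spatial.distance import cdist
    from graphdot.model.active_learning import DeterminantMaximizer

    X = np.random.randn(500, 4)
    # Precompute the full square kernel matrix up front.
    K = np.exp(-0.5 * cdist(X, X, 'sqeuclidean'))

    maximizer = DeterminantMaximizer('precomputed')
    chosen = maximizer(K, n=10)   # indices of 10 samples whose kernel
                                  # submatrix determinant is (approximately)
                                  # maximized
    subset = X[chosen]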

class graphdot.model.active_learning.VarianceMinimizer(kernel, alpha=1e-06, kernel_options=None)[source]

Bases: object

Select a subset of a dataset such that the Gaussian process posterior variance, i.e. the Nyström residual norm, of the kernel matrix of the UNSELECTED samples is as small as possible. In other words, the objective here is to ensure that the chosen samples can effectively span the vector space occupied by the entire dataset in a reproducing kernel Hilbert space (RKHS).

Parameters:
  • kernel (callable or 'precomputed') – A symmetric positive semidefinite function implemented via the __call__ semantics. Alternatively, if the value is 'precomputed', a square kernel matrix will be expected as an argument to __call__.
  • alpha (float, default=1e-6) – A small value added to the diagonal elements of the kernel matrix in order to regularize the variance calculations.
  • kernel_options (dict) – Additional arguments to be passed into the kernel.
__call__(X, n)[source]

Find an n-sample subset of X that attempts to maximize the diversity and return the indices of the samples.

Parameters:
  • X (feature matrix or list of objects) – Input dataset.
  • n (int) – Number of samples to be chosen.
Returns:

chosen – Indices of the samples that are chosen.

Return type:

list
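
For reference, the quantity being minimized is the standard Gaussian process posterior variance: with chosen samples c and unselected samples u, it equals diag(K_uu - K_uc (K_cc + alpha * I)^-1 K_cu). A usage sketch mirroring the 'precomputed' example above (kernel and data are placeholders):

    import numpy as np
    from scipy.spatial.distance import cdist
    from graphdot.model.active_learning import VarianceMinimizer

    X = np.random.randn(500, 4)
    K = np.exp(-0.5 * cdist(X, X, 'sqeuclidean'))

    minimizer = VarianceMinimizer('precomputed', alpha=1e-6)
    chosen = minimizer(K, n=10)   # samples whose span keeps the posterior
                                  # variance of the remaining points small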