graphdot.model.active_learning.variance_minimizer module

class graphdot.model.active_learning.variance_minimizer.VarianceMinimizer(kernel, alpha=1e-06, kernel_options=None)

Bases: object

Select a subset of a dataset such that the Gaussian process posterior variance, i.e. the Nyström residual norm, of the kernel matrix of the UNSELECTED samples is as small as possible. In other words, the objective is to ensure that the chosen samples effectively span the vector space occupied by the entire dataset in a reproducing kernel Hilbert space (RKHS).
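The objective described above can be written out explicitly. The following is a sketch under stated assumptions: S and U denote the selected and unselected index sets (notation introduced here, not taken from the library), and alpha is assumed to be added to the diagonal of the full kernel matrix, as the alpha parameter description suggests.

```latex
\tilde{K} = K + \alpha I, \qquad
\Sigma_{U \mid S} = \tilde{K}_{UU} - \tilde{K}_{US}\,\tilde{K}_{SS}^{-1}\,\tilde{K}_{SU}
```

Here \Sigma_{U \mid S} is the Gaussian process posterior covariance of the unselected samples given the selected ones, which is also the Nyström residual (Schur complement) of the regularized kernel matrix. The documentation does not say which norm of this residual is minimized; the trace, i.e. the summed posterior variance of the unselected samples, is one common choice.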

Parameters:
  • kernel (callable or 'precomputed') – A symmetric positive semidefinite function that is invoked by calling the object, i.e. through its __call__ method. Alternatively, if the value is 'precomputed', a square kernel matrix is expected as the X argument of VarianceMinimizer.__call__.
  • alpha (float, default=1e-6) – A small value added to the diagonal elements of the kernel matrix in order to regularize the variance calculations.
  • kernel_options (dict) – Additional arguments to be passed into the kernel.
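The way the three constructor parameters interact can be illustrated with a small NumPy sketch. This is not graphdot code: regularized_kernel_matrix and rbf are hypothetical helpers, and the calling convention kernel(X, **kernel_options) is an assumption based on the parameter descriptions above.

```python
import numpy as np


def rbf(X, length_scale=1.0):
    """Hypothetical example kernel: squared-exponential on row vectors."""
    X = np.asarray(X, dtype=float)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)


def regularized_kernel_matrix(kernel, X, alpha=1e-6, kernel_options=None):
    """Hypothetical helper (not part of graphdot): build K + alpha * I."""
    if isinstance(kernel, str):
        if kernel != 'precomputed':
            raise ValueError("kernel must be a callable or 'precomputed'")
        # In 'precomputed' mode, X is already the square kernel matrix.
        K = np.array(X, dtype=float)
    else:
        # Assumed convention: extra options are forwarded as keyword arguments.
        K = np.array(kernel(X, **(kernel_options or {})), dtype=float)
    if K.ndim != 2 or K.shape[0] != K.shape[1]:
        raise ValueError("the kernel matrix must be square")
    # Regularize by adding alpha to the diagonal (works on a copy of the input).
    K[np.diag_indices_from(K)] += alpha
    return K


X = np.array([[0.0], [1.0]])
K_callable = regularized_kernel_matrix(rbf, X, kernel_options={'length_scale': 2.0})
K_precomputed = regularized_kernel_matrix('precomputed', rbf(X, length_scale=2.0))
```

Under these assumptions both modes yield the same regularized matrix; the 'precomputed' mode simply skips the kernel evaluation.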
__call__(X, n)

Find an n-sample subset of X that attempts to maximize diversity, and return the indices of the selected samples.

Parameters:
  • X (feature matrix or list of objects) – Input dataset. If kernel is 'precomputed', X must instead be a square kernel matrix.
  • n (int) – Number of samples to be chosen.
Returns:
  • chosen (list) – Indices of the samples that are chosen.
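
One way to realize this kind of selection is a greedy procedure on a precomputed kernel matrix. The sketch below is not graphdot's implementation, which this page does not describe. The function name greedy_variance_selection is hypothetical, and it assumes the quantity being reduced is the trace of the posterior covariance of the remaining samples.

```python
import numpy as np


def greedy_variance_selection(K, n, alpha=1e-6):
    """Illustrative greedy selection; NOT graphdot's implementation.

    K is a square PSD kernel matrix; returns a list of n distinct indices.
    """
    K = np.asarray(K, dtype=float)
    if K.ndim != 2 or K.shape[0] != K.shape[1]:
        raise ValueError("K must be a square kernel matrix")
    if not 0 <= n <= K.shape[0]:
        raise ValueError("n must be between 0 and the number of samples")

    # Residual (posterior covariance) matrix, starting from K + alpha * I.
    R = K + alpha * np.eye(K.shape[0])
    remaining = list(range(K.shape[0]))
    chosen = []
    for _ in range(n):
        sub = R[np.ix_(remaining, remaining)]
        # Drop in the residual trace if each remaining candidate were selected.
        gain = np.sum(sub ** 2, axis=0) / np.diag(sub)
        i = remaining.pop(int(np.argmax(gain)))
        chosen.append(i)
        # Rank-one Schur complement update: condition the residual on sample i.
        R = R - np.outer(R[:, i], R[i, :]) / R[i, i]
    return chosen


# Three well-separated clusters on a line; two of them hold near-duplicates.
x = np.array([0.0, 0.05, 5.0, 5.02, 10.0])
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
print(greedy_variance_selection(K, 3))
```

With this toy data the three selected indices come from three different clusters, since a near-duplicate of an already chosen sample has almost no residual variance left to explain.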