graphdot.model.gaussian_field.gfr module

class graphdot.model.gaussian_field.gfr.GaussianFieldRegressor(weight, optimizer=None, smoothing=0.001)

Bases: object

Semi-supervised learning and prediction of missing continuous-valued labels on a graph. Reference: Zhu, Ghahramani, Lafferty. Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. ICML 2003.

Parameters:
  • weight (callable or 'precomputed') – A weight function that converts distance matrices into weight matrices. Its value should generally decay with distance. If weight is ‘precomputed’, the result returned by metric will be used directly as the weight matrix.
  • optimizer (one of (str, True, None, callable)) – A string or callable representing one of the optimizers accepted by scipy.optimize.minimize. If None, no hyperparameter optimization is carried out during fitting. If True, the optimizer defaults to L-BFGS-B.
  • smoothing (float in [0, 1)) – Controls the strength of regularization via the smoothing of the transition matrix.
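The sketch below illustrates the kind of callable the weight parameter describes: a hypothetical Gaussian (RBF) weight that turns pairwise Euclidean distances into weights that decay with distance, followed by a random-walk transition matrix blended with the uniform matrix, in the spirit of the smoothing described by Zhu et al. The class RBFWeight, its length_scale hyperparameter, the call convention weight(X) returning a square matrix, and the exact smoothing formula are all assumptions for illustration, not graphdot's API.

```python
import numpy as np


class RBFWeight:
    """Hypothetical weight callable: w_ij = exp(-d_ij^2 / (2 * length_scale^2))."""

    def __init__(self, length_scale=1.0):
        self.length_scale = length_scale

    def __call__(self, X):
        X = np.asarray(X, dtype=float)
        # Pairwise squared Euclidean distances between the rows of X.
        sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        W = np.exp(-sq_dist / (2 * self.length_scale ** 2))
        np.fill_diagonal(W, 0.0)  # assumption: no self-loops
        return W


def transition_matrix(W, smoothing=1e-3):
    """Row-normalize W into a transition matrix, then blend it with the
    uniform matrix (assumed form: (1 - s) * P + s / n)."""
    P = W / W.sum(axis=1, keepdims=True)
    n = len(W)
    return (1 - smoothing) * P + smoothing / n


X = np.array([[0.0], [1.0], [3.0]])
W = RBFWeight(length_scale=1.0)(X)
P = transition_matrix(W, smoothing=0.001)
print(W.round(3))
print(P.sum(axis=1))  # every row of a transition matrix sums to 1
```

Blending with the uniform matrix keeps every transition probability strictly positive, which is one way a small smoothing value can act as a regularizer.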

average_label_entropy(X, y, theta=None, eval_gradient=False, verbose=False)

Evaluate the average label entropy of the Gaussian field model on a dataset.

Parameters:
  • X (2D array or list of objects) – Feature vectors or other generic representations of input data.
  • y (1D array) – Label of each data point. Values of None or NaN indicate missing labels that will be filled in by the model.
  • theta (1D array) – Hyperparameters for the weight class.
  • eval_gradient (bool) – Whether to evaluate the gradient of the average label entropy with respect to the weight hyperparameters.
  • verbose (bool) – If True, print additional information as a Markdown table.
Returns:

  • average_label_entropy (float) – The average label entropy of the Gaussian field prediction on the unlabeled nodes.
  • grad (1D array) – Gradient with respect to the hyperparameters. Only returned if eval_gradient is True.
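For 0/1 labels, each prediction z_i on an unlabeled node lies in [0, 1] and can be read as a probability, so its binary entropy H(z_i) = -z_i log z_i - (1 - z_i) log(1 - z_i) measures how undecided the model is about that node. The minimal NumPy sketch below averages this entropy over the unlabeled nodes. The helper mean_label_entropy is hypothetical, and the natural logarithm and the clipping constant eps are assumptions; graphdot's implementation (and its gradient) may differ.

```python
import numpy as np


def mean_label_entropy(z, labeled, eps=1e-12):
    """Average binary entropy (in nats) of the predictions on unlabeled nodes."""
    # Clip to avoid log(0) for predictions that are exactly 0 or 1.
    zu = np.clip(np.asarray(z, dtype=float)[~labeled], eps, 1 - eps)
    H = -zu * np.log(zu) - (1 - zu) * np.log(1 - zu)
    return H.mean()


z = np.array([1.0, 0.0, 0.5, 0.9])            # two known labels, two predictions
labeled = np.array([True, True, False, False])
ale = mean_label_entropy(z, labeled)
print(ale)
```

A prediction of 0.5 contributes the maximum entropy log 2, while confident predictions near 0 or 1 contribute almost nothing, which is why minimizing this quantity pushes the model toward decisive labels.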

fit(X, y, loss='loocv2', tol=1e-05, repeat=1, theta_jitter=1.0, verbose=False)

Train the Gaussian field model.

Parameters:
  • X (2D array or list of objects) – Feature vectors or other generic representations of input data.
  • y (1D array) – Label of each data point. Values of None or NaN indicate missing labels that will be filled in by the model.
  • loss (str) –

    The loss function used to optimize the hyperparameters. Options are:

    • ‘ale’ or ‘average-label-entropy’: average label entropy. Only works if the labels are 0/1 binary.
    • ‘loocv1’ or ‘loocv2’: leave-one-out cross-validation error of the labeled samples, measured in the L1 or L2 norm, respectively.

  • tol (float) – Tolerance for termination.
  • repeat (int) – Repeat the hyperparameter optimization the specified number of times and return the best result.
  • theta_jitter (float) – Standard deviation of the random noise added to the initial log-scale hyperparameters across repeated optimization runs.
Returns:

self – returns an instance of self.

Return type:

GaussianFieldRegressor
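The repeat and theta_jitter options describe a multi-start optimization. The sketch below is one way such a scheme can work, assuming the search runs over log-scale hyperparameters with scipy.optimize.minimize and L-BFGS-B, that the first run starts from the unperturbed initial values, and that the run with the lowest loss is kept. The objective toy_loss is a hypothetical stand-in for the LOOCV or average-label-entropy loss, and multistart_minimize is not part of graphdot.

```python
import numpy as np
from scipy.optimize import minimize


def toy_loss(log_theta):
    """Hypothetical stand-in for a real loss; its minimum is at theta = 2."""
    theta = np.exp(log_theta)
    return float(np.sum((theta - 2.0) ** 2))


def multistart_minimize(loss, theta0, repeat=1, theta_jitter=1.0, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    log_theta0 = np.log(np.asarray(theta0, dtype=float))
    best = None
    for i in range(repeat):
        # Assumption: run 0 starts at theta0; later runs add Gaussian noise in log space.
        x0 = log_theta0
        if i > 0:
            x0 = log_theta0 + theta_jitter * rng.standard_normal(log_theta0.shape)
        result = minimize(loss, x0, method='L-BFGS-B', tol=tol)
        if best is None or result.fun < best.fun:
            best = result  # keep the run with the lowest loss
    return np.exp(best.x), best.fun


theta, fun = multistart_minimize(toy_loss, theta0=[0.5], repeat=3, theta_jitter=1.0)
print(theta, fun)
```

Optimizing in log space keeps positive hyperparameters such as length scales positive without explicit bounds, and jittering the starting point gives later runs a chance to escape a poor local minimum found by the first.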

fit_predict(X, y, loss='average-label-entropy', tol=1e-05, repeat=1, theta_jitter=1.0, return_influence=False, verbose=False)

Train the Gaussian field model and make predictions for the unlabeled nodes.

Parameters:
  • X (2D array or list of objects) – Feature vectors or other generic representations of input data.
  • y (1D array) – Label of each data point. Values of None or NaN indicate missing labels that will be filled in by the model.
  • loss (str) –

    The loss function used to optimize the hyperparameters. Options are:

    • ‘ale’ or ‘average-label-entropy’: average label entropy. Only works if the labels are 0/1 binary.
    • ‘loocv1’ or ‘loocv2’: leave-one-out cross-validation error of the labeled samples, measured in the L1 or L2 norm, respectively.

  • tol (float) – Tolerance for termination.
  • repeat (int) – Repeat the hyperparameter optimization the specified number of times and return the best result.
  • theta_jitter (float) – Standard deviation of the random noise added to the initial log-scale hyperparameters across repeated optimization runs.
  • return_influence (bool) – If True, also returns the contributions of each labeled sample to each predicted label as an ‘influence matrix’.
Returns:

  • z (1D array) – Node labels with missing ones filled in by prediction.
  • influence_matrix (2D array) – Contributions of each labeled sample to each predicted label. Only returned if return_influence is True.
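The harmonic-function solution of Zhu et al. shows where the predictions and an influence matrix can come from. Write P = D^{-1}W for the row-normalized transition matrix (D is the diagonal matrix of row sums of W) and split the nodes into labeled (l) and unlabeled (u) blocks. That graphdot's influence_matrix equals the matrix A below is an assumption drawn from the parameter description, not something this page states.

```latex
\begin{aligned}
z_u &= P_{uu}\, z_u + P_{ul}\, y_l
  && \text{(each unlabeled value is the average of its neighbors)} \\
\Longrightarrow\quad z_u &= (I - P_{uu})^{-1} P_{ul}\, y_l = A\, y_l,
  && A = (I - P_{uu})^{-1} P_{ul} \\
P\mathbf{1} = \mathbf{1} \;\Longrightarrow\; (I - P_{uu})\mathbf{1} &= P_{ul}\mathbf{1}
  \;\Longrightarrow\; A\mathbf{1} = \mathbf{1}
\end{aligned}
```

Entry A_{ij} is the contribution of labeled node j to the prediction on unlabeled node i. Because each row of A sums to 1 and (I - P_{uu})^{-1} = \sum_k P_{uu}^k has nonnegative entries, every prediction is a weighted average of the known labels.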

loocv_error(X, y, p=2, theta=None, eval_gradient=False, verbose=False)

Evaluate the leave-one-out cross-validation (LOOCV) error and, optionally, its gradient.

Parameters:
  • X (2D array or list of objects) – Feature vectors or other generic representations of input data.
  • y (1D array) – Label of each data point. Values of None or NaN indicate missing labels that will be filled in by the model.
  • p (float ≥ 1) – The order of the p-norm used for the LOOCV error.
  • theta (1D array) – Hyperparameters for the weight class.
  • eval_gradient (bool) – Whether to evaluate the gradient of the LOOCV error with respect to the weight hyperparameters.
  • verbose (bool) – If True, print additional information as a Markdown table.
Returns:

  • err (1D array) – LOOCV error.
  • grad (1D array) – Gradient with respect to the hyperparameters. Only returned if eval_gradient is True.
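Leave-one-out cross-validation hides one known label at a time, predicts it from the remaining labels, and aggregates the residuals. The naive sketch below does exactly that on a precomputed transition matrix using the harmonic-function prediction. The helpers harmonic_predict and loocv_p_error are hypothetical, and aggregating residuals as (mean |e|^p)^(1/p) into a single scalar is an assumption; graphdot may normalize differently and likely avoids re-solving the system for every held-out label.

```python
import numpy as np


def harmonic_predict(P, y, labeled):
    """Harmonic-function prediction z_u = (I - P_uu)^{-1} P_ul y_l."""
    u, l = ~labeled, labeled
    Puu = P[np.ix_(u, u)]
    Pul = P[np.ix_(u, l)]
    z = y.copy()
    z[u] = np.linalg.solve(np.eye(u.sum()) - Puu, Pul @ y[l])
    return z


def loocv_p_error(P, y, p=2):
    labeled = np.isfinite(y)  # NaN marks a missing label
    residuals = []
    for i in np.flatnonzero(labeled):
        mask = labeled.copy()
        mask[i] = False  # hide the label of node i
        z = harmonic_predict(P, y, mask)
        residuals.append(z[i] - y[i])
    residuals = np.abs(np.array(residuals))
    # Assumption: p-norm style aggregation averaged over labeled nodes.
    return np.mean(residuals ** p) ** (1.0 / p)


# Small connected graph with node 2 unlabeled.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
P = W / W.sum(axis=1, keepdims=True)
y = np.array([0.0, 1.0, np.nan, 1.0])
print(loocv_p_error(P, y, p=1), loocv_p_error(P, y, p=2))
```

With p=1 every residual counts linearly, whereas p=2 emphasizes the largest misses, which mirrors the difference between the ‘loocv1’ and ‘loocv2’ losses.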

loocv_error_1(X, y, **kwargs)

Leave-one-out cross-validation error measured in the L1 norm. Equivalent to loocv_error(X, y, p=1, **kwargs).

loocv_error_2(X, y, **kwargs)

Leave-one-out cross-validation error measured in the L2 norm. Equivalent to loocv_error(X, y, p=2, **kwargs).

predict(X, y, return_influence=False)

Make predictions for the unlabeled elements in y.

Parameters:
  • X (2D array or list of objects) – Feature vectors or other generic representations of input data.
  • y (1D array) – Label of each data point. Values of None or NaN indicate missing labels that will be filled in by the model.
  • return_influence (bool) – If True, also returns the contributions of each labeled sample to each predicted label as an ‘influence matrix’.
Returns:

  • z (1D array) – Node labels with missing ones filled in by prediction.
  • influence_matrix (2D array) – Contributions of each labeled sample to each predicted label. Only returned if return_influence is True.
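The prediction step can be sketched in a few lines of NumPy: mark missing labels with NaN, build a smoothed transition matrix from a weight matrix, and solve for the unlabeled values, optionally returning the matrix that maps known labels to predictions. The function gaussian_field_predict, its arguments, and the smoothing formula (1 - s) * P + s / n are hypothetical and for illustration only; they follow the harmonic-function method of Zhu et al. rather than graphdot's actual code.

```python
import numpy as np


def gaussian_field_predict(W, y, smoothing=1e-3, return_influence=False):
    """Fill in NaN entries of y with the harmonic-function solution on graph W."""
    y = np.asarray(y, dtype=float)
    labeled = np.isfinite(y)  # NaN marks a missing label
    n = len(W)
    # Random-walk transition matrix, smoothed toward the uniform matrix (assumed form).
    P = (1 - smoothing) * W / W.sum(axis=1, keepdims=True) + smoothing / n
    Puu = P[np.ix_(~labeled, ~labeled)]
    Pul = P[np.ix_(~labeled, labeled)]
    # influence[i, j]: contribution of labeled node j to unlabeled node i.
    influence = np.linalg.solve(np.eye(Puu.shape[0]) - Puu, Pul)
    z = y.copy()
    z[~labeled] = influence @ y[labeled]
    return (z, influence) if return_influence else z


# A 5-node path graph 0-1-2-3-4 with only the two end nodes labeled.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
y = np.array([0.0, np.nan, np.nan, np.nan, 1.0])
z, influence = gaussian_field_predict(W, y, return_influence=True)
print(z.round(3))
print(influence.round(3))
```

On a path graph with small smoothing the predictions come out close to a linear interpolation between the two end labels, and each row of the influence matrix sums to 1.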