graphdot.kernel package

class graphdot.kernel.Tang2019MolecularKernel(stopping_probability=0.01, starting_probability=1.0, element_prior=0.2, edge_length_scale=0.05, **kwargs)[source]

Bases: object

A marginalized graph kernel for 3D molecular structures, as described in: Tang, Y. H., & de Jong, W. A. (2019). Prediction of atomization energy using graph kernel and active learning. The Journal of chemical physics, 150(4), 044107. The kernel can be used directly with Graph.from_ase() to operate on molecular structures.

Parameters:
  • stopping_probability (float in (0, 1)) – The probability for the random walk to stop during each step.
  • starting_probability (float) – The probability for the random walk to start from any node. See the p kwarg of graphdot.kernel.marginalized.MarginalizedGraphKernel.
  • element_prior (float in (0, 1)) – The baseline similarity between distinct elements. An element always has a similarity of 1 to itself.
  • edge_length_scale (float in (0, inf)) – Length scale of the Gaussian kernel on edge length. As a rule of thumb, the similarity decays smoothly from 1 to nearly 0 as the difference in edge length approaches about three times the length scale.
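The roles of element_prior and edge_length_scale can be illustrated with a minimal NumPy sketch. It assumes the element similarity is a Kronecker-delta-style kernel with baseline element_prior, and that the edge similarity has the square exponential form exp(-0.5 * ((d1 - d2) / edge_length_scale)**2). Both function names are hypothetical, and the exact functional form used by the library may differ.

```python
import numpy as np

def element_similarity(z1, z2, element_prior=0.2):
    """Hypothetical helper: 1 for identical elements, element_prior otherwise."""
    return 1.0 if z1 == z2 else element_prior

def edge_length_similarity(d1, d2, edge_length_scale=0.05):
    """Hypothetical helper: assumed square exponential kernel on edge length."""
    return float(np.exp(-0.5 * ((d1 - d2) / edge_length_scale) ** 2))

same = element_similarity(6, 6)            # carbon vs. carbon
diff = element_similarity(6, 8)            # carbon vs. oxygen
near = edge_length_similarity(1.10, 1.10)  # identical edge lengths
far = edge_length_similarity(1.10, 1.25)   # three length scales apart
print(same, diff, near, far)
```

Under this assumed form, a difference of three length scales gives exp(-4.5), which is consistent with the rule of thumb that the similarity is nearly 0 at that separation.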
__call__(X, Y=None, **kwargs)[source]

Same call signature as graphdot.kernel.marginalized.MarginalizedGraphKernel.__call__()

bounds
clone_with_theta(theta)[source]
diag(X, **kwargs)[source]

Same call signature as graphdot.kernel.marginalized.MarginalizedGraphKernel.diag()

hyperparameter_bounds
hyperparameters
theta
class graphdot.kernel.KernelOverMetric(distance, expr, x, **hyperparameters)[source]

Bases: object

__call__(X, Y=None, eval_gradient=False)[source]

Call self as a function.

bounds
clone_with_theta(theta=None)[source]
diag(X)[source]
get_params()[source]
hyperparameters
theta
class graphdot.kernel.MarginalizedGraphKernel(node_kernel, edge_kernel, p=1.0, q=0.01, q_bounds=(0.0001, 0.9999), eps=0.01, ftol=1e-08, gtol=1e-06, dtype=<class 'float'>, backend='auto')[source]

Bases: object

Implements the random walk-based graph similarity kernel as proposed in: Kashima, H., Tsuda, K., & Inokuchi, A. (2003). Marginalized kernels between labeled graphs. In Proceedings of the 20th international conference on machine learning (ICML-03) (pp. 321-328).

Parameters:
  • node_kernel (microkernel) – A kernelet that computes the similarity between individual nodes.
  • edge_kernel (microkernel) – A kernelet that computes the similarity between individual edges.
  • p (positive number (default=1.0) or StartingProbability) – The starting probability of the random walk on each node. Must be either a positive number or a concrete subclass instance of StartingProbability.
  • q (float in (0, 1)) – The probability for the random walk to stop during each step.
  • q_bounds (pair of floats) – The lower and upper bound that the stopping probability can vary during hyperparameter optimization.
  • eps (float) – The step size used for finite difference approximation of the gradient. Only used for nodal matrices (nodal=True).
  • dtype (numpy dtype) – The data type of the kernel matrix to be returned.
  • backend ('auto' or 'cuda' or an instance of graphdot.kernel.marginalized.Backend) – The computing engine that solves the marginalized graph kernel’s generalized Laplacian equation.
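As a rough illustration of what the kernel computes, the following dense NumPy sketch evaluates a marginalized random-walk kernel between two small labeled graphs by solving a linear system on their product graph. It is a simplified reference under stated assumptions, not the library's solver: marginalized_kernel is a hypothetical function, the edge kernel is fixed to 1, the node kernel is a delta-style kernel with a baseline similarity h, every node is assumed to have at least one neighbor, and the scalar p is applied as an unnormalized starting weight on each node.

```python
import numpy as np

def marginalized_kernel(A1, labels1, A2, labels2, p=1.0, q=0.01, h=0.2):
    """Hypothetical dense sketch of a marginalized graph kernel."""
    A1 = np.asarray(A1, dtype=float)
    A2 = np.asarray(A2, dtype=float)
    n1, n2 = len(A1), len(A2)
    # Each step continues with probability (1 - q) and moves to a neighbor
    # with probability proportional to the edge weight.
    T1 = (1.0 - q) * A1 / A1.sum(axis=1, keepdims=True)
    T2 = (1.0 - q) * A2 / A2.sum(axis=1, keepdims=True)
    # Node kernel on all node pairs: 1 for equal labels, h otherwise.
    l1 = np.asarray(labels1)
    l2 = np.asarray(labels2)
    V = np.where(l1[:, None] == l2[None, :], 1.0, h).ravel()
    # Transition matrix of simultaneous walks on the product graph
    # (edge kernel assumed to be 1 for every pair of edges).
    Tx = np.kron(T1, T2)
    # r[(i, i')] is the expected path similarity of walks starting at (i, i'):
    # r = V * (q^2 + Tx r), i.e. (I - diag(V) Tx) r = q^2 V.
    r = np.linalg.solve(np.eye(n1 * n2) - V[:, None] * Tx, q * q * V)
    # Starting weight p on every node of both graphs.
    return float(np.full(n1 * n2, p * p) @ r)

path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
k12 = marginalized_kernel(path3, ["C", "O", "C"], triangle, ["C", "C", "O"], q=0.1)
k21 = marginalized_kernel(triangle, ["C", "C", "O"], path3, ["C", "O", "C"], q=0.1)
k11 = marginalized_kernel(path3, ["C", "O", "C"], path3, ["C", "O", "C"], q=0.1)
k22 = marginalized_kernel(triangle, ["C", "C", "O"], triangle, ["C", "C", "O"], q=0.1)
print(k12, k21, k11, k22)
```

The dense solve scales poorly with graph size because the product graph has n1 * n2 nodes; it is meant only to make the role of p, q, and the node kernel concrete.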
__call__(X, Y=None, eval_gradient=False, nodal=False, lmin=0, timing=False)[source]

Compute the pairwise similarity matrix between graphs.

Parameters:
  • X (list of N graphs) – The graphs must all have the same node and edge attributes.
  • Y (None or list of M graphs) – The graphs must all have the same node and edge attributes.
  • eval_gradient (Boolean) – If True, compute the gradient of the kernel matrix with respect to the hyperparameters and return it alongside the kernel matrix.
  • nodal (bool) – If True, return node-wise similarities; otherwise, return graph-wise similarities.
  • lmin (0 or 1) – Number of steps to skip in each random walk path before similarity is computed. (lmin + 1) corresponds to the starting value of l in the summation of Eq. 1 in Tang & de Jong, 2019 https://doi.org/10.1063/1.5078640 (or the first unnumbered equation in Section 3.3 of Kashima, Tsuda, and Inokuchi, 2003).
Returns:

  • kernel_matrix (ndarray) – If Y is None, a square matrix containing pairwise similarities between the graphs in X; otherwise, a matrix containing similarities between the graphs in X and the graphs in Y.
  • gradient (ndarray) – The gradient of the kernel matrix with respect to kernel hyperparameters. Only returned if eval_gradient is True.
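The effect of lmin can be sketched with an explicit walk-length series. The example below is an assumption-laden toy, not the library's code: kernel_from_series is a hypothetical helper, M stands for the product-graph transition matrix already weighted by node and edge similarities, b holds the per-pair stopping contribution, and l counts the nodes in a walk, so that lmin=0 starts the sum at l=1 and lmin=1 starts it at l=2.

```python
import numpy as np

def kernel_from_series(M, b, p, lmin=0, lmax=200):
    """Hypothetical helper: sum walk contributions for l = lmin + 1, ..., lmax."""
    total = 0.0
    term = b.copy()  # contribution of walks visiting exactly one node
    for l in range(1, lmax + 1):
        if l >= lmin + 1:
            total += float(p @ term)
        term = M @ term  # extend every walk by one more step
    return total

# Toy product graph of two single-edge graphs, with all node and edge
# similarities equal to 1 and stopping probability q = 0.5.
q = 0.5
T = (1.0 - q) * np.array([[0.0, 1.0], [1.0, 0.0]])
M = np.kron(T, T)
b = np.full(4, q * q)
p = np.ones(4)

k_lmin0 = kernel_from_series(M, b, p, lmin=0)
k_lmin1 = kernel_from_series(M, b, p, lmin=1)
closed_form = float(p @ np.linalg.solve(np.eye(4) - M, b))
print(k_lmin0, k_lmin1, closed_form)
```

In this toy, the two results differ exactly by the single-node term p @ b, which is the part of the sum that lmin=1 skips.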

active_theta_mask
bounds

The logarithms of a reshaped X-by-2 array of kernel hyperparameter bounds, excluding those declared as ‘fixed’ or those with equal lower and upper bounds.

clone_with_theta(theta)[source]
diag(X, eval_gradient=False, nodal=False, lmin=0, active_theta_only=True, timing=False)[source]

Compute the self-similarities for a list of graphs

Parameters:
  • X (list of N graphs) – The graphs must all have the same node and edge attributes.
  • eval_gradient (Boolean) – If True, compute the gradient of the kernel matrix with respect to the hyperparameters and return it alongside the kernel matrix.
  • nodal (bool or ‘block’) – If True, return a vector containing nodal self-similarities; if False, return a vector containing the graphs’ overall self-similarities; if ‘block’, return a list of square matrices that form a block-diagonal matrix, where each diagonal block holds the pairwise nodal similarities within one graph.
  • lmin (0 or 1) – Number of steps to skip in each random walk path before similarity is computed. (lmin + 1) corresponds to the starting value of l in the summation of Eq. 1 in Tang & de Jong, 2019 https://doi.org/10.1063/1.5078640 (or the first unnumbered equation in Section 3.3 of Kashima, Tsuda, and Inokuchi, 2003).
  • active_theta_only (bool) – Whether or not to return only gradients with regard to the non-fixed hyperparameters.
Returns:

  • diagonal (numpy.array or list of np.array(s)) – If nodal=True, a vector containing nodal self-similarities; if nodal=False, a vector containing the graphs’ overall self-similarities; if nodal=‘block’, a list of square matrices, each being the pairwise nodal similarity matrix within one graph.
  • gradient – The gradient of the kernel matrix with respect to kernel hyperparameters. Only returned if eval_gradient is True.

flat_hyperparameters
hyperparameter_bounds
hyperparameters

A hierarchical representation of all the kernel hyperparameters.

is_stationary()[source]
n_dims

Number of hyperparameters including both optimizable and fixed ones.

requires_vector_input
theta

The logarithms of a flattened array of kernel hyperparameters, excluding those declared as ‘fixed’ or those with equal lower and upper bounds.
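The log-space convention described for theta and bounds can be sketched in NumPy as follows. The arrays and the names values, raw_bounds, declared_fixed, and active are hypothetical and only illustrate the masking and log transform; they are not the library's internal representation.

```python
import numpy as np

# Hypothetical flattened hyperparameter values and their (lower, upper) bounds.
values = np.array([0.01, 0.2, 0.05])
raw_bounds = np.array([
    [1e-4, 0.9999],
    [0.2, 0.2],      # equal lower and upper bounds: treated as fixed
    [1e-3, 1.0],
])
declared_fixed = np.array([False, False, False])

# Keep only hyperparameters that are neither declared fixed nor pinned
# by equal lower and upper bounds.
active = ~declared_fixed & (raw_bounds[:, 0] != raw_bounds[:, 1])

theta = np.log(values[active])           # flattened log-hyperparameters
log_bounds = np.log(raw_bounds[active])  # one (lower, upper) row per entry of theta

# Mapping an optimizer's theta back to hyperparameter values.
restored = values.copy()
restored[active] = np.exp(theta)
print(theta, log_bounds, restored)
```

Working in log space keeps positive hyperparameters positive during unconstrained optimization steps, which is the same convention used by scikit-learn kernels.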

trait_t

alias of Traits

classmethod traits(diagonal=False, symmetric=False, nodal=False, lmin=0, eval_gradient=False)[source]