Point cloud utilities manual¶
File Readers¶

gudhi.
read_points_from_off_file
()¶ Read points from OFF file.
 Parameters
off_file¶ (string) – An OFF file style name.
 Returns
The point set.
 Return type
List[List[float]]

gudhi.
read_lower_triangular_matrix_from_csv_file
()¶ Read lower triangular matrix from a CSV style file.
Subsampling¶

gudhi.subsampling.
choose_n_farthest_points
()¶ Subsample by a greedy strategy of iteratively adding the farthest point from the current chosen point set to the subsampling. The iteration starts with the landmark starting point.
 Parameters
points¶ (Iterable[Iterable[float]]) – The input point set.
Or
 Parameters
off_file¶ (string) – An OFF file style name.
And in both cases
 Parameters
 Returns
The subsample point set.
 Return type
List[List[float]]

gudhi.subsampling.
pick_n_random_points
()¶ Subsample a point set by picking random vertices.
 Parameters
points¶ (Iterable[Iterable[float]]) – The input point set.
Or
 Parameters
off_file¶ (string) – An OFF file style name.
And in both cases
 Parameters
nb_points¶ (unsigned.) – Number of points of the subsample.
 Returns
The subsample point set.
 Return type
List[List[float]]

gudhi.subsampling.
sparsify_point_set
()¶ Outputs a subset of the input points so that the squared distance between any two points is greater than or equal to min_squared_dist.
 Parameters
points¶ (Iterable[Iterable[float]]) – The input point set.
Or
 Parameters
off_file¶ (string) – An OFF file style name.
And in both cases
 Parameters
min_squared_dist¶ (float.) – Minimum squared distance separating the output points.
 Returns
The subsample point set.
 Return type
List[List[float]]
Time Delay Embedding¶

class
gudhi.point_cloud.timedelay.
TimeDelayEmbedding
(dim=3, delay=1, skip=1)[source]¶ Point cloud transformation class. Embeds timeseries data in the R^d according to Takens’ Embedding Theorem and obtains the coordinates of each point.
 Parameters
Example
Given delay=3 and skip=2, a point cloud which is obtained by embedding a scalar timeseries into R^3 is as follows:
timeseries = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] point cloud = [[1, 4, 7], [3, 6, 9]]
Given delay=1 and skip=1, a point cloud which is obtained by embedding a 2D vector timeseries data into R^4 is as follows:
timeseries = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]] point cloud = [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]

__call__
(ts)[source]¶ Transform method for single timeseries data.
 Parameters
ts¶ (Iterable[float] or Iterable[Iterable[float]]) – A single timeseries data, with scalar or vector values.
 Returns
point cloud – Makes point cloud from a single timeseries data.
 Return type
n x dim numpy arrays

transform
(ts)[source]¶ Transform method for multiple timeseries data.
 Parameters
ts¶ (Iterable[Iterable[float]] or Iterable[Iterable[Iterable[float]]]) – Multiple timeseries data, with scalar or vector values.
 Returns
point clouds – Makes point cloud from each timeseries data.
 Return type
list of n x dim numpy arrays
K nearest neighbors¶

class
gudhi.point_cloud.knn.
KNearestNeighbors
(k, return_index=True, return_distance=False, metric='euclidean', **kwargs)[source]¶ Class wrapping several implementations for computing the k nearest neighbors in a point set.
 Requires
PyKeOps, SciPy, Scikitlearn, and/or Hnswlib in function of the selected implementation.

__init__
(k, return_index=True, return_distance=False, metric='euclidean', **kwargs)[source]¶  Parameters
k¶ (int) – number of neighbors (possibly including the point itself).
return_index¶ (bool) – if True, return the index of each neighbor.
return_distance¶ (bool) – if True, return the distance to each neighbor.
implementation¶ (str) –
choice of the library that does the real work.
’keops’ for a bruteforce, CUDA implementation through pykeops. Useful when the dimension becomes large (10+) but the number of points remains low (less than a million). Only “minkowski” and its aliases are supported.
’ckdtree’ for scipy’s cKDTree. Only “minkowski” and its aliases are supported.
’sklearn’ for scikitlearn’s NearestNeighbors. Note that this provides in particular an option algorithm=”brute”.
’hnsw’ for hnswlib.Index. It can be very fast but does not provide guarantees. Only supports “euclidean” for now.
None will try to select a sensible one (scipy if possible, scikitlearn otherwise).
metric¶ (str) – see sklearn.neighbors.NearestNeighbors.
eps¶ (float) – relative error when computing nearest neighbors with the cKDTree.
p¶ (float) – norm L^p on input points (including numpy.inf) if metric is “minkowski”. Defaults to 2.
n_jobs¶ (int) – number of jobs to schedule for parallel processing of nearest neighbors on the CPU. If 1 is given all processors are used. Default: 1.
sort_results¶ (bool) – if True, then distances and indices of each point are sorted on return, so that the first column contains the closest points. Otherwise, neighbors are returned in an arbitrary order. Defaults to True.
enable_autodiff¶ (bool) – if the input is a torch.tensor or tensorflow.Tensor, this instructs the function to compute distances in a way that works with automatic differentiation. This is experimental, not supported for all metrics, and requires the package EagerPy. Defaults to False.
kwargs¶ – additional parameters are forwarded to the backends.

transform
(X)[source]¶  Parameters
X¶ (numpy.array) – coordinates for query points, or distance matrix if metric is “precomputed”.
 Returns
if return_index, an array of shape (len(X), k) with the indices (in the argument of
fit()
) of the k nearest neighbors to the points of X. If return_distance, an array of the same shape with the distances to those neighbors. If both, a tuple with the two arrays, in this order. Return type
numpy.array
Distance to measure¶

class
gudhi.point_cloud.dtm.
DTMDensity
(k=None, weights=None, q=None, dim=None, normalize=False, n_samples=None, **kwargs)[source]¶ Density estimator based on the distance to the empirical measure defined by a point set, as defined in [2]. Note that this implementation only renormalizes when asked, and the renormalization only works for a Euclidean metric, so in other cases the total measure may not be 1.
Note
When the dimension is high, using it as an exponent can quickly lead to under or overflows. We recommend using a small fixed value instead in those cases, even if it won’t have the same nice theoretical properties as the dimension.

__init__
(k=None, weights=None, q=None, dim=None, normalize=False, n_samples=None, **kwargs)[source]¶  Parameters
k¶ (int) – number of neighbors (possibly including the point itself). Optional if it can be guessed from weights or metric=”neighbors”.
weights¶ (numpy.array) – weights of each of the k neighbors, optional. They are supposed to sum to 1.
q¶ (float) – order used to compute the distance to measure. Defaults to dim.
dim¶ (float) – final exponent representing the dimension. Defaults to the dimension, and must be specified when the dimension cannot be read from the input (metric is “neighbors” or “precomputed”).
normalize¶ (bool) – normalize the density so it corresponds to a probability measure on ℝᵈ. Only available for the Euclidean metric, defaults to False.
n_samples¶ (int) – number of sample points used for fitting. Only needed if normalize is True and metric is “neighbors”.
kwargs¶ – same parameters as
KNearestNeighbors
, except that metric=”neighbors” means thattransform()
expects an array with the distances to the k nearest neighbors.


class
gudhi.point_cloud.dtm.
DistanceToMeasure
(k, q=2, **kwargs)[source]¶ Class to compute the distance to the empirical measure defined by a point set, as introduced in [14].

__init__
(k, q=2, **kwargs)[source]¶  Parameters
k¶ (int) – number of neighbors (possibly including the point itself).
q¶ (float) – order used to compute the distance to measure. Defaults to 2.
kwargs¶ – same parameters as
KNearestNeighbors
, except that metric=”neighbors” means thattransform()
expects an array with the distances to the k nearest neighbors.

transform
(X)[source]¶  Parameters
X¶ (numpy.array) – coordinates for query points, or distance matrix if metric is “precomputed”, or distances to the k nearest neighbors if metric is “neighbors” (if the array has more than k columns, the remaining ones are ignored).
 Returns
a 1d array with, for each point of X, its distance to the measure defined by the argument of
fit()
. Return type
numpy.array
