Point cloud utilities manual¶

File Readers¶

gudhi.read_points_from_off_file()¶

Read points from OFF file.

Parameters: off_file¶ (string) – An OFF file style name.
Returns: The point set.
Return type: List[List[float]]

gudhi.read_lower_triangular_matrix_from_csv_file()¶

Read lower triangular matrix from a CSV style file.

Parameters

csv_file¶ (string) – A CSV file style name.
separator¶ (char) – The value separator in the CSV file. Default value is ‘;’

Returns

The lower triangular matrix.

Return type

List[List[float]]

Subsampling¶

Requires: Eigen \(\geq\) 3.1.0 and CGAL \(\geq\) 4.11.0

gudhi.subsampling.choose_n_farthest_points()¶

Subsample by a greedy strategy of iteratively adding the farthest point from the current chosen point set to the subsampling. The iteration starts with the landmark starting point.

Parameters: points¶ (Iterable[Iterable[float]]) – The input point set.

Or

Parameters: off_file¶ (string) – An OFF file style name.

And in both cases

Parameters

nb_points¶ (unsigned.) – Number of points of the subsample.
starting_point¶ (unsigned.) – The iteration starts with the landmark starting point,which is the index of the point to start with. If not set, this index is chosen randomly.

Returns

The subsample point set.

Return type

List[List[float]]

gudhi.subsampling.pick_n_random_points()¶

Subsample a point set by picking random vertices.

Parameters: points¶ (Iterable[Iterable[float]]) – The input point set.

Or

Parameters: off_file¶ (string) – An OFF file style name.

And in both cases

Parameters: nb_points¶ (unsigned.) – Number of points of the subsample.
Returns: The subsample point set.
Return type: List[List[float]]

gudhi.subsampling.sparsify_point_set()¶

Outputs a subset of the input points so that the squared distance between any two points is greater than or equal to min_squared_dist.

Parameters: points¶ (Iterable[Iterable[float]]) – The input point set.

Or

Parameters: off_file¶ (string) – An OFF file style name.

And in both cases

Parameters: min_squared_dist¶ (float.) – Minimum squared distance separating the output points.
Returns: The subsample point set.
Return type: List[List[float]]

Time Delay Embedding¶

class gudhi.point_cloud.timedelay.TimeDelayEmbedding(dim=3, delay=1, skip=1)[source]¶

Point cloud transformation class. Embeds time-series data in the R^d according to Takens’ Embedding Theorem and obtains the coordinates of each point.

Parameters

dim¶ (int, optional (default=3)) – d of R^d to be embedded.
delay¶ (int, optional (default=1)) – Time-Delay embedding.
skip¶ (int, optional (default=1)) – How often to skip embedded points.

Example

Given delay=3 and skip=2, a point cloud which is obtained by embedding a scalar time-series into R^3 is as follows:

time-series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
point cloud = [[1, 4, 7],
               [3, 6, 9]]

Given delay=1 and skip=1, a point cloud which is obtained by embedding a 2D vector time-series data into R^4 is as follows:

time-series = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
point cloud = [[0, 1, 2, 3],
               [2, 3, 4, 5],
               [4, 5, 6, 7],
               [6, 7, 8, 9]]

__call__(ts)[source]¶

Transform method for single time-series data.

Parameters: ts¶ (Iterable[float] or Iterable[Iterable[float]]) – A single time-series data, with scalar or vector values.
Returns: point cloud – Makes point cloud from a single time-series data.
Return type: n x dim numpy arrays

transform(ts)[source]¶

Transform method for multiple time-series data.

Parameters: ts¶ (Iterable[Iterable[float]] or Iterable[Iterable[Iterable[float]]]) – Multiple time-series data, with scalar or vector values.
Returns: point clouds – Makes point cloud from each time-series data.
Return type: list of n x dim numpy arrays

K nearest neighbors¶

class gudhi.point_cloud.knn.KNearestNeighbors(k, return_index=True, return_distance=False, metric='euclidean', **kwargs)[source]¶

Class wrapping several implementations for computing the k nearest neighbors in a point set.

Requires: PyKeOps, SciPy, Scikit-learn, and/or Hnswlib in function of the selected implementation.

__init__(k, return_index=True, return_distance=False, metric='euclidean', **kwargs)[source]¶

Parameters

k¶ (int) – number of neighbors (possibly including the point itself).
return_index¶ (bool) – if True, return the index of each neighbor.
return_distance¶ (bool) – if True, return the distance to each neighbor.
implementation¶ (str) –
choice of the library that does the real work.
- ’keops’ for a brute-force, CUDA implementation through pykeops. Useful when the dimension becomes large (10+) but the number of points remains low (less than a million). Only “minkowski” and its aliases are supported.
- ’ckdtree’ for scipy’s cKDTree. Only “minkowski” and its aliases are supported.
- ’sklearn’ for scikit-learn’s NearestNeighbors. Note that this provides in particular an option algorithm=”brute”.
- ’hnsw’ for hnswlib.Index. It can be very fast but does not provide guarantees. Only supports “euclidean” for now.
- None will try to select a sensible one (scipy if possible, scikit-learn otherwise).
metric¶ (str) – see sklearn.neighbors.NearestNeighbors.
eps¶ (float) – relative error when computing nearest neighbors with the cKDTree.
p¶ (float) – norm L^p on input points (including numpy.inf) if metric is “minkowski”. Defaults to 2.
n_jobs¶ (int) – number of jobs to schedule for parallel processing of nearest neighbors on the CPU. If -1 is given all processors are used. Default: 1.
sort_results¶ (bool) – if True, then distances and indices of each point are sorted on return, so that the first column contains the closest points. Otherwise, neighbors are returned in an arbitrary order. Defaults to True.
enable_autodiff¶ (bool) – if the input is a torch.tensor, jax.numpy.ndarray or tensorflow.Tensor, this instructs the function to compute distances in a way that works with automatic differentiation. This is experimental, not supported for all metrics, and requires the package EagerPy. Defaults to False.
kwargs¶ – additional parameters are forwarded to the backends.

fit(X, y=None)[source]¶

Parameters: X¶ (numpy.array) – coordinates for reference points.

fit_transform(X, y=None)[source]¶

transform(X)[source]¶

Parameters: X¶ (numpy.array) – coordinates for query points, or distance matrix if metric is “precomputed”.
Returns: if return_index, an array of shape (len(X), k) with the indices (in the argument of fit()) of the k nearest neighbors to the points of X. If return_distance, an array of the same shape with the distances to those neighbors. If both, a tuple with the two arrays, in this order.
Return type: numpy.array

Distance to measure¶

class gudhi.point_cloud.dtm.DistanceToMeasure(k, q=2, **kwargs)[source]¶

Class to compute the distance to the empirical measure defined by a point set, as introduced in [11].

__init__(k, q=2, **kwargs)[source]¶

Parameters

k¶ (int) – number of neighbors (possibly including the point itself).
q¶ (float) – order used to compute the distance to measure. Defaults to 2.
kwargs¶ – same parameters as KNearestNeighbors, except that metric=”neighbors” means that transform() expects an array with the distances to the k nearest neighbors.

fit(X, y=None)[source]¶

Parameters: X¶ (numpy.array) – coordinates for mass points.

fit_transform(X, y=None)[source]¶

transform(X)[source]¶

Parameters: X¶ (numpy.array) – coordinates for query points, or distance matrix if metric is “precomputed”, or distances to the k nearest neighbors if metric is “neighbors” (if the array has more than k columns, the remaining ones are ignored).
Returns: a 1-d array with, for each point of X, its distance to the measure defined by the argument of fit().
Return type: numpy.array

Point cloud utilities manual¶

File Readers¶

Subsampling¶

Time Delay Embedding¶

K nearest neighbors¶

Distance to measure¶

Table of Contents

This Page