Skip to content

KDTree

geoindex_rs.kdtree

ArrayLike module-attribute

ArrayLike = Union[ndarray, ArrowArrayExportable, Buffer]

A type alias for accepted input to the KDTreeBuilder.add method.

Accepted input includes numpy arrays, Arrow arrays, and C-contiguous buffer protocol input.

IndexLike module-attribute

IndexLike = Union[ndarray, ArrowArrayExportable, Buffer, KDTree]

A type alias for accepted input as a KDTree.

KDTree

Bases: Buffer

A fast, memory-efficient, zero-copy-compatible, 2D KDTree.

Use KDTreeBuilder, and then call finish to construct this.

This uses a binary-stable memory layout under the hood. So you can write the index to disk, then later load it and perform queries on it.

This class implements the Python buffer protocol, so you can pass it to the Python bytes constructor to copy the underlying binary memory into a Python bytes object.

KDTreeBuilder

A builder class to create a KDTree.

Example:

import numpy as np
from geoindex_rs import kdtree as kd

builder = kd.KDTreeBuilder(3)
x = np.arange(0, 3)
y = np.arange(2, 5)
builder.add(x, y)
tree = builder.finish()

results = kd.range(tree, 2, 4, 7, 9)
assert results.to_pylist() == [2]

__init__

__init__(
    num_items: int,
    node_size: int = 64,
    coord_type: Literal["float32", "float64"] = "float64",
) -> None

Initialize a KDTree with the given parameters.

This will provision memory for the KDTree, to be filled in the add method.

Parameters:

  • num_items (int) –

    The number of items in the tree

  • node_size (int, default: 64 ) –

    The node size of the tree. Defaults to 64.

  • coord_type (Literal['float32', 'float64'], default: 'float64' ) –

    The coordinate type to use in the tree. Currently only float32 and float64 are permitted. Defaults to "float64".

add

add(x: ArrayLike, y: ArrayLike | None = None) -> Array

Insert points in this KDTree.

There are multiple ways to call this method:

  • Two numpy or Arrow numeric arrays, passed as x, y.
  • A 2D numpy array, with shape (N, 2). It must be C-contiguous.
  • One argument with an Arrow array of type FixedSizeList or Struct. The FixedSizeListArray must have a list_size of 2, and the StructArray must have 2 children, ordered x and y.

In this case, all other parameters should be left as None.

Note

Most of the time, you should call add only once. This function is vectorized, and you should avoid calling it in a loop with a few rows of data at a time.

In some cases, it may be useful to call this in a loop, such as if the input geometries do not all fit in memory at once.

It's important to add arrays at a time. This should usually not be called in a loop.

Parameters:

  • x (ArrayLike) –

    array-like input

  • y (ArrayLike | None, default: None ) –

    array-like input. Defaults to None.

Returns:

  • Array

    An Arrow array with the insertion index of each element, which provides a lookup back into the original data.

    This can be converted to a pyarrow.Array by passing to pyarrow.array.

finish

finish() -> KDTree

Sort the internal index and convert this class to a KDTree instance.

Returns:

  • KDTree

    An immutable KDTree instance, which can be used for spatial queries.

KDTreeMetadata

Common metadata to describe a KDTree.

This can be used to know the number of items, node information, or total byte size of a KDTree.

Additionally, this can be used to know how much memory a KDTree would use with the given number of items and node size. A KDTree with 1 million items and a node size of 64 (the default) would take up 20 MiB.

from geoindex_rs import kdtree as kd

metadata = kd.KDTreeMetadata(num_items=1_000_000, node_size=64)
assert metadata.num_bytes == 20_000_008

node_size property

node_size: int

The maximum number of items per node.

num_bytes property

num_bytes: int

The number of bytes that a KDTree with this metadata would have.

num_items property

num_items: int

The number of items indexed in the tree.

__init__

__init__(
    num_items: int,
    node_size: int = 64,
    coord_type: Literal["float32", "float64"] = "float64",
) -> None

Create a new KDTreeMetadata given a number of items and node size.

Parameters:

  • num_items (int) –

    The number of items in the tree

  • node_size (int, default: 64 ) –

    The node size of the tree. Defaults to 16.

  • coord_type (Literal['float32', 'float64'], default: 'float64' ) –

    The coordinate type to use in the tree. Currently only float32 and float64 are permitted. Defaults to None.

from_index classmethod

from_index(index: IndexLike) -> KDTreeMetadata

Create from an existing KDTree buffer.

range

range(
    index: IndexLike,
    min_x: int | float,
    min_y: int | float,
    max_x: int | float,
    max_y: int | float,
) -> Array

Search a KDTree for elements intersecting the provided bounding box.

Results are the insertion indexes of items that match the query.

Example:

import numpy as np
from geoindex_rs import kdtree as kd

builder = kd.KDTreeBuilder(3)
x = np.arange(0, 3)
y = np.arange(2, 5)
builder.add(x, y)
tree = builder.finish()

results = kd.range(tree, 2, 4, 7, 9)
assert results.to_pylist() == [2]

Parameters:

  • index (IndexLike) –

    the KDTree to search.

  • min_x (int | float) –

    The min_x coordinate of the query bounding box.

  • min_y (int | float) –

    The min_y coordinate of the query bounding box.

  • max_x (int | float) –

    The max_x coordinate of the query bounding box.

  • max_y (int | float) –

    The max_y coordinate of the query bounding box.

Returns:

  • Array

    A uint32-typed Arrow array with the insertion indexes of query results.

within

within(
    index: IndexLike, qx: int | float, qy: int | float, r: int | float
) -> Array

Search a KDTree for elements within the given distance of the query point.

Results are the insertion indexes of items that match the query.

Parameters:

  • index (IndexLike) –

    the KDTree to search.

  • qx (int | float) –

    The x coordinate of the query point.

  • qy (int | float) –

    The y coordinate of the query point.

  • r (int | float) –

    The radius from the query point to use for searching.

Returns:

  • Array

    A uint32-typed Arrow array with the insertion indexes of query results.