Table

arro3.core.Table

A collection of top-level named, equal-length Arrow arrays.

chunk_lengths property

chunk_lengths: list[int]

The number of rows in each internal chunk.

column_names property

column_names: list[str]

Names of the Table or RecordBatch columns.

Returns:

  • list[str]

    The column names.

columns property

columns: list[ChunkedArray]

List of all columns in numerical order.

Returns:

  • list[ChunkedArray]

    All columns, in positional order.

nbytes property

nbytes: int

Total number of bytes consumed by the elements of the table.

num_columns property

num_columns: int

Number of columns in this table.

num_rows property

num_rows: int

Number of rows in this table.

Due to the definition of a table, all columns have the same number of rows.

schema property

schema: Schema

Schema of the table and its columns.

Returns:

  • Schema

    The schema of this table.

shape property

shape: tuple[int, int]

Dimensions of the table or record batch.

Returns:

  • tuple[int, int]

    (number of rows, number of columns)

__arrow_c_schema__

__arrow_c_schema__() -> object

An implementation of the Arrow PyCapsule Interface. This dunder method should not be called directly, but enables zero-copy data transfer to other Python libraries that understand Arrow memory.

This allows Arrow consumers to inspect the data type of this Table. Then the consumer can ask the producer (in __arrow_c_stream__) to cast the exported data to a supported data type.

__arrow_c_stream__

__arrow_c_stream__(requested_schema: object | None = None) -> object

An implementation of the Arrow PyCapsule Interface. This dunder method should not be called directly, but enables zero-copy data transfer to other Python libraries that understand Arrow memory.

For example, you can call pyarrow.table() to convert this Table into a pyarrow table without copying memory.
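
For illustration, a minimal sketch of this zero-copy handoff (assuming pyarrow >= 14, which understands the Arrow PyCapsule Interface):

import pyarrow as pa
from arro3.core import Table

table = Table.from_pydict({"a": pa.array([1, 2, 3])})
# pyarrow consumes this Table's __arrow_c_stream__ export directly;
# no buffers are copied.
pa_table = pa.table(table)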

add_column

add_column(
    i: int, field: str | ArrowSchemaExportable, column: ArrowStreamExportable
) -> Table

Add column to Table at position.

A new table is returned with the column added; the original table object is left unchanged.

Parameters:

  • i (int) –

    Index at which to place the column.

  • field (str | ArrowSchemaExportable) –

    Name or field for the new column.

  • column (ArrowStreamExportable) –

    Column data.

Returns:

  • Table

    New table with the passed column added.
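
A minimal usage sketch (pyarrow is used here only to build example column data; any Arrow-compatible producer works):

import pyarrow as pa
from arro3.core import Table

table = Table.from_pydict({"a": pa.array([1, 2, 3])})
# Insert a new column "b" at position 0; `table` itself is unchanged.
new_table = table.add_column(0, "b", pa.chunked_array([["x", "y", "z"]]))
assert new_table.column_names == ["b", "a"]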

append_column

append_column(
    field: str | ArrowSchemaExportable, column: ArrowStreamExportable
) -> Table

Append column at end of columns.

Parameters:

  • field (str | ArrowSchemaExportable) –

    Name or field for the new column.

  • column (ArrowStreamExportable) –

    Column data.

Returns:

  • Table

    New table or record batch with the passed column added.

column

column(i: int | str) -> ChunkedArray

Select single column from Table or RecordBatch.

Parameters:

  • i (int | str) –

    The index or name of the column to retrieve.

Returns:

  • ChunkedArray

    The selected column.

combine_chunks

combine_chunks() -> Table

Make a new table by combining the chunks this table has.

All the underlying chunks in the ChunkedArray of each column are concatenated into zero or one chunk.

Returns:

  • Table

New table with one or zero chunks.

field

field(i: int | str) -> Field

Select a schema field by its column name or numeric index.

Parameters:

  • i (int | str) –

    The index or name of the field to retrieve.

Returns:

  • Field

    The selected field.

from_arrays classmethod

from_arrays(
    arrays: Sequence[ArrowArrayExportable | ArrowStreamExportable],
    *,
    names: Sequence[str] | None = None,
    schema: ArrowSchemaExportable | None = None,
    metadata: dict[str, str] | dict[bytes, bytes] | None = None
) -> Table

Construct a Table from Arrow arrays.

Parameters:

  • arrays (Sequence[ArrowArrayExportable | ArrowStreamExportable]) –

    Equal-length arrays that should form the table.

  • names (Sequence[str] | None, default: None ) –

    Names for the table columns. If not passed, schema must be passed. Defaults to None.

  • schema (ArrowSchemaExportable | None, default: None ) –

    Schema for the created table. If not passed, names must be passed. Defaults to None.

  • metadata (dict[str, str] | dict[bytes, bytes] | None, default: None ) –

    Optional metadata for the schema (if inferred). Defaults to None.

Returns:

  • Table

    The new table.
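
A minimal usage sketch (arrays built with pyarrow for illustration):

import pyarrow as pa
from arro3.core import Table

# Either `names` or `schema` must be provided.
table = Table.from_arrays(
    [pa.array([1, 2]), pa.array(["a", "b"])],
    names=["x", "y"],
)
assert table.shape == (2, 2)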

from_arrow classmethod

from_arrow(input: ArrowArrayExportable | ArrowStreamExportable) -> Table

Construct this object from an existing Arrow object.

It can be called on anything that exports the Arrow stream interface (__arrow_c_stream__) and yields a StructArray for each item. This Table will materialize all items from the iterator in memory at once. Use RecordBatchReader if you don't wish to materialize all batches in memory at once.

Parameters:

  • input (ArrowArrayExportable | ArrowStreamExportable) –

    The Arrow object to consume.

Returns:

  • Table

    The new table.

from_arrow_pycapsule classmethod

from_arrow_pycapsule(capsule) -> Table

Construct this object from a bare Arrow PyCapsule.

Parameters:

  • capsule

    A PyCapsule containing an Arrow ArrowArrayStream.

Returns:

  • Table

    The new table.

from_batches classmethod

from_batches(
    batches: Sequence[ArrowArrayExportable],
    *,
    schema: ArrowSchemaExportable | None = None
) -> Table

Construct a Table from a sequence of Arrow RecordBatches.

Parameters:

  • batches (Sequence[ArrowArrayExportable]) –

    Sequence of RecordBatch to be converted, all schemas must be equal.

  • schema (ArrowSchemaExportable | None, default: None ) –

    If not passed, will be inferred from the first RecordBatch. Defaults to None.

Returns:

  • Table

    The new table.
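
A minimal usage sketch (batches built with pyarrow for illustration):

import pyarrow as pa
from arro3.core import Table

batch = pa.RecordBatch.from_pydict({"a": [1, 2]})
# All batches must share the same schema.
table = Table.from_batches([batch, batch])
assert table.num_rows == 4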

from_pydict classmethod

from_pydict(
    mapping: dict[str, ArrowArrayExportable | ArrowStreamExportable],
    *,
    schema: ArrowSchemaExportable | None = None,
    metadata: dict[str, str] | dict[bytes, bytes] | None = None
) -> Table

Construct a Table or RecordBatch from Arrow arrays or columns.

Parameters:

  • mapping (dict[str, ArrowArrayExportable | ArrowStreamExportable]) –

    A mapping of column names to arrays or chunked arrays.

  • schema (ArrowSchemaExportable | None, default: None ) –

    Schema for the created table. If not passed, will be inferred from the mapping. Defaults to None.

  • metadata (dict[str, str] | dict[bytes, bytes] | None, default: None ) –

    Optional metadata for the schema (if inferred). Defaults to None.

Returns:

  • Table

    The new table.
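
A minimal usage sketch (values built with pyarrow for illustration):

import pyarrow as pa
from arro3.core import Table

table = Table.from_pydict({
    "id": pa.array([1, 2, 3]),
    "name": pa.array(["a", "b", "c"]),
})
assert table.column_names == ["id", "name"]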

rechunk

rechunk(*, max_chunksize: int | None = None) -> Table

Rechunk a table with a maximum number of rows per chunk.

Parameters:

  • max_chunksize (int | None, default: None ) –

    The maximum number of rows per internal RecordBatch. Defaults to None, which rechunks into a single batch.

Returns:

  • Table

    The rechunked table.
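
A minimal sketch (input built with pyarrow for illustration):

import pyarrow as pa
from arro3.core import Table

table = Table.from_pydict({"a": pa.array(range(10))})
rechunked = table.rechunk(max_chunksize=4)
# Each chunk now holds at most 4 rows, e.g.:
print(rechunked.chunk_lengths)  # [4, 4, 2]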

remove_column

remove_column(i: int) -> Table

Create new Table with the indicated column removed.

Parameters:

  • i (int) –

    Index of column to remove.

Returns:

  • Table

    New table without the column.

rename_columns

rename_columns(names: Sequence[str]) -> Table

Create new table with columns renamed to provided names.

Parameters:

  • names (Sequence[str]) –

    New column names, in positional order.

Returns:

  • Table

    New table with columns renamed.

select

select(columns: Sequence[int] | Sequence[str]) -> Table

Select columns of the Table.

Returns a new Table with the specified columns, and metadata preserved.

Parameters:

  • columns (Sequence[int] | Sequence[str]) –

    The column indices or names to select.

Returns:

  • Table

    New table with the selected columns and metadata preserved.
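
A minimal usage sketch (input built with pyarrow for illustration):

import pyarrow as pa
from arro3.core import Table

table = Table.from_pydict(
    {"a": pa.array([1]), "b": pa.array([2]), "c": pa.array([3])}
)
# Columns can be selected by name or by position.
assert table.select(["a", "c"]).column_names == ["a", "c"]
assert table.select([0, 2]).column_names == ["a", "c"]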

set_column

set_column(
    i: int, field: str | ArrowSchemaExportable, column: ArrowStreamExportable
) -> Table

Replace column in Table at position.

Parameters:

  • i (int) –

    Index of the column to replace.

  • field (str | ArrowSchemaExportable) –

    Name or field for the new column.

  • column (ArrowStreamExportable) –

    Column data.

Returns:

  • Table

    New table with the column replaced.

slice

slice(offset: int = 0, length: int | None = None) -> Table

Compute zero-copy slice of this table.

Parameters:

  • offset (int, default: 0 ) –

    Offset from the start of the table at which the slice begins. Defaults to 0.

  • length (int | None, default: None ) –

    Length of the slice. If None, the slice extends to the end of the table. Defaults to None.

Returns:

  • Table

    The sliced table.
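
A minimal usage sketch (input built with pyarrow for illustration):

import pyarrow as pa
from arro3.core import Table

table = Table.from_pydict({"a": pa.array(range(100))})
# Zero-copy view of 10 rows starting at row 90.
tail = table.slice(90, 10)
assert tail.num_rows == 10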

to_batches

to_batches() -> list[RecordBatch]

Convert Table to a list of RecordBatch objects.

Note that this method is zero-copy: it merely exposes the same data under a different API.

Returns:

  • list[RecordBatch]

    The table's data as a list of record batches.

to_reader

to_reader() -> RecordBatchReader

Convert the Table to a RecordBatchReader.

Note that this method is zero-copy: it merely exposes the same data under a different API.

Returns:

  • RecordBatchReader

    A reader over this table's batches.
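
A minimal sketch of streaming the table onward (assuming pyarrow >= 15, which provides RecordBatchReader.from_stream):

import pyarrow as pa
from arro3.core import Table

table = Table.from_pydict({"a": pa.array([1, 2, 3])})
reader = table.to_reader()
# The reader exports __arrow_c_stream__, so other Arrow-aware libraries
# can consume it batch by batch without copying.
pa_reader = pa.RecordBatchReader.from_stream(reader)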

to_struct_array

to_struct_array() -> ChunkedArray

Convert to a chunked array of struct type.

Returns:

  • ChunkedArray

    A chunked array of struct type, with one field per column of this table.

with_schema

with_schema(schema: ArrowSchemaExportable) -> Table

Assign a different schema onto this table.

The new schema must be compatible with the existing data; this does not cast the underlying data to the new schema. This is primarily useful for changing the schema metadata.

Parameters:

  • schema (ArrowSchemaExportable) –

    The new schema to assign. Must be compatible with the existing data.

Returns:

  • Table

    New table with the new schema.
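
A minimal sketch of the metadata use case (the new schema's fields are assumed to match the existing data exactly):

import pyarrow as pa
from arro3.core import Table

table = Table.from_pydict({"a": pa.array([1, 2, 3])})
# Same field as the existing schema, plus schema-level metadata.
new_schema = pa.schema([pa.field("a", pa.int64())], metadata={"source": "example"})
tagged = table.with_schema(new_schema)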