Table

arro3.core.Table

A collection of top-level named, equal-length Arrow arrays.

chunk_lengths property

chunk_lengths: list[int]

The number of rows in each internal chunk.

column_names property

column_names: list[str]

Names of the Table or RecordBatch columns.

Returns:

  • list[str]

    The column names.

columns property

columns: list[ChunkedArray]

List of all columns in numerical order.

Returns:

  • list[ChunkedArray]

    All columns, in positional order.

nbytes property

nbytes: int

Total number of bytes consumed by the elements of the table.

num_columns property

num_columns: int

Number of columns in this table.

num_rows property

num_rows: int

Number of rows in this table.

Due to the definition of a table, all columns have the same number of rows.

schema property

schema: Schema

Schema of the table and its columns.

Returns:

  • Schema

    The schema of this table.

shape property

shape: tuple[int, int]

Dimensions of the table or record batch.

Returns:

  • tuple[int, int]

    (number of rows, number of columns)

__arrow_c_schema__

__arrow_c_schema__() -> object

An implementation of the Arrow PyCapsule Interface. This dunder method should not be called directly, but enables zero-copy data transfer to other Python libraries that understand Arrow memory.

This allows Arrow consumers to inspect the data type of this Table. Then the consumer can ask the producer (in __arrow_c_stream__) to cast the exported data to a supported data type.

__arrow_c_stream__

__arrow_c_stream__(requested_schema: object | None = None) -> object

An implementation of the Arrow PyCapsule Interface. This dunder method should not be called directly, but enables zero-copy data transfer to other Python libraries that understand Arrow memory.

For example, you can call pyarrow.table() to convert this Table into a pyarrow table without copying memory.
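
For illustration, a minimal sketch of this zero-copy handoff (assuming pyarrow >= 14, which understands the Arrow PyCapsule Interface):

import pyarrow as pa
from arro3.core import Table

table = Table.from_pydict({"a": pa.array([1, 2, 3])})
# pyarrow consumes this Table's __arrow_c_stream__ export directly;
# no buffers are copied.
pa_table = pa.table(table)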

add_column

add_column(
    i: int, field: str | ArrowSchemaExportable, column: ArrowStreamExportable
) -> Table

Add column to Table at position.

A new table is returned with the column added; the original table object is left unchanged.

Parameters:

  • i (int) –

    Index at which to place the column.

  • field (str | ArrowSchemaExportable) –

    Name or field for the new column.

  • column (ArrowStreamExportable) –

    Column data.

Returns:

  • Table

    New table with the passed column added.
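
A minimal usage sketch (pyarrow is used here only to build example column data; any Arrow-compatible producer works):

import pyarrow as pa
from arro3.core import Table

table = Table.from_pydict({"a": pa.array([1, 2, 3])})
# Insert a new column "b" at position 0; `table` itself is unchanged.
new_table = table.add_column(0, "b", pa.chunked_array([["x", "y", "z"]]))
assert new_table.column_names == ["b", "a"]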

append_column

append_column(
    field: str | ArrowSchemaExportable, column: ArrowStreamExportable
) -> Table

Append column at end of columns.

Parameters:

  • field (str | ArrowSchemaExportable) –

    Name or field for the new column.

  • column (ArrowStreamExportable) –

    Column data.

Returns:

  • Table

    New table or record batch with the passed column added.

column

column(i: int | str) -> ChunkedArray

Select single column from Table or RecordBatch.

Parameters:

  • i (int | str) –

    The index or name of the column to retrieve.

Returns:

  • ChunkedArray

    The selected column.

combine_chunks

combine_chunks() -> Table

Make a new table by combining the chunks this table has.

All the underlying chunks in the ChunkedArray of each column are concatenated into zero or one chunk.

Returns:

  • Table

New table with one or zero chunks.

field

field(i: int | str) -> Field

Select a schema field by its column name or numeric index.

Parameters:

  • i (int | str) –

    The index or name of the field to retrieve.

Returns:

  • Field

    The selected field.

from_arrays classmethod

from_arrays(
    arrays: Sequence[ArrowArrayExportable | ArrowStreamExportable],
    *,
    names: Sequence[str] | None = None,
    schema: ArrowSchemaExportable | None = None,
    metadata: dict[str, str] | dict[bytes, bytes] | None = None
) -> Table

Construct a Table from Arrow arrays.

Parameters:

  • arrays (Sequence[ArrowArrayExportable | ArrowStreamExportable]) –

    Equal-length arrays that should form the table.

  • names (Sequence[str] | None, default: None ) –

    Names for the table columns. If not passed, schema must be passed. Defaults to None.

  • schema (ArrowSchemaExportable | None, default: None ) –

    Schema for the created table. If not passed, names must be passed. Defaults to None.

  • metadata (dict[str, str] | dict[bytes, bytes] | None, default: None ) –

    Optional metadata for the schema (if inferred). Defaults to None.

Returns:

  • Table

    The new table.
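
A minimal usage sketch (arrays built with pyarrow for illustration):

import pyarrow as pa
from arro3.core import Table

# Either `names` or `schema` must be provided.
table = Table.from_arrays(
    [pa.array([1, 2]), pa.array(["a", "b"])],
    names=["x", "y"],
)
assert table.shape == (2, 2)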

from_arrow classmethod

from_arrow(input: ArrowArrayExportable | ArrowStreamExportable) -> Table

Construct this object from an existing Arrow object.

It can be called on anything that exports the Arrow stream interface (__arrow_c_stream__) and yields a StructArray for each item. This Table will materialize all items from the iterator in memory at once. Use RecordBatchReader if you don't wish to materialize all batches in memory at once.

Parameters:

  • input (ArrowArrayExportable | ArrowStreamExportable) –

    The Arrow object to consume.

Returns:

  • Table

    The new table.

from_arrow_pycapsule classmethod

from_arrow_pycapsule(capsule) -> Table

Construct this object from a bare Arrow PyCapsule.

Parameters:

  • capsule

    A PyCapsule containing an Arrow ArrowArrayStream.

Returns:

  • Table

    The new table.

from_batches classmethod

from_batches(
    batches: Sequence[ArrowArrayExportable],
    *,
    schema: ArrowSchemaExportable | None = None
) -> Table

Construct a Table from a sequence of Arrow RecordBatches.

Parameters:

  • batches (Sequence[ArrowArrayExportable]) –

    Sequence of RecordBatch to be converted, all schemas must be equal.

  • schema (ArrowSchemaExportable | None, default: None ) –

    If not passed, will be inferred from the first RecordBatch. Defaults to None.

Returns:

  • Table

    The new table.
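
A minimal usage sketch (batches built with pyarrow for illustration):

import pyarrow as pa
from arro3.core import Table

batch = pa.RecordBatch.from_pydict({"a": [1, 2]})
# All batches must share the same schema.
table = Table.from_batches([batch, batch])
assert table.num_rows == 4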

from_pydict classmethod

from_pydict(
    mapping: dict[str, ArrowArrayExportable | ArrowStreamExportable],
    *,
    schema: ArrowSchemaExportable | None = None,
    metadata: dict[str, str] | dict[bytes, bytes] | None = None
) -> Table

Construct a Table or RecordBatch from Arrow arrays or columns.

Parameters:

  • mapping (dict[str, ArrowArrayExportable | ArrowStreamExportable]) –

    A mapping of column names to arrays or chunked arrays.

  • schema (ArrowSchemaExportable | None, default: None ) –

    Schema for the created table. If not passed, will be inferred from the mapping. Defaults to None.

  • metadata (dict[str, str] | dict[bytes, bytes] | None, default: None ) –

    Optional metadata for the schema (if inferred). Defaults to None.

Returns:

  • Table

    The new table.
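
A minimal usage sketch (values built with pyarrow for illustration):

import pyarrow as pa
from arro3.core import Table

table = Table.from_pydict({
    "id": pa.array([1, 2, 3]),
    "name": pa.array(["a", "b", "c"]),
})
assert table.column_names == ["id", "name"]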

rechunk

rechunk(*, max_chunksize: int | None = None) -> Table

Rechunk a table with a maximum number of rows per chunk.

Parameters:

  • max_chunksize (int | None, default: None ) –

    The maximum number of rows per internal RecordBatch. Defaults to None, which rechunks into a single batch.

Returns:

  • Table

    The rechunked table.
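
A minimal sketch (input built with pyarrow for illustration):

import pyarrow as pa
from arro3.core import Table

table = Table.from_pydict({"a": pa.array(range(10))})
rechunked = table.rechunk(max_chunksize=4)
# Each chunk now holds at most 4 rows, e.g.:
print(rechunked.chunk_lengths)  # [4, 4, 2]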

remove_column

remove_column(i: int) -> Table

Create new Table with the indicated column removed.

Parameters:

  • i (int) –

    Index of column to remove.

Returns:

  • Table

    New table without the column.

rename_columns

rename_columns(names: Sequence[str]) -> Table

Create new table with columns renamed to provided names.

Parameters:

  • names (Sequence[str]) –

    New column names, in positional order.

Returns:

  • Table

    New table with columns renamed.

select

select(columns: Sequence[int] | Sequence[str]) -> Table

Select columns of the Table.

Returns a new Table with the specified columns, and metadata preserved.

Parameters:

  • columns (Sequence[int] | Sequence[str]) –

    The column indices or names to select.

Returns:

  • Table

    New table with the selected columns and metadata preserved.
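
A minimal usage sketch (input built with pyarrow for illustration):

import pyarrow as pa
from arro3.core import Table

table = Table.from_pydict(
    {"a": pa.array([1]), "b": pa.array([2]), "c": pa.array([3])}
)
# Columns can be selected by name or by position.
assert table.select(["a", "c"]).column_names == ["a", "c"]
assert table.select([0, 2]).column_names == ["a", "c"]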

set_column

set_column(
    i: int, field: str | ArrowSchemaExportable, column: ArrowStreamExportable
) -> Table

Replace column in Table at position.

Parameters:

  • i (int) –

    Index of the column to replace.

  • field (str | ArrowSchemaExportable) –

    Name or field for the new column.

  • column (ArrowStreamExportable) –

    Column data.

Returns:

  • Table

    New table with the column replaced.

slice

slice(offset: int = 0, length: int | None = None) -> Table

Compute zero-copy slice of this table.

Parameters:

  • offset (int, default: 0 ) –

    Offset from the start of the table at which the slice begins. Defaults to 0.

  • length (int | None, default: None ) –

    Length of the slice. If None, the slice extends to the end of the table. Defaults to None.

Returns:

  • Table

    The sliced table.
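
A minimal usage sketch (input built with pyarrow for illustration):

import pyarrow as pa
from arro3.core import Table

table = Table.from_pydict({"a": pa.array(range(100))})
# Zero-copy view of 10 rows starting at row 90.
tail = table.slice(90, 10)
assert tail.num_rows == 10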

to_batches

to_batches() -> list[RecordBatch]

Convert Table to a list of RecordBatch objects.

Note that this method is zero-copy: it merely exposes the same data under a different API.

Returns:

  • list[RecordBatch]

    The table's data as a list of record batches.

to_reader

to_reader() -> RecordBatchReader

Convert the Table to a RecordBatchReader.

Note that this method is zero-copy: it merely exposes the same data under a different API.

Returns:

  • RecordBatchReader

    A reader over this table's batches.
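
A minimal sketch of streaming the table onward (assuming pyarrow >= 15, which provides RecordBatchReader.from_stream):

import pyarrow as pa
from arro3.core import Table

table = Table.from_pydict({"a": pa.array([1, 2, 3])})
reader = table.to_reader()
# The reader exports __arrow_c_stream__, so other Arrow-aware libraries
# can consume it batch by batch without copying.
pa_reader = pa.RecordBatchReader.from_stream(reader)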

to_struct_array

to_struct_array() -> ChunkedArray

Convert to a chunked array of struct type.

Returns:

  • ChunkedArray

    A chunked array of struct type, with one field per column of this table.

with_schema

with_schema(schema: ArrowSchemaExportable) -> Table

Assign a different schema onto this table.

The new schema must be compatible with the existing data; this does not cast the underlying data to the new schema. This is primarily useful for changing the schema metadata.

Parameters:

  • schema (ArrowSchemaExportable) –

    The new schema to assign. Must be compatible with the existing data.

Returns:

  • Table

    New table with the new schema.
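
A minimal sketch of the metadata use case (the new schema's fields are assumed to match the existing data exactly):

import pyarrow as pa
from arro3.core import Table

table = Table.from_pydict({"a": pa.array([1, 2, 3])})
# Same field as the existing schema, plus schema-level metadata.
new_schema = pa.schema([pa.field("a", pa.int64())], metadata={"source": "example"})
tagged = table.with_schema(new_schema)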