RecordBatch

arro3.core.RecordBatch

A two-dimensional batch of column-oriented data with a defined schema.

A RecordBatch is a two-dimensional dataset made up of contiguous arrays, all of the same length. Its schema must match the data types of its arrays.

Record batches are a convenient unit of work for serialization and computation functions, including ones that process data incrementally.

column_names property

column_names: list[str]

Names of the RecordBatch columns.

columns property

columns: list[Array]

List of all columns in numerical order.

nbytes property

nbytes: int

Total number of bytes consumed by the elements of the record batch.

num_columns property

num_columns: int

Number of columns.

num_rows property

num_rows: int

Number of rows.

By definition, all columns of a RecordBatch have the same number of rows.

schema property

schema: Schema

Access the schema of this RecordBatch.

shape property

shape: tuple[int, int]

Dimensions of the table or record batch: (number of rows, number of columns).

__arrow_c_array__

__arrow_c_array__(
    requested_schema: object | None = None,
) -> tuple[object, object]

An implementation of the Arrow PyCapsule Interface. This dunder method should not be called directly, but enables zero-copy data transfer to other Python libraries that understand Arrow memory.

For example, you can call pyarrow.record_batch() to convert this RecordBatch into a pyarrow RecordBatch, without copying memory.

append_column

append_column(
    field: str | ArrowSchemaExportable, column: ArrowArrayExportable
) -> RecordBatch

Append a column after the existing columns.

Parameters:

Returns:

column

column(i: int | str) -> ChunkedArray

Select a single column from the RecordBatch.

Parameters:

  • i (int | str) –

    The index or name of the column to retrieve.

Returns:

equals

equals(other: ArrowArrayExportable) -> bool

Check if contents of two record batches are equal.

Parameters:

Returns:

  • bool

    True if the two record batches are equal, False otherwise.

field

field(i: int | str) -> Field

Select a schema field by its column name or numeric index.

Parameters:

  • i (int | str) –

    The index or name of the field to retrieve.

Returns:

from_arrays classmethod

from_arrays(
    arrays: Sequence[ArrowArrayExportable], *, schema: ArrowSchemaExportable
) -> RecordBatch

Construct a RecordBatch from multiple Arrays.

Parameters:

Returns:

from_arrow classmethod

Construct this from an existing Arrow RecordBatch.

It can be called on anything that exports the Arrow data interface (has an __arrow_c_array__ method) and returns a StructArray.

Parameters:

Returns:

from_arrow_pycapsule classmethod

from_arrow_pycapsule(schema_capsule, array_capsule) -> RecordBatch

Construct this object from bare Arrow PyCapsules.

from_pydict classmethod

from_pydict(
    mapping: dict[str, ArrowArrayExportable],
    *,
    metadata: dict[str, str] | dict[bytes, bytes] | None = None
) -> RecordBatch

Construct a RecordBatch from a mapping of column names to Arrow arrays.

Parameters:

Returns:

from_struct_array classmethod

from_struct_array(struct_array: ArrowArrayExportable) -> RecordBatch

Construct a RecordBatch from a StructArray.

Each field in the StructArray will become a column in the resulting RecordBatch.

Parameters:

Returns:

remove_column

remove_column(i: int) -> RecordBatch

Create new RecordBatch with the indicated column removed.

Parameters:

  • i (int) –

    Index of column to remove.

Returns:

select

select(columns: list[int] | list[str]) -> RecordBatch

Select columns of the RecordBatch.

Returns a new RecordBatch with the specified columns, and metadata preserved.

Parameters:

  • columns (list[int] | list[str]) –

    The column names or integer indices to select.

Returns:

set_column

set_column(
    i: int, field: str | ArrowSchemaExportable, column: ArrowArrayExportable
) -> RecordBatch

Replace the column at the given position in the RecordBatch.

Parameters:

Returns:

slice

slice(offset: int = 0, length: int | None = None) -> RecordBatch

Compute a zero-copy slice of this RecordBatch.

Parameters:

  • offset (int, default: 0 ) –

    Offset from start of record batch to slice. Defaults to 0.

  • length (int | None, default: None ) –

    Length of the slice. If None (the default), the slice runs from offset to the end of the batch.

Returns:

take

Select rows from a Table or RecordBatch.

Parameters:

Returns:

to_struct_array

to_struct_array() -> Array

Convert to a struct array.

Returns:

with_schema

with_schema(schema: ArrowSchemaExportable) -> RecordBatch

Return a RecordBatch with the provided schema.