
RecordBatch

arro3.core.RecordBatch

A two-dimensional batch of column-oriented data with a defined schema.

A RecordBatch is a two-dimensional dataset of a number of contiguous arrays, each the same length. A record batch has a schema which must match its arrays' datatypes.

Record batches are a convenient unit of work for various serialization and computation functions, possibly incremental.

column_names property

column_names: list[str]

Names of the RecordBatch columns.

columns property

columns: list[Array]

List of all columns in numerical order.

nbytes property

nbytes: int

Total number of bytes consumed by the elements of the record batch.

num_columns property

num_columns: int

Number of columns.

num_rows property

num_rows: int

Number of rows.

Due to the definition of a RecordBatch, all columns have the same number of rows.

schema property

schema: Schema

Access the schema of this RecordBatch.

shape property

shape: tuple[int, int]

Dimensions of the RecordBatch: (number of rows, number of columns).

__arrow_c_array__

__arrow_c_array__(
    requested_schema: object | None = None,
) -> tuple[object, object]

An implementation of the Arrow PyCapsule Interface. This dunder method should not be called directly, but enables zero-copy data transfer to other Python libraries that understand Arrow memory.

For example, you can call pyarrow.record_batch() to convert this RecordBatch into a pyarrow RecordBatch, without copying memory.

__arrow_c_schema__

__arrow_c_schema__() -> object

An implementation of the Arrow PyCapsule Interface. This dunder method should not be called directly, but enables zero-copy data transfer to other Python libraries that understand Arrow memory.

This allows Arrow consumers to inspect the data type of this RecordBatch. Then the consumer can ask the producer (in __arrow_c_array__) to cast the exported data to a supported data type.

add_column

add_column(
    i: int, field: str | ArrowSchemaExportable, column: ArrayInput
) -> RecordBatch

Add a column to the RecordBatch at the given position.

A new RecordBatch is returned with the column added; the original RecordBatch is left unchanged.

Parameters:

  • i (int) –

    Index to place the column at.

  • field (str | ArrowSchemaExportable) –

    If a string is passed then the type is deduced from the column data.

  • column (ArrayInput) –

    Column data.

Returns:

  • RecordBatch

    New RecordBatch with the passed column added.

append_column

append_column(
    field: str | ArrowSchemaExportable, column: ArrayInput
) -> RecordBatch

Append a column to the end of the columns.

Parameters:

  • field (str | ArrowSchemaExportable) –

    If a string is passed then the type is deduced from the column data.

  • column (ArrayInput) –

    Column data.

Returns:

column

column(i: int | str) -> Array

Select a single column from the RecordBatch.

Parameters:

  • i (int | str) –

    The index or name of the column to retrieve.

Returns:

equals

equals(other: ArrowArrayExportable) -> bool

Check if contents of two record batches are equal.

Parameters:

  • other (ArrowArrayExportable) –

    The other RecordBatch to compare against.

Returns:

  • bool

    True if the two record batches are equal; otherwise False.

field

field(i: int | str) -> Field

Select a schema field by its column name or numeric index.

Parameters:

  • i (int | str) –

    The index or name of the field to retrieve.

Returns:

from_arrays classmethod

from_arrays(
    arrays: Sequence[ArrayInput], *, schema: ArrowSchemaExportable
) -> RecordBatch

Construct a RecordBatch from multiple Arrays.

Parameters:

Returns:

from_arrow classmethod

Construct this from an existing Arrow RecordBatch.

It can be called on anything that exports the Arrow data interface (has an __arrow_c_array__ method) whose exported data is a StructArray.

Parameters:

Returns:

from_arrow_pycapsule classmethod

from_arrow_pycapsule(schema_capsule, array_capsule) -> RecordBatch

Construct this object from bare Arrow PyCapsules.

from_pydict classmethod

from_pydict(
    mapping: dict[str, ArrayInput],
    *,
    metadata: dict[str, str] | dict[bytes, bytes] | None = None
) -> RecordBatch

Construct a RecordBatch from Arrow arrays or columns.

Parameters:

  • mapping (dict[str, ArrayInput]) –

    A mapping of strings to Arrays.

  • metadata (dict[str, str] | dict[bytes, bytes] | None, default: None ) –

    Optional metadata for the schema (if inferred). Defaults to None.

Returns:

from_struct_array classmethod

from_struct_array(struct_array: ArrowArrayExportable) -> RecordBatch

Construct a RecordBatch from a StructArray.

Each field in the StructArray will become a column in the resulting RecordBatch.

Parameters:

Returns:

remove_column

remove_column(i: int) -> RecordBatch

Create new RecordBatch with the indicated column removed.

Parameters:

  • i (int) –

    Index of column to remove.

Returns:

select

select(columns: list[int] | list[str]) -> RecordBatch

Select columns of the RecordBatch.

Returns a new RecordBatch with the specified columns, and metadata preserved.

Parameters:

  • columns (list[int] | list[str]) –

    The column names or integer indices to select.

Returns:

set_column

set_column(
    i: int, field: str | ArrowSchemaExportable, column: ArrayInput
) -> RecordBatch

Replace column in RecordBatch at position.

Parameters:

  • i (int) –

    Index to place the column at.

  • field (str | ArrowSchemaExportable) –

    If a string is passed then the type is deduced from the column data.

  • column (ArrayInput) –

    Column data.

Returns:

slice

slice(offset: int = 0, length: int | None = None) -> RecordBatch

Compute a zero-copy slice of this RecordBatch.

Parameters:

  • offset (int, default: 0 ) –

    Offset from start of record batch to slice. Defaults to 0.

  • length (int | None, default: None ) –

Length of the slice; if None, the slice runs from offset to the end of the batch. Defaults to None.

Returns:

take

take(indices: ArrayInput) -> RecordBatch

Select rows from the RecordBatch.

Parameters:

  • indices (ArrayInput) –

The indices of the rows to return.

Returns:

to_struct_array

to_struct_array() -> Array

Convert to a struct array.

Returns:

with_schema

with_schema(schema: ArrowSchemaExportable) -> RecordBatch

Return a RecordBatch with the provided schema.