RecordBatch¶

arro3.core.RecordBatch ¶

A two-dimensional batch of column-oriented data with a defined schema.

A RecordBatch is a two-dimensional dataset of a number of contiguous arrays, each the same length. A record batch has a schema which must match its arrays' datatypes.

Record batches are a convenient unit of work for serialization and computation functions, including ones that process data incrementally.

column_names `property` ¶

column_names: list[str]

Names of the RecordBatch columns.

columns `property` ¶

columns: list[Array]

List of all columns in numerical order.

nbytes `property` ¶

nbytes: int

Total number of bytes consumed by the elements of the record batch.

num_columns `property` ¶

num_columns: int

Number of columns.

num_rows `property` ¶

num_rows: int

Number of rows.

By the definition of a RecordBatch, all columns have the same number of rows.

schema `property` ¶

schema: Schema

Access the schema of this RecordBatch.

shape `property` ¶

shape: tuple[int, int]

Dimensions of the record batch: (number of rows, number of columns).

__arrow_c_array__ ¶

__arrow_c_array__(
    requested_schema: object | None = None,
) -> tuple[object, object]

An implementation of the Arrow PyCapsule Interface. This dunder method should not be called directly, but enables zero-copy data transfer to other Python libraries that understand Arrow memory.

For example, you can call pyarrow.record_batch() to convert this RecordBatch into a pyarrow RecordBatch, without copying memory.

__arrow_c_schema__ ¶

__arrow_c_schema__() -> object

An implementation of the Arrow PyCapsule Interface. This dunder method should not be called directly, but enables zero-copy data transfer to other Python libraries that understand Arrow memory.

This allows Arrow consumers to inspect the data type of this RecordBatch. Then the consumer can ask the producer (in __arrow_c_array__) to cast the exported data to a supported data type.

add_column ¶

add_column(
    i: int, field: str | ArrowSchemaExportable, column: ArrayInput
) -> RecordBatch

Add a column to the RecordBatch at the given position.

A new RecordBatch is returned with the column added; the original RecordBatch object is left unchanged.

Parameters:

i (int) –

Index to place the column at.
field (str | ArrowSchemaExportable) –

If a string is passed then the type is deduced from the column data.
column (ArrayInput) –

Column data.

Returns:

RecordBatch –

New RecordBatch with the passed column added.

append_column ¶

append_column(
    field: str | ArrowSchemaExportable, column: ArrayInput
) -> RecordBatch

Append a column at the end of the existing columns.

Parameters:

field (str | ArrowSchemaExportable) –

If a string is passed then the type is deduced from the column data.
column (ArrayInput) –

Column data.

Returns:

RecordBatch –

New RecordBatch with the passed column appended.

column ¶

column(i: int | str) -> Array

Select a single column from the RecordBatch.

Parameters:

i (int | str) –

The index or name of the column to retrieve.

Returns:

Array –

The selected column.

equals ¶

equals(other: ArrowArrayExportable) -> bool

Check if contents of two record batches are equal.

Parameters:

other (ArrowArrayExportable) –

RecordBatch to compare against.

Returns:

bool –

True if the contents are equal, otherwise False.

field ¶

field(i: int | str) -> Field

Select a schema field by its column name or numeric index.

Parameters:

i (int | str) –

The index or name of the field to retrieve.

Returns:

Field –

The selected field.

from_arrays `classmethod` ¶

from_arrays(
    arrays: Sequence[ArrayInput], *, schema: ArrowSchemaExportable
) -> RecordBatch

Construct a RecordBatch from multiple Arrays.

Parameters:

arrays (Sequence[ArrayInput]) –

One array for each field in the RecordBatch.
schema (ArrowSchemaExportable) –

Schema for the created batch.

Returns:

RecordBatch –

New RecordBatch.

from_arrow `classmethod` ¶

from_arrow(input: ArrowArrayExportable | ArrowStreamExportable) -> RecordBatch

Construct this object from an existing Arrow RecordBatch.

It can be called on anything that exports the Arrow data interface (has an __arrow_c_array__ method) and returns a StructArray.

Parameters:

input (ArrowArrayExportable | ArrowStreamExportable) –

Arrow array to use for constructing this object.

Returns:

RecordBatch –

New RecordBatch.

from_arrow_pycapsule `classmethod` ¶

from_arrow_pycapsule(schema_capsule, array_capsule) -> RecordBatch

Construct this object from bare Arrow PyCapsules.

from_pydict `classmethod` ¶

from_pydict(
    mapping: dict[str, ArrayInput],
    *,
    metadata: dict[str, str] | dict[bytes, bytes] | None = None
) -> RecordBatch

Construct a RecordBatch from a mapping of column names to Arrow arrays.

Parameters:

mapping (dict[str, ArrayInput]) –

A mapping of strings to Arrays.
metadata (dict[str, str] | dict[bytes, bytes] | None, default: None ) –

Optional metadata for the schema (if inferred). Defaults to None.

Returns:

RecordBatch –

New RecordBatch.

from_struct_array `classmethod` ¶

from_struct_array(struct_array: ArrowArrayExportable) -> RecordBatch

Construct a RecordBatch from a StructArray.

Each field in the StructArray will become a column in the resulting RecordBatch.

Parameters:

struct_array (ArrowArrayExportable) –

Array to construct the record batch from.

Returns:

RecordBatch –

New RecordBatch.

remove_column ¶

remove_column(i: int) -> RecordBatch

Create new RecordBatch with the indicated column removed.

Parameters:

i (int) –

Index of column to remove.

Returns:

RecordBatch –

New record batch without the column.

select ¶

select(columns: list[int] | list[str]) -> RecordBatch

Select columns of the RecordBatch.

Returns a new RecordBatch with the specified columns, and metadata preserved.

Parameters:

columns (list[int] | list[str]) –

The column names or integer indices to select.

Returns:

RecordBatch –

New RecordBatch.

set_column ¶

set_column(
    i: int, field: str | ArrowSchemaExportable, column: ArrayInput
) -> RecordBatch

Replace the column in the RecordBatch at the given position.

Parameters:

i (int) –

Index to place the column at.
field (str | ArrowSchemaExportable) –

If a string is passed then the type is deduced from the column data.
column (ArrayInput) –

Column data.

Returns:

RecordBatch –

New RecordBatch.

slice ¶

slice(offset: int = 0, length: int | None = None) -> RecordBatch

Compute a zero-copy slice of this RecordBatch.

Parameters:

offset (int, default: 0 ) –

Offset from start of record batch to slice. Defaults to 0.
length (int | None, default: None ) –

Length of slice (default is until end of batch starting from offset). Defaults to None.

Returns:

RecordBatch –

New RecordBatch.

take ¶

take(indices: ArrayInput) -> RecordBatch

Select rows from the RecordBatch.

Parameters:

indices (ArrayInput) –

The indices of the rows in the RecordBatch that will be returned.

Returns:

RecordBatch –

New RecordBatch containing the selected rows.

to_struct_array ¶

to_struct_array() -> Array

Convert to a struct array.

Returns:

Array –

A struct array with one field per column of the RecordBatch.

with_schema ¶

with_schema(schema: ArrowSchemaExportable) -> RecordBatch

Return a RecordBatch with the provided schema.
