RecordBatch

arro3.core.RecordBatch

A two-dimensional batch of column-oriented data with a defined schema.

A RecordBatch is a two-dimensional dataset made up of a number of contiguous arrays, all of the same length. A record batch has a schema, which must match its arrays' data types.

Record batches are a convenient unit of work for various serialization and computation functions, possibly incremental.

column_names `property`

column_names: list[str]

Names of the RecordBatch columns.

columns `property`

columns: list[Array]

List of all columns in numerical order.

nbytes `property`

nbytes: int

Total number of bytes consumed by the elements of the record batch.

num_columns `property`

num_columns: int

Number of columns.

num_rows `property`

num_rows: int

Number of rows.

Due to the definition of a RecordBatch, all columns have the same number of rows.

schema `property`

schema: Schema

Access the schema of this RecordBatch.

shape `property`

shape: tuple[int, int]

Dimensions of the record batch: (number of rows, number of columns).

__arrow_c_array__

__arrow_c_array__(
    requested_schema: object | None = None,
) -> tuple[object, object]

An implementation of the Arrow PyCapsule Interface. This dunder method should not be called directly, but enables zero-copy data transfer to other Python libraries that understand Arrow memory.

For example, you can call pyarrow.record_batch() to convert this RecordBatch into a pyarrow RecordBatch, without copying memory.

append_column

append_column(
    field: str | ArrowSchemaExportable, column: ArrowArrayExportable
) -> RecordBatch

Append a column at the end of the columns.

Parameters:

field (str | ArrowSchemaExportable) –

Field name or field definition of the new column. If a string is passed, the type is deduced from the column data.
column (ArrowArrayExportable) –

Column data.

Returns:

RecordBatch –

New RecordBatch with the column appended.

column

column(i: int | str) -> ChunkedArray

Select a single column from the RecordBatch.

Parameters:

i (int | str) –

The index or name of the column to retrieve.

Returns:

ChunkedArray –

The selected column.

equals

equals(other: ArrowArrayExportable) -> bool

Check if contents of two record batches are equal.

Parameters:

other (ArrowArrayExportable) –

RecordBatch to compare against.

Returns:

bool –

True if the two record batches are equal, otherwise False.

field

field(i: int | str) -> Field

Select a schema field by its column name or numeric index.

Parameters:

i (int | str) –

The index or name of the field to retrieve.

Returns:

Field –

The requested field.

from_arrays `classmethod`

from_arrays(
    arrays: Sequence[ArrowArrayExportable], *, schema: ArrowSchemaExportable
) -> RecordBatch

Construct a RecordBatch from multiple Arrays.

Parameters:

arrays (Sequence[ArrowArrayExportable]) –

One array for each field in the RecordBatch.
schema (ArrowSchemaExportable) –

Schema for the created batch.

Returns:

RecordBatch –

New RecordBatch.

from_arrow `classmethod`

from_arrow(input: ArrowArrayExportable | ArrowStreamExportable) -> RecordBatch

Construct this object from an existing Arrow RecordBatch.

The input can be any object that exports the Arrow data interface (has an __arrow_c_array__ method) and exports a StructArray.

Parameters:

input (ArrowArrayExportable | ArrowStreamExportable) –

Arrow array to use for constructing this object.

Returns:

RecordBatch –

New RecordBatch.

from_arrow_pycapsule `classmethod`

from_arrow_pycapsule(schema_capsule, array_capsule) -> RecordBatch

Construct this object from bare Arrow PyCapsules: a schema capsule and an array capsule.

from_pydict `classmethod`

from_pydict(
    mapping: dict[str, ArrowArrayExportable],
    *,
    metadata: dict[str, str] | dict[bytes, bytes] | None = None
) -> RecordBatch

Construct a RecordBatch from a mapping of column names to Arrow arrays.

Parameters:

mapping (dict[str, ArrowArrayExportable]) –

A mapping of column names to Arrays.
metadata (dict[str, str] | dict[bytes, bytes] | None, default: None ) –

Optional metadata for the schema (if inferred). Defaults to None.

Returns:

RecordBatch –

New RecordBatch.

from_struct_array `classmethod`

from_struct_array(struct_array: ArrowArrayExportable) -> RecordBatch

Construct a RecordBatch from a StructArray.

Each field in the StructArray will become a column in the resulting RecordBatch.

Parameters:

struct_array (ArrowArrayExportable) –

Array to construct the record batch from.

Returns:

RecordBatch –

New RecordBatch.

remove_column

remove_column(i: int) -> RecordBatch

Create new RecordBatch with the indicated column removed.

Parameters:

i (int) –

Index of column to remove.

Returns:

RecordBatch –

New record batch without the column.

select

select(columns: list[int] | list[str]) -> RecordBatch

Select columns of the RecordBatch.

Returns a new RecordBatch with the specified columns, and metadata preserved.

Parameters:

columns (list[int] | list[str]) –

The column names or integer indices to select.

Returns:

RecordBatch –

New RecordBatch.

set_column

set_column(
    i: int, field: str | ArrowSchemaExportable, column: ArrowArrayExportable
) -> RecordBatch

Replace the column at the given position.

Parameters:

i (int) –

Index to place the column at.
field (str | ArrowSchemaExportable) –

Field name or field definition of the new column. If a string is passed, the type is deduced from the column data.
column (ArrowArrayExportable) –

Column data.

Returns:

RecordBatch –

New RecordBatch.

slice

slice(offset: int = 0, length: int | None = None) -> RecordBatch

Compute a zero-copy slice of this RecordBatch.

Parameters:

offset (int, default: 0 ) –

Offset from start of record batch to slice. Defaults to 0.
length (int | None, default: None ) –

Length of the slice. If None (the default), the slice extends from offset to the end of the batch.

Returns:

RecordBatch –

New RecordBatch.

take

take(indices: ArrowArrayExportable) -> RecordBatch

Select rows from the RecordBatch by index.

Parameters:

indices (ArrowArrayExportable) –

The indices of the rows to return.

Returns:

RecordBatch –

New RecordBatch containing the selected rows.

to_struct_array

to_struct_array() -> Array

Convert to a struct array.

Returns:

Array –

A StructArray with one field per column of this RecordBatch.

with_schema

with_schema(schema: ArrowSchemaExportable) -> RecordBatch

Return a RecordBatch with the provided schema.
