RecordBatch¶
arro3.core.RecordBatch ¶
A two-dimensional batch of column-oriented data with a defined schema.
A RecordBatch is a two-dimensional dataset of a number of contiguous arrays, each the same length. A record batch has a schema which must match its arrays' datatypes.
Record batches are a convenient unit of work for serialization and computation functions, including ones that process data incrementally.
num_rows property ¶
num_rows: int
Number of rows
Due to the definition of a RecordBatch, all columns have the same number of rows.
shape property ¶
Dimensions of the record batch: (number of rows, number of columns).
__arrow_c_array__ ¶
An implementation of the Arrow PyCapsule Interface. This dunder method should not be called directly, but enables zero-copy data transfer to other Python libraries that understand Arrow memory.
For example, you can call pyarrow.record_batch() to convert this RecordBatch into a pyarrow RecordBatch, without copying memory.
__arrow_c_schema__ ¶
__arrow_c_schema__() -> object
An implementation of the Arrow PyCapsule Interface. This dunder method should not be called directly, but enables zero-copy data transfer to other Python libraries that understand Arrow memory.
This allows Arrow consumers to inspect the data type of this RecordBatch. Then the consumer can ask the producer (in __arrow_c_array__) to cast the exported data to a supported data type.
add_column ¶
add_column(
i: int, field: str | ArrowSchemaExportable, column: ArrayInput
) -> RecordBatch
Add column to RecordBatch at position.
A new RecordBatch is returned with the column added; the original RecordBatch object is left unchanged.
Parameters:
- i (int) – Index to place the column at.
- field (str | ArrowSchemaExportable) – If a string is passed then the type is deduced from the column data.
- column (ArrayInput) – Column data.
Returns:
- RecordBatch – New RecordBatch with the passed column added.
append_column ¶
append_column(
field: str | ArrowSchemaExportable, column: ArrayInput
) -> RecordBatch
Append column at end of columns.
Parameters:
- field (str | ArrowSchemaExportable) – If a string is passed then the type is deduced from the column data.
- column (ArrayInput) – Column data.
Returns:
- RecordBatch – New RecordBatch with the passed column appended.
column ¶
equals ¶
equals(other: ArrowArrayExportable) -> bool
Check if contents of two record batches are equal.
Parameters:
- other (ArrowArrayExportable) – RecordBatch to compare against.
Returns:
- bool – True if the contents of the two record batches are equal, otherwise False.
field ¶
from_arrays classmethod ¶
from_arrays(
arrays: Sequence[ArrayInput], *, schema: ArrowSchemaExportable
) -> RecordBatch
Construct a RecordBatch from multiple Arrays.
Parameters:
- arrays (Sequence[ArrayInput]) – One for each field in RecordBatch.
- schema (ArrowSchemaExportable) – Schema for the created batch (required keyword argument).
Returns:
- RecordBatch – New RecordBatch.
from_arrow classmethod ¶
from_arrow(input: ArrowArrayExportable | ArrowStreamExportable) -> RecordBatch
Construct a RecordBatch from an existing Arrow RecordBatch.
It can be called on anything that exports the Arrow data interface (has a __arrow_c_array__ method) and returns a StructArray.
Parameters:
- input (ArrowArrayExportable | ArrowStreamExportable) – Arrow array to use for constructing this object.
Returns:
- RecordBatch – New RecordBatch.
from_arrow_pycapsule classmethod ¶
from_arrow_pycapsule(schema_capsule, array_capsule) -> RecordBatch
Construct this object from bare Arrow PyCapsules.
from_pydict classmethod ¶
from_pydict(
mapping: dict[str, ArrayInput],
*,
metadata: dict[str, str] | dict[bytes, bytes] | None = None
) -> RecordBatch
Construct a RecordBatch from a mapping of column names to Arrow arrays.
Parameters:
- mapping (dict[str, ArrayInput]) – A mapping of strings to Arrays.
- metadata (dict[str, str] | dict[bytes, bytes] | None, default: None) – Optional metadata for the schema (if inferred).
Returns:
- RecordBatch – New RecordBatch.
from_struct_array classmethod ¶
from_struct_array(struct_array: ArrowArrayExportable) -> RecordBatch
Construct a RecordBatch from a StructArray.
Each field in the StructArray will become a column in the resulting RecordBatch.
Parameters:
- struct_array (ArrowArrayExportable) – Array to construct the record batch from.
Returns:
- RecordBatch – New RecordBatch.
remove_column ¶
remove_column(i: int) -> RecordBatch
Create new RecordBatch with the indicated column removed.
Parameters:
- i (int) – Index of column to remove.
Returns:
- RecordBatch – New record batch without the column.
select ¶
select(columns: list[int] | list[str]) -> RecordBatch
Select columns of the RecordBatch.
Returns a new RecordBatch with the specified columns, and metadata preserved.
Parameters:
- columns (list[int] | list[str]) – The column names or integer indices to select.
Returns:
- RecordBatch – New RecordBatch.
set_column ¶
set_column(
i: int, field: str | ArrowSchemaExportable, column: ArrayInput
) -> RecordBatch
Replace column in RecordBatch at position.
Parameters:
- i (int) – Index to place the column at.
- field (str | ArrowSchemaExportable) – If a string is passed then the type is deduced from the column data.
- column (ArrayInput) – Column data.
Returns:
- RecordBatch – New RecordBatch.
slice ¶
slice(offset: int = 0, length: int | None = None) -> RecordBatch
Compute a zero-copy slice of this RecordBatch.
Parameters:
- offset (int, default: 0) – Offset from start of record batch to slice.
- length (int | None, default: None) – Length of slice. If None, the slice runs from offset to the end of the batch.
Returns:
- RecordBatch – New RecordBatch.
take ¶
take(indices: ArrayInput) -> RecordBatch
Select rows from the RecordBatch.
Parameters:
- indices (ArrayInput) – The indices in the record batch whose rows will be returned.
Returns:
- RecordBatch – New RecordBatch containing the selected rows.
with_schema ¶
with_schema(schema: ArrowSchemaExportable) -> RecordBatch
Return a RecordBatch with the provided schema.