RecordBatch¶
arro3.core.RecordBatch ¶
A two-dimensional batch of column-oriented data with a defined schema.
A RecordBatch is a two-dimensional dataset made up of a number of contiguous arrays, each the same length. A record batch has a schema, which must match the data types of its arrays.
Record batches are a convenient unit of work for serialization and computation functions, including incremental ones.
num_rows
property
¶
num_rows: int
Number of rows
Due to the definition of a RecordBatch, all columns have the same number of rows.
shape
property
¶
Dimensions of the table or record batch: (number of rows, number of columns).
__arrow_c_array__ ¶
An implementation of the Arrow PyCapsule Interface. This dunder method should not be called directly, but enables zero-copy data transfer to other Python libraries that understand Arrow memory.
For example, you can call pyarrow.record_batch() to convert this RecordBatch into a pyarrow RecordBatch, without copying memory.
append_column ¶
append_column(
field: str | ArrowSchemaExportable, column: ArrowArrayExportable
) -> RecordBatch
Append a column at the end of the existing columns.
Parameters:
- field (str | ArrowSchemaExportable) – If a string is passed, the type is deduced from the column data.
- column (ArrowArrayExportable) – Column data.
Returns:
- RecordBatch – New RecordBatch with the column appended.
column ¶
column(i: int | str) -> ChunkedArray
equals ¶
equals(other: ArrowArrayExportable) -> bool
Check whether the contents of two record batches are equal.
Parameters:
- other (ArrowArrayExportable) – RecordBatch to compare against.
Returns:
- bool – True if the two record batches are equal, otherwise False.
field ¶
from_arrays
classmethod
¶
from_arrays(
arrays: Sequence[ArrowArrayExportable], *, schema: ArrowSchemaExportable
) -> RecordBatch
Construct a RecordBatch from multiple Arrays.
Parameters:
- arrays (Sequence[ArrowArrayExportable]) – One array for each field in the RecordBatch.
- schema (ArrowSchemaExportable) – Schema for the created batch.
Returns:
- RecordBatch – New RecordBatch.
from_arrow
classmethod
¶
from_arrow(input: ArrowArrayExportable | ArrowStreamExportable) -> RecordBatch
Construct a RecordBatch from an existing Arrow RecordBatch.
It can be called on anything that exports the Arrow data interface (has an __arrow_c_array__ method) and returns a StructArray.
Parameters:
- input (ArrowArrayExportable | ArrowStreamExportable) – Arrow array to use for constructing this object.
Returns:
- RecordBatch – New RecordBatch.
from_arrow_pycapsule
classmethod
¶
from_arrow_pycapsule(schema_capsule, array_capsule) -> RecordBatch
Construct this object from bare Arrow PyCapsules.
from_pydict
classmethod
¶
from_pydict(
mapping: dict[str, ArrowArrayExportable],
*,
metadata: dict[str, str] | dict[bytes, bytes] | None = None
) -> RecordBatch
Construct a RecordBatch from a mapping of column names to Arrow arrays.
Parameters:
- mapping (dict[str, ArrowArrayExportable]) – A mapping of strings to Arrays.
- metadata (dict[str, str] | dict[bytes, bytes] | None, default: None) – Optional metadata for the schema (if inferred). Defaults to None.
Returns:
- RecordBatch – New RecordBatch.
from_struct_array
classmethod
¶
from_struct_array(struct_array: ArrowArrayExportable) -> RecordBatch
Construct a RecordBatch from a StructArray.
Each field in the StructArray will become a column in the resulting RecordBatch.
Parameters:
- struct_array (ArrowArrayExportable) – Array to construct the record batch from.
Returns:
- RecordBatch – New RecordBatch.
remove_column ¶
remove_column(i: int) -> RecordBatch
Create a new RecordBatch with the indicated column removed.
Parameters:
- i (int) – Index of column to remove.
Returns:
- RecordBatch – New record batch without the column.
select ¶
select(columns: list[int] | list[str]) -> RecordBatch
Select columns of the RecordBatch.
Returns a new RecordBatch with the specified columns, and metadata preserved.
Parameters:
- columns (list[int] | list[str]) – The column indices or names to select.
Returns:
- RecordBatch – New RecordBatch.
set_column ¶
set_column(
i: int, field: str | ArrowSchemaExportable, column: ArrowArrayExportable
) -> RecordBatch
Replace the column in the RecordBatch at the given position.
Parameters:
- i (int) – Index to place the column at.
- field (str | ArrowSchemaExportable) – If a string is passed, the type is deduced from the column data.
- column (ArrowArrayExportable) – Column data.
Returns:
- RecordBatch – New RecordBatch.
slice ¶
slice(offset: int = 0, length: int | None = None) -> RecordBatch
Compute a zero-copy slice of this RecordBatch.
Parameters:
- offset (int, default: 0) – Offset from start of record batch to slice. Defaults to 0.
- length (int | None, default: None) – Length of slice (default is until end of batch starting from offset). Defaults to None.
Returns:
- RecordBatch – New RecordBatch.
take ¶
take(indices: ArrowArrayExportable) -> RecordBatch
Select rows from the RecordBatch by index.
Parameters:
- indices (ArrowArrayExportable) – The indices of the rows to return.
Returns:
- RecordBatch – New RecordBatch containing the selected rows.
with_schema ¶
with_schema(schema: ArrowSchemaExportable) -> RecordBatch
Return a RecordBatch with the provided schema.