CSV
arro3.io.infer_csv_schema
infer_csv_schema(
file: IO[bytes] | Path | str,
*,
has_header: bool | None = None,
max_records: int | None = None,
delimiter: str | None = None,
escape: str | None = None,
quote: str | None = None,
terminator: str | None = None,
comment: str | None = None
) -> Schema
Infer a CSV file's schema.
If max_records is None, all records are read to infer the schema; otherwise, at most max_records records are read.
Parameters:
- file (IO[bytes] | Path | str) – The input CSV path or buffer.
- has_header (bool | None, default: None) – Set whether the first line of the CSV file is a header row.
- max_records (int | None, default: None) – The maximum number of records to read when inferring the schema. If None, all records are read.
- delimiter (str | None, default: None) – Set the column delimiter, given as a single-byte character.
- escape (str | None, default: None) – Set the escape character.
- quote (str | None, default: None) – Set the quote character.
- terminator (str | None, default: None) – Set the line terminator.
- comment (str | None, default: None) – Set the comment character; lines that begin with it are skipped.
Returns:
- Schema – The schema inferred from the data.
arro3.io.read_csv
read_csv(
file: IO[bytes] | Path | str,
schema: ArrowSchemaExportable,
*,
has_header: bool | None = None,
batch_size: int | None = None,
delimiter: str | None = None,
escape: str | None = None,
quote: str | None = None,
terminator: str | None = None,
comment: str | None = None
) -> RecordBatchReader
Read a CSV file to an Arrow RecordBatchReader.
Parameters:
- file (IO[bytes] | Path | str) – The input CSV path or buffer.
- schema (ArrowSchemaExportable) – The Arrow schema for this CSV file. Use infer_csv_schema to infer an Arrow schema if needed.
- has_header (bool | None, default: None) – Set whether the first line of the CSV file is a header row.
- batch_size (int | None, default: None) – Set the batch size, i.e. the number of records loaded into each RecordBatch.
- delimiter (str | None, default: None) – Set the column delimiter, given as a single-byte character.
- escape (str | None, default: None) – Set the escape character.
- quote (str | None, default: None) – Set the quote character.
- terminator (str | None, default: None) – Set the line terminator.
- comment (str | None, default: None) – Set the comment character; lines that begin with it are skipped.
Returns:
- RecordBatchReader – A RecordBatchReader that yields the CSV data as record batches.
arro3.io.write_csv
write_csv(
data: ArrowStreamExportable | ArrowArrayExportable,
file: IO[bytes] | Path | str,
*,
header: bool | None = None,
delimiter: str | None = None,
escape: str | None = None,
quote: str | None = None,
date_format: str | None = None,
datetime_format: str | None = None,
time_format: str | None = None,
timestamp_format: str | None = None,
timestamp_tz_format: str | None = None,
null: str | None = None
) -> None
Write an Arrow Table or stream to a CSV file.
Parameters:
- data (ArrowStreamExportable | ArrowArrayExportable) – The Arrow Table, RecordBatchReader, or RecordBatch to write.
- file (IO[bytes] | Path | str) – The output file path or buffer to write the CSV to.
- header (bool | None, default: None) – Set whether to write a header row.
- delimiter (str | None, default: None) – Set the column delimiter, given as a single-byte character.
- escape (str | None, default: None) – Set the escape character, given as a single-byte character. In some variants of CSV, quotes are escaped with a special escape character such as \ instead of by doubling them. Writing these idiosyncratic escapes is disabled by default; the escape character is only used when double quoting is disabled (note that write_csv has no double_quote parameter).
- quote (str | None, default: None) – Set the quote character, given as a single-byte character.
- date_format (str | None, default: None) – Set the format used for date values.
- datetime_format (str | None, default: None) – Set the format used for datetime values.
- time_format (str | None, default: None) – Set the format used for time values.
- timestamp_format (str | None, default: None) – Set the format used for timestamps without a time zone.
- timestamp_tz_format (str | None, default: None) – Set the format used for timestamps with a time zone.
- null (str | None, default: None) – Set the string used to represent null values in the output.