CSV
arro3.io.infer_csv_schema
infer_csv_schema(
    file: IO[bytes] | Path | str,
    *,
    has_header: bool | None = None,
    max_records: int | None = None,
    delimiter: str | None = None,
    escape: str | None = None,
    quote: str | None = None,
    terminator: str | None = None,
    comment: str | None = None
) -> Schema
Infer a CSV file's schema.
If max_records is None, all records are read; otherwise, up to max_records records are read to infer the schema.
Parameters:
- file (IO[bytes] | Path | str) – The input CSV path or buffer.
- has_header (bool | None, default: None) – Set whether the CSV file has a header. Defaults to None.
- max_records (int | None, default: None) – The maximum number of records to read to infer the schema. Defaults to None.
- delimiter (str | None, default: None) – Set the CSV file's column delimiter as a single-byte character. Defaults to None.
- escape (str | None, default: None) – Set the CSV escape character. Defaults to None.
- quote (str | None, default: None) – Set the CSV quote character. Defaults to None.
- terminator (str | None, default: None) – Set the line terminator. Defaults to None.
- comment (str | None, default: None) – Set the comment character. Defaults to None.
Returns:
- Schema – The schema inferred from the data.
arro3.io.read_csv
read_csv(
    file: IO[bytes] | Path | str,
    schema: ArrowSchemaExportable,
    *,
    has_header: bool | None = None,
    batch_size: int | None = None,
    delimiter: str | None = None,
    escape: str | None = None,
    quote: str | None = None,
    terminator: str | None = None,
    comment: str | None = None
) -> RecordBatchReader
Read a CSV file to an Arrow RecordBatchReader.
Parameters:
- file (IO[bytes] | Path | str) – The input CSV path or buffer.
- schema (ArrowSchemaExportable) – The Arrow schema for this CSV file. Use infer_csv_schema to infer an Arrow schema if needed.
- has_header (bool | None, default: None) – Set whether the CSV file has a header. Defaults to None.
- batch_size (int | None, default: None) – Set the batch size (the number of records to load at one time). Defaults to None.
- delimiter (str | None, default: None) – Set the CSV file's column delimiter as a single-byte character. Defaults to None.
- escape (str | None, default: None) – Set the CSV escape character. Defaults to None.
- quote (str | None, default: None) – Set the CSV quote character. Defaults to None.
- terminator (str | None, default: None) – Set the line terminator. Defaults to None.
- comment (str | None, default: None) – Set the comment character. Defaults to None.
Returns:
- RecordBatchReader – A RecordBatchReader over the parsed CSV data.
arro3.io.write_csv
write_csv(
    data: ArrowStreamExportable | ArrowArrayExportable,
    file: IO[bytes] | Path | str,
    *,
    header: bool | None = None,
    delimiter: str | None = None,
    escape: str | None = None,
    quote: str | None = None,
    date_format: str | None = None,
    datetime_format: str | None = None,
    time_format: str | None = None,
    timestamp_format: str | None = None,
    timestamp_tz_format: str | None = None,
    null: str | None = None
) -> None
Write an Arrow Table or stream to a CSV file.
Parameters:
- data (ArrowStreamExportable | ArrowArrayExportable) – The Arrow Table, RecordBatchReader, or RecordBatch to write.
- file (IO[bytes] | Path | str) – The output buffer or file path to write the CSV to.
- header (bool | None, default: None) – Set whether to write the CSV file with a header. Defaults to None.
- delimiter (str | None, default: None) – Set the CSV file's column delimiter as a single-byte character. Defaults to None.
- escape (str | None, default: None) – Set the CSV file's escape character as a single-byte character. In some variants of CSV, quotes are escaped using a special escape character like \ (instead of escaping quotes by doubling them). By default, writing these idiosyncratic escapes is disabled, and the escape character is only used when double_quote is disabled. Defaults to None.
- quote (str | None, default: None) – Set the CSV file's quote character as a single-byte character. Defaults to None.
- date_format (str | None, default: None) – Set the CSV file's date format. Defaults to None.
- datetime_format (str | None, default: None) – Set the CSV file's datetime format. Defaults to None.
- time_format (str | None, default: None) – Set the CSV file's time format. Defaults to None.
- timestamp_format (str | None, default: None) – Set the CSV file's timestamp format. Defaults to None.
- timestamp_tz_format (str | None, default: None) – Set the CSV file's format for timestamps with a time zone. Defaults to None.
- null (str | None, default: None) – Set the string used to represent null values in the output. Defaults to None.