
CSV

arro3.io.infer_csv_schema

infer_csv_schema(
    file: IO[bytes] | Path | str,
    *,
    has_header: bool | None = None,
    max_records: int | None = None,
    delimiter: str | None = None,
    escape: str | None = None,
    quote: str | None = None,
    terminator: str | None = None,
    comment: str | None = None
) -> Schema

Infer a CSV file's schema.

If max_records is None, all records are read to infer the schema; otherwise, at most max_records records are read.

Parameters:

  • file (IO[bytes] | Path | str) –

    The input CSV path or buffer.

  • has_header (bool | None, default: None ) –

    Set whether the CSV file has a header. Defaults to None.

  • max_records (int | None, default: None ) –

    The maximum number of records to read to infer the schema. Defaults to None (all records are read).

  • delimiter (str | None, default: None ) –

    Set the CSV file's column delimiter as a byte character. Defaults to None.

  • escape (str | None, default: None ) –

    Set the CSV escape character. Defaults to None.

  • quote (str | None, default: None ) –

    Set the CSV quote character. Defaults to None.

  • terminator (str | None, default: None ) –

    Set the line terminator. Defaults to None.

  • comment (str | None, default: None ) –

    Set the comment character. Defaults to None.

Returns:

  • Schema

    The schema inferred from the data.

arro3.io.read_csv

read_csv(
    file: IO[bytes] | Path | str,
    schema: ArrowSchemaExportable,
    *,
    has_header: bool | None = None,
    batch_size: int | None = None,
    delimiter: str | None = None,
    escape: str | None = None,
    quote: str | None = None,
    terminator: str | None = None,
    comment: str | None = None
) -> RecordBatchReader

Read a CSV file to an Arrow RecordBatchReader.

Parameters:

  • file (IO[bytes] | Path | str) –

    The input CSV path or buffer.

  • schema (ArrowSchemaExportable) –

    The Arrow schema for this CSV file. Use infer_csv_schema to infer an Arrow schema if needed.

  • has_header (bool | None, default: None ) –

    Set whether the CSV file has a header. Defaults to None.

  • batch_size (int | None, default: None ) –

    Set the batch size (number of records to load at one time). Defaults to None.

  • delimiter (str | None, default: None ) –

    Set the CSV file's column delimiter as a byte character. Defaults to None.

  • escape (str | None, default: None ) –

    Set the CSV escape character. Defaults to None.

  • quote (str | None, default: None ) –

    Set the CSV quote character. Defaults to None.

  • terminator (str | None, default: None ) –

    Set the line terminator. Defaults to None.

  • comment (str | None, default: None ) –

    Set the comment character. Defaults to None.

Returns:

  • RecordBatchReader

    A reader over the record batches parsed from the CSV file.

arro3.io.write_csv

write_csv(
    data: ArrowStreamExportable | ArrowArrayExportable,
    file: IO[bytes] | Path | str,
    *,
    header: bool | None = None,
    delimiter: str | None = None,
    escape: str | None = None,
    quote: str | None = None,
    date_format: str | None = None,
    datetime_format: str | None = None,
    time_format: str | None = None,
    timestamp_format: str | None = None,
    timestamp_tz_format: str | None = None,
    null: str | None = None
) -> None

Write an Arrow Table or stream to a CSV file.

Parameters:

  • data (ArrowStreamExportable | ArrowArrayExportable) –

    The Arrow Table, RecordBatchReader, or RecordBatch to write.

  • file (IO[bytes] | Path | str) –

    The output buffer or file path for where to write the CSV.

  • header (bool | None, default: None ) –

    Set whether to write the CSV file with a header. Defaults to None.

  • delimiter (str | None, default: None ) –

    Set the CSV file's column delimiter as a byte character. Defaults to None.

  • escape (str | None, default: None ) –

    Set the CSV file's escape character as a byte character.

    In some variants of CSV, quotes are escaped using a special escape character like \ (instead of escaping quotes by doubling them).

    By default, these idiosyncratic escapes are not written; the escape character is only used when double quoting (double_quote) is disabled. Defaults to None.

  • quote (str | None, default: None ) –

    Set the CSV file's quote character as a byte character. Defaults to None.

  • date_format (str | None, default: None ) –

    Set the CSV file's date format. Defaults to None.

  • datetime_format (str | None, default: None ) –

    Set the CSV file's datetime format. Defaults to None.

  • time_format (str | None, default: None ) –

    Set the CSV file's time format. Defaults to None.

  • timestamp_format (str | None, default: None ) –

    Set the CSV file's timestamp format. Defaults to None.

  • timestamp_tz_format (str | None, default: None ) –

    Set the CSV file's format for timezone-aware timestamps. Defaults to None.

  • null (str | None, default: None ) –

    Set the string used to represent null values in the output. Defaults to None.