• Read a Parquet file into Arrow data.

    This returns an Arrow table that lives in WebAssembly memory. There are two ways to transfer it to JavaScript memory:

    • (Easier): Call Table.intoIPCStream to construct a buffer that can be parsed with Arrow JS's tableFromIPC function.
    • (More performant but bleeding edge): Call Table.intoFFI to construct a data representation that can be parsed zero-copy from WebAssembly with arrow-js-ffi using parseTable.

    Example with IPC stream:

    import { tableFromIPC } from "apache-arrow";
    import initWasm, {readParquet} from "parquet-wasm";

    // Instantiate the WebAssembly context
    await initWasm();

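    // Fetch the Parquet file and wrap its bytes in a Uint8Array
    const resp = await fetch("https://example.com/file.parquet");
    const parquetUint8Array = new Uint8Array(await resp.arrayBuffer());
    // Decode the Parquet bytes into an Arrow table held in WebAssembly memory
    const arrowWasmTable = readParquet(parquetUint8Array);
    // Serialize to an Arrow IPC stream and parse it into an Arrow JS table
    const arrowTable = tableFromIPC(arrowWasmTable.intoIPCStream());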

    Example with arrow-js-ffi:

    import { parseTable } from "arrow-js-ffi";
    import initWasm, {readParquet, wasmMemory} from "parquet-wasm";

    // Instantiate the WebAssembly context
    await initWasm();
    const WASM_MEMORY = wasmMemory();

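    // Fetch the Parquet file and wrap its bytes in a Uint8Array
    const resp = await fetch("https://example.com/file.parquet");
    const parquetUint8Array = new Uint8Array(await resp.arrayBuffer());
    // Decode the Parquet bytes into an Arrow table held in WebAssembly memory
    const arrowWasmTable = readParquet(parquetUint8Array);
    // Build the FFI representation that arrow-js-ffi can parse
    const ffiTable = arrowWasmTable.intoFFI();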
    const arrowTable = parseTable(
      WASM_MEMORY.buffer,
      ffiTable.arrayAddrs(),
      ffiTable.schemaAddr()
    );

    Parameters

    • parquet_file: Uint8Array

      Uint8Array containing Parquet data

    • Optional options: ReaderOptions

      Options for reading Parquet data. Optional keys include:

      • batchSize: The number of rows in each batch. If omitted, the upstream Parquet reader's default of 1024 is used.
      • rowGroups: The indexes of the row groups to read. Data in other row groups is not read.
      • limit: The maximum number of rows to read.
      • offset: The number of rows to skip before reading.
      • columns: The names of the columns to read from the file.
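
      As an illustration of how these keys might be combined, the sketch below builds an options object for reading one "page" of rows from selected columns. The helper name pageOptions is hypothetical and not part of parquet-wasm. It also assumes that offset is applied before limit, which this page does not state explicitly.

```javascript
// Hypothetical helper (not part of parquet-wasm): builds an object with the
// `offset`, `limit`, and `columns` keys described above for one page of rows.
function pageOptions(page, pageSize, columns) {
  if (!Number.isInteger(page) || page < 0) {
    throw new RangeError("page must be a non-negative integer");
  }
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError("pageSize must be a positive integer");
  }

  const options = {
    offset: page * pageSize, // rows to skip before this page starts
    limit: pageSize,         // maximum number of rows in this page
  };

  // Set `columns` only when a non-empty selection was given, so the key
  // stays optional and is simply absent otherwise.
  if (Array.isArray(columns) && columns.length > 0) {
    options.columns = [...columns];
  }
  return options;
}

// Intended use, assuming `readParquet` and `parquetUint8Array` are set up as
// in the examples earlier on this page:
//   const table = readParquet(parquetUint8Array, pageOptions(2, 500, ["id", "name"]));
const exampleOptions = pageOptions(2, 500, ["id", "name"]);
console.log(exampleOptions);
```

      Keeping the arithmetic in a small pure function makes the paging logic checkable without loading the WebAssembly module. The column names "id" and "name" are placeholders.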

    Returns Table