• Read a Parquet file into a stream of Arrow RecordBatches.

    This returns a ReadableStream of RecordBatches held in WebAssembly memory. To transfer the Arrow data to JavaScript memory you have two options:

    • (Easier): Call RecordBatch.intoIPCStream to construct a buffer that can be parsed with Arrow JS's tableFromIPC function. (The table will have a single internal record batch).
    • (More performant but bleeding edge): Call RecordBatch.intoFFI to construct a data representation that can be parsed zero-copy from WebAssembly with arrow-js-ffi using parseRecordBatch.

    Example with IPC stream:

    import { tableFromIPC, Table } from "apache-arrow";
    import initWasm, { readParquetStream } from "parquet-wasm";

    // Instantiate the WebAssembly context
    await initWasm();

    const stream = await readParquetStream(url);

    const batches = [];
    for await (const wasmRecordBatch of stream) {
      // Each IPC stream parses to a table with a single internal record batch
      const arrowTable = tableFromIPC(wasmRecordBatch.intoIPCStream());
      batches.push(...arrowTable.batches);
    }
    const table = new Table(batches);

    Example with arrow-js-ffi:

    import { parseRecordBatch } from "arrow-js-ffi";
    import { Table } from "apache-arrow";
    import initWasm, { readParquetStream, wasmMemory } from "parquet-wasm";

    // Instantiate the WebAssembly context
    await initWasm();
    const WASM_MEMORY = wasmMemory();

    const stream = await readParquetStream(url);

    const batches = [];
    for await (const wasmRecordBatch of stream) {
      const ffiRecordBatch = wasmRecordBatch.intoFFI();
      // Parse the FFI record batch out of WebAssembly memory
      const recordBatch = parseRecordBatch(
        WASM_MEMORY.buffer,
        ffiRecordBatch.arrayAddr(),
        ffiRecordBatch.schemaAddr(),
        true
      );
      batches.push(recordBatch);
    }
    const table = new Table(batches);

    Parameters

    • url: string

      URL of the Parquet file

    • Optional content_length: number

      Length of the Parquet file in bytes, if already known

    Returns Promise<ReadableStream>
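
    If the file's size is already known ahead of time, it can be passed as the optional content_length argument. A minimal sketch, assuming content_length is the file size in bytes (the URL and size below are hypothetical):

    import initWasm, { readParquetStream } from "parquet-wasm";

    // Instantiate the WebAssembly context
    await initWasm();

    // Hypothetical values; content_length is assumed to be the file's
    // size in bytes, per the parameter list above.
    const url = "https://example.com/data.parquet";
    const stream = await readParquetStream(url, 10_000_000);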