Encodings supported by Parquet. Not all encodings are valid for all types. These enums are also used to specify the encoding of definition and repetition levels.

Enumeration Members

BIT_PACKED: 3

Bit packed encoding.

This can only be used if the data has a known maximum bit width. Usable for encoding definition and repetition levels.
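
To illustrate why a known maximum width matters, here is a minimal sketch of fixed-width bit packing. The helper names `pack_bits` and `unpack_bits` are hypothetical, and the bit order used (least-significant bit first) is an illustrative assumption rather than a statement of the on-disk layout; consult the Parquet format specification for the exact byte layout.

```python
def pack_bits(values, width):
    """Pack non-negative integers into bytes using `width` bits per value."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for v in values:
        if v < 0 or v >> width:
            raise ValueError(f"{v} does not fit in {width} bits")
        acc |= v << nbits
        nbits += width
        while nbits >= 8:           # flush every complete byte
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)      # final partial byte, zero-padded
    return bytes(out)


def unpack_bits(data, width, count):
    """Inverse of pack_bits: read `count` values of `width` bits each."""
    acc = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


levels = [0, 1, 2, 3, 1, 0, 2]
width = max(levels).bit_length()    # width derived from the known maximum
packed = pack_bits(levels, width)
```

Seven levels with a maximum of 3 need only 2 bits each, so they fit in two bytes instead of seven.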

BYTE_STREAM_SPLIT: 8

Encoding for floating-point data.

K byte streams are created, where K is the size in bytes of the data type. The individual bytes of each floating-point value are scattered to the corresponding streams, and the streams are then concatenated. This does not by itself reduce the size of the data, but it can lead to better compression afterwards.
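
The scatter-and-concatenate step can be sketched as follows. This is an illustration only; the function names `byte_stream_split` and `byte_stream_join` are hypothetical, and the sketch assumes little-endian IEEE 754 values packed with Python's `struct` module.

```python
import struct


def byte_stream_split(values, fmt="<f"):
    """Scatter byte k of every value into stream k, then concatenate streams."""
    size = struct.calcsize(fmt)                      # K: 4 for "<f", 8 for "<d"
    raw = b"".join(struct.pack(fmt, v) for v in values)
    return b"".join(raw[k::size] for k in range(size))


def byte_stream_join(data, fmt="<f"):
    """Inverse: interleave the K streams back into whole values."""
    size = struct.calcsize(fmt)
    n = len(data) // size                            # number of values
    streams = [data[k * n:(k + 1) * n] for k in range(size)]
    raw = bytes(streams[k][i] for i in range(n) for k in range(size))
    return [struct.unpack_from(fmt, raw, i * size)[0] for i in range(n)]


encoded = byte_stream_split([1.0, 2.0])
```

The output has the same length as the input; only the byte order changes, so that bytes with similar roles (for example, exponent bytes) sit next to each other for a later compression step.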

DELTA_BINARY_PACKED: 4

Delta encoding for integers, either INT32 or INT64.

Works best on sorted data.
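
A simplified sketch of why sorted input helps: consecutive differences are small, so few bits are needed per value. This deliberately ignores the real format's blocks, miniblocks, and variable-length integer headers, and the names `delta_summary` and `delta_restore` are hypothetical.

```python
def delta_summary(values):
    """Return (first value, minimum delta, adjusted deltas, bits per delta)."""
    first = values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    min_delta = min(deltas, default=0)
    # Subtracting the minimum makes every stored delta non-negative.
    adjusted = [d - min_delta for d in deltas]
    width = max(adjusted, default=0).bit_length()
    return first, min_delta, adjusted, width


def delta_restore(first, min_delta, adjusted):
    """Rebuild the original values from the summary."""
    out = [first]
    for a in adjusted:
        out.append(out[-1] + a + min_delta)
    return out


sorted_info = delta_summary([100, 101, 103, 104, 106])
shuffled_info = delta_summary([104, 100, 106, 101, 103])
```

In this example the sorted input needs 1 bit per adjusted delta, while the same values in shuffled order need 4 bits.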

DELTA_BYTE_ARRAY: 6

Incremental encoding for byte arrays.

Prefix lengths are encoded using DELTA_BINARY_PACKED encoding. Suffixes are stored using DELTA_LENGTH_BYTE_ARRAY encoding.
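
As an illustration, the split into prefix lengths and suffixes might look like the sketch below. The function names are hypothetical, and the subsequent DELTA_BINARY_PACKED and DELTA_LENGTH_BYTE_ARRAY steps are omitted.

```python
def incremental_encode(values):
    """Split each value into (shared prefix length with previous value, suffix)."""
    prefix_lengths, suffixes = [], []
    prev = b""
    for v in values:
        n = 0
        limit = min(len(prev), len(v))
        while n < limit and prev[n] == v[n]:
            n += 1
        prefix_lengths.append(n)
        suffixes.append(v[n:])
        prev = v
    return prefix_lengths, suffixes


def incremental_decode(prefix_lengths, suffixes):
    """Rebuild the values by reusing the prefix of the previous value."""
    out, prev = [], b""
    for n, suffix in zip(prefix_lengths, suffixes):
        prev = prev[:n] + suffix
        out.append(prev)
    return out


words = [b"axis", b"axle", b"babble", b"babyhood"]
prefix_lengths, suffixes = incremental_encode(words)
```

For these four words the prefix lengths are 0, 2, 0, 3 and the suffixes are `axis`, `le`, `babble`, `yhood`.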

DELTA_LENGTH_BYTE_ARRAY: 5

Encoding for byte arrays to separate the length values and the data.

The lengths are encoded using DELTA_BINARY_PACKED encoding.
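
A minimal sketch of the separation, with hypothetical function names; the DELTA_BINARY_PACKED encoding of the lengths is left out.

```python
def split_lengths(values):
    """Separate byte arrays into a list of lengths and one concatenated blob."""
    lengths = [len(v) for v in values]
    data = b"".join(values)
    return lengths, data


def join_lengths(lengths, data):
    """Inverse: slice the blob back into individual byte arrays."""
    out, pos = [], 0
    for n in lengths:
        out.append(data[pos:pos + n])
        pos += n
    return out


lengths, data = split_lengths([b"Hello", b"World", b"Foobar", b"ABCDEF"])
```

Keeping all lengths together lets them be encoded compactly as integers, and keeps the raw bytes contiguous.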

PLAIN: 0

Default byte encoding.

  • BOOLEAN - 1 bit per value, 0 is false; 1 is true.
  • INT32 - 4 bytes per value, stored as little-endian.
  • INT64 - 8 bytes per value, stored as little-endian.
  • FLOAT - 4 bytes per value, stored as little-endian.
  • DOUBLE - 8 bytes per value, stored as little-endian.
  • BYTE_ARRAY - 4-byte length stored as little-endian, followed by the bytes.
  • FIXED_LEN_BYTE_ARRAY - just the bytes are stored.
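
The layouts listed above can be sketched with Python's `struct` module. The function names are hypothetical, and the boolean bit order (least-significant bit first within each byte) is an assumption taken from the Parquet format specification rather than from the list above.

```python
import struct


def plain_int32(values):
    """4 bytes per value, little-endian, signed."""
    return b"".join(struct.pack("<i", v) for v in values)


def plain_double(values):
    """8 bytes per value, little-endian IEEE 754."""
    return b"".join(struct.pack("<d", v) for v in values)


def plain_byte_array(values):
    """4-byte little-endian length, followed by the bytes themselves."""
    return b"".join(struct.pack("<I", len(v)) + v for v in values)


def plain_boolean(values):
    """1 bit per value; bit order (LSB first) is assumed, see lead-in."""
    out = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


ints = plain_int32([1, -2])
strings = plain_byte_array([b"ab", b""])
flags = plain_boolean([True, False, True])
```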

PLAIN_DICTIONARY: 1

Deprecated dictionary encoding.

The values in the dictionary are encoded using PLAIN encoding. Because this encoding is deprecated, RLE_DICTIONARY encoding is used for data pages and PLAIN encoding is used for the dictionary page instead.

RLE: 2

Group-packed run-length encoding.

Usable for encoding definition/repetition levels and boolean values.
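
The run-length idea can be sketched as below. This is a simplification: it shows only (run length, value) pairs and omits the bit-packed groups and variable-length headers of the actual Parquet encoding. The names `rle_runs` and `rle_expand` are hypothetical.

```python
from itertools import groupby


def rle_runs(values):
    """Collapse consecutive equal values into (run length, value) pairs."""
    return [(sum(1 for _ in group), value) for value, group in groupby(values)]


def rle_expand(runs):
    """Inverse: repeat each value by its run length."""
    return [value for count, value in runs for _ in range(count)]


levels = [0, 0, 0, 1, 1, 0]
runs = rle_runs(levels)
```

Level columns and booleans often contain long runs of the same value, which is why this encoding suits them.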

RLE_DICTIONARY: 7

Dictionary encoding.

The ids (indices into the dictionary) are encoded using the RLE encoding.
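
A minimal sketch of the dictionary step, with hypothetical function names; the RLE encoding of the resulting ids and the PLAIN encoding of the dictionary page are omitted.

```python
def dictionary_encode(values):
    """Return (dictionary of distinct values in first-seen order, list of ids)."""
    dictionary, index, ids = [], {}, []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)   # assign the next free id
            dictionary.append(v)
        ids.append(index[v])
    return dictionary, ids


def dictionary_decode(dictionary, ids):
    """Inverse: look each id up in the dictionary."""
    return [dictionary[i] for i in ids]


column = ["red", "green", "red", "red", "blue", "green"]
dictionary, ids = dictionary_encode(column)
```

Columns with few distinct values benefit most, since each repeated value is replaced by a small integer id.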