• Parquet

    Parquet 1.x File Format – Footer Content

    Every Parquet file has a footer that contains the file metadata: the schema, row groups, and column statistics. The footer is located at the end of the file.

    A Parquet file starts and ends with the 4-byte “magic” string PAR1. Immediately before the trailing PAR1 is the footer length, stored as a 4-byte little-endian integer.

    The position of the footer can be easily calculated as: File_length - Footer_length - 8 (4 bytes for the footer length field plus 4 bytes for the trailing PAR1).
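    The calculation above can be sketched in Python as follows. This is a minimal illustration, not a Parquet reader: the function name footer_range is hypothetical, it only locates the footer bytes, and it does not decode the Thrift-encoded metadata inside them.

```python
import io
import struct

MAGIC = b"PAR1"


def footer_range(f):
    """Return (offset, length) of the footer in a seekable binary file.

    Hypothetical helper: it relies only on the layout described above
    (trailing 4-byte little-endian footer length followed by b"PAR1").
    """
    f.seek(0, io.SEEK_END)
    file_length = f.tell()
    # Smallest possible layout: leading magic + length field + trailing magic.
    if file_length < 12:
        raise ValueError("file too short to be a Parquet file")

    # The last 8 bytes are: footer length (4 bytes) + trailing magic (4 bytes).
    f.seek(file_length - 8)
    tail = f.read(8)
    if tail[4:] != MAGIC:
        raise ValueError("trailing PAR1 magic not found")

    (footer_length,) = struct.unpack("<I", tail[:4])
    offset = file_length - footer_length - 8
    # The footer cannot overlap the leading 4-byte magic.
    if offset < 4:
        raise ValueError("footer length is larger than the file allows")
    return offset, footer_length
```

    With a real file, the helper would be called as footer_range(open(path, "rb")); the returned offset and length can then be used to read only the footer bytes instead of the whole file.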

• AWS, Flink, S3

    Flink and S3 Entropy Injection for Checkpoints

    When you store checkpoints in S3, it can easily become a bottleneck, especially for a Flink application with many subtasks. To overcome this problem, FLINK-9061 introduced entropy injection into the checkpoint path.
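    As a rough mental model of entropy injection, the Python sketch below mimics the path rewriting: a marker in the configured checkpoint path is replaced with random characters for checkpoint data files and removed for metadata files. This is a simplified illustration, not Flink's code. The marker value, the length, the bucket name, and the function resolve_path are all assumptions made for the example; in Flink the marker and length are set through the s3.entropy.key and s3.entropy.length options.

```python
import random
import string

# Assumed example values; in Flink they come from s3.entropy.key
# and s3.entropy.length.
ENTROPY_KEY = "_entropy_"
ENTROPY_LENGTH = 4


def resolve_path(path, inject):
    """Simplified model of how the entropy marker in a path is resolved.

    inject=True  -> replace the marker with random characters (data files)
    inject=False -> drop the marker entirely (metadata files)
    """
    if ENTROPY_KEY not in path:
        return path
    if inject:
        alphabet = string.ascii_lowercase + string.digits
        entropy = "".join(random.choice(alphabet) for _ in range(ENTROPY_LENGTH))
        return path.replace(ENTROPY_KEY, entropy, 1)
    # Remove the marker together with the slash that follows it.
    return path.replace(ENTROPY_KEY + "/", "", 1)
```

    For a hypothetical path such as s3://my-bucket/_entropy_/checkpoints/chk-1, each data file gets a different random segment in place of the marker, which spreads the writes over many key prefixes, while the metadata path stays predictable.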

    However, the Flink documentation (at least up to Flink 1.13) provides a misleading example that negates the benefit of checkpoint entropy.