Parquet is one of the most popular columnar file formats, supported by many tools including Apache Hive, Spark, Presto, and Flink.
To tune Parquet file writes for different workloads and scenarios, let’s look at how the Parquet writer works in detail (as of Parquet 1.10, though most concepts apply to later versions as well).