Sometimes source data arrives from a streaming application as a large number of small Parquet files, which you need to compact so that analytic applications can read them more efficiently.
You may notice that, by default, the number of tasks used to read such Parquet files is larger than you would expect. Let’s see why.
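To make the effect concrete, here is a minimal Python sketch. It assumes the reader is Apache Spark 3.x and replays the split-size arithmetic of its file source, using the default values of `spark.sql.files.maxPartitionBytes` (128 MB) and `spark.sql.files.openCostInBytes` (4 MB). The helper names `max_split_bytes` and `count_read_tasks` are hypothetical. For simplicity, the sketch assumes every file is smaller than the split size, so no file is cut into several chunks. The key point is that each file is charged its size *plus* the open cost, so many tiny files fill a read task long before 128 MB of actual data is reached.

```python
# Hypothetical sketch of Spark 3.x file-source task planning (not Spark's code).
# Assumes defaults for spark.sql.files.maxPartitionBytes / openCostInBytes and
# that every file is smaller than the split size (no file is split).

MB = 1024 * 1024
MAX_PARTITION_BYTES = 128 * MB   # assumed default of spark.sql.files.maxPartitionBytes
OPEN_COST_IN_BYTES = 4 * MB      # assumed default of spark.sql.files.openCostInBytes


def max_split_bytes(file_sizes, default_parallelism):
    """Target bytes per read task: each file is padded by the open cost."""
    total_bytes = sum(size + OPEN_COST_IN_BYTES for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    return min(MAX_PARTITION_BYTES, max(OPEN_COST_IN_BYTES, bytes_per_core))


def count_read_tasks(file_sizes, default_parallelism):
    """Greedily pack files (largest first) into tasks, as the file source does."""
    split = max_split_bytes(file_sizes, default_parallelism)
    tasks = 0
    current = 0
    has_files = False
    for size in sorted(file_sizes, reverse=True):
        # Close the current task if this file would push it past the split size.
        if has_files and current + size > split:
            tasks += 1
            current = 0
            has_files = False
        # Each file counts as its size plus the fixed open cost.
        current += size + OPEN_COST_IN_BYTES
        has_files = True
    if has_files:
        tasks += 1
    return tasks


if __name__ == "__main__":
    files = [1 * MB] * 1000  # 1000 small files of 1 MB each (about 1000 MB of data)
    print(count_read_tasks(files, default_parallelism=8))
```

Under these assumptions, 1000 files of 1 MB each with a default parallelism of 8 produce 39 read tasks, because each task holds only 26 files (26 × 5 MB padded size). Naively dividing roughly 1000 MB of data by 128 MB would suggest only 8 tasks.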