A row group is the unit of work when reading from Parquet: it cannot be split into smaller parts, so you might expect Spark to create no more tasks than the total number of row groups in your Parquet data source.
But Spark can still create many more tasks than there are row groups. Let’s see how this is possible.
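A rough way to picture it is a simplified model, which is an assumption rather than Spark's actual code. Spark cuts each file into byte-range splits without looking at row group boundaries, and each split becomes a task. The Parquet reader then reads a row group only in the split that contains the row group's midpoint. The sketch below assumes a single file, a fixed 128 MB split size (the default of `spark.sql.files.maxPartitionBytes`), and hypothetical row group offsets and sizes. It ignores `spark.sql.files.openCostInBytes`, cluster parallelism, and the packing of small splits into one partition, all of which affect the real split size.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class RowGroup:
    offset: int  # byte offset where the row group starts (hypothetical values)
    size: int    # total compressed size of the row group in bytes


def split_file(file_size: int, split_size: int) -> list[tuple[int, int]]:
    """Cut a file into contiguous [start, end) byte ranges, ignoring row group boundaries."""
    return [(start, min(start + split_size, file_size))
            for start in range(0, file_size, split_size)]


def assign_row_groups(row_groups: list[RowGroup],
                      splits: list[tuple[int, int]]) -> dict[int, list[int]]:
    """Give each row group to the one split whose byte range contains its midpoint."""
    assignment: dict[int, list[int]] = {i: [] for i in range(len(splits))}
    for rg_index, rg in enumerate(row_groups):
        midpoint = rg.offset + rg.size // 2
        for split_index, (start, end) in enumerate(splits):
            if start <= midpoint < end:
                assignment[split_index].append(rg_index)
                break
    return assignment


# Hypothetical file: 4-byte "PAR1" header, two 256 MB row groups, ~1 KB footer.
row_groups = [RowGroup(offset=4, size=256 * MB),
              RowGroup(offset=4 + 256 * MB, size=256 * MB)]
file_size = 4 + 512 * MB + 1024

splits = split_file(file_size, 128 * MB)
assignment = assign_row_groups(row_groups, splits)

print("tasks (splits):", len(splits))
print("tasks that read a row group:", sum(1 for rgs in assignment.values() if rgs))
print(assignment)
```

In this model, two row groups produce five tasks, and three of them read no rows at all. The number of tasks follows the file's byte size divided by the split size, not the number of row groups.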