After creating an Amazon EMR cluster with Spark support and running a Spark application, you may notice that the Spark job creates too many tasks to process even a very small data set.
For example, I have a small table country_iso_codes with 249 rows, stored in a comma-delimited text file that is 10,657 bytes long.
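A quick way to observe this is to read such a file and print the partition count of the resulting DataFrame. Below is a minimal sketch; the S3 path is hypothetical and only illustrates the idea, not the actual application used later in this post:

```scala
import org.apache.spark.sql.SparkSession

object PartitionCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionCheck")
      .getOrCreate()

    // Hypothetical location of the 10,657-byte comma-delimited file
    val df = spark.read
      .csv("s3://my-bucket/country_iso_codes/")

    // Number of partitions Spark decided to create for this tiny file;
    // each partition becomes a separate task at execution time
    println(s"Partitions: ${df.rdd.getNumPartitions}")

    spark.stop()
  }
}
```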
When running the following application on an Amazon EMR 5.7 cluster with Spark 2.1.1 using the default settings, I can see that a large number of partitions is generated: