Let’s see how Hive on Tez determines the number of map tasks when the input data is stored in large ORC files that have small stripes.
Note: All experiments below were run on Amazon Hive 2.1.1. This article does not apply to Qubole running on AWS, which uses a different algorithm to determine the number of map tasks for ORC files.