When you run a job in Hadoop, you may see the following error: Application with id 'application_1545962730597_2614' doesn't exist in RM. If you then look at the YARN Resource Manager UI at http://<RM_IP_Address>:8088/cluster/apps, you can see low Application ID numbers.
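As a rough aid for reading such IDs: a YARN application ID has the form application_<cluster timestamp>_<sequence number>, where the cluster timestamp is normally the ResourceManager start time in milliseconds and the sequence number counts applications submitted to that instance. The sketch below splits an ID into those two parts so you can check whether the ID from the error message and the IDs shown in the UI belong to the same ResourceManager start. The helper names (AppId, parse_app_id, same_rm_instance) are made up for this illustration and are not part of any Hadoop API. A differing timestamp combined with low sequence numbers may suggest that the ResourceManager was restarted, but that is an assumption to verify against the RM logs.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class AppId(NamedTuple):
    """Parsed parts of a YARN application ID (hypothetical helper type)."""
    cluster_timestamp_ms: int  # normally the ResourceManager start time, in ms
    sequence: int              # per-RM-start application counter

    @property
    def rm_started_at(self) -> datetime:
        # Interpret the cluster timestamp as epoch milliseconds in UTC.
        return datetime.fromtimestamp(self.cluster_timestamp_ms / 1000, tz=timezone.utc)


def parse_app_id(app_id: str) -> AppId:
    """Split 'application_<timestamp>_<sequence>' into its numeric parts."""
    prefix, timestamp, sequence = app_id.split("_")
    if prefix != "application":
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    return AppId(int(timestamp), int(sequence))


def same_rm_instance(first: str, second: str) -> bool:
    """True if both IDs carry the same cluster timestamp."""
    return parse_app_id(first).cluster_timestamp_ms == parse_app_id(second).cluster_timestamp_ms


if __name__ == "__main__":
    failed = parse_app_id("application_1545962730597_2614")  # ID from the error message
    print(failed.sequence, failed.rm_started_at.isoformat())

    # The second ID is a made-up example of a "low" ID seen in the UI.
    print(same_rm_instance("application_1545962730597_2614",
                           "application_1546000000000_0003"))
```

If the two IDs do not share a cluster timestamp, the application from the error message was registered with a different ResourceManager start than the one currently serving the UI.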
-
S3 Writes When Inserting Data into a Hive Table in Amazon EMR
Often in an ETL process we move data from one source into another, typically doing some filtering, transformations and aggregations. Let’s consider which write operations are performed in S3.
To focus on S3 writes, I am going to use a very simple SQL INSERT statement that just moves data from one table into another without any transformations:
INSERT OVERWRITE TABLE events PARTITION (event_dt = '2018-12-02', event_hour = '00')
SELECT record_id, event_timestamp, event_name, app_name, country, city, payload
FROM events_raw;
-
Tez Internals #2 – Number of Map Tasks for Large ORC Files with Small Stripes in Amazon EMR
Let’s see how Hive on Tez defines the number of map tasks when the input data is stored in large ORC files with small stripes.
Note. All experiments below were executed on Amazon Hive 2.1.1. This article does not apply to Qubole running on Amazon AWS. Qubole has a different algorithm to define the number of map tasks for ORC files.