• AWS,  CPU,  EC2,  EMR,  Hadoop,  Qubole,  YARN

    AWS EC2 vCPU and YARN vCores – M4, C4, R4 Instances

    Let’s review how EC2 vCPUs correspond to YARN vCores in Amazon EMR and Qubole Hadoop clusters. As an example, I will choose m4.4xlarge, r4.4xlarge and c4.4xlarge EC2 instance types.

    EC2 vCPU is a thread of a CPU core (typically, there are two threads per core). Does it mean that YARN vCores should be equal to the number of EC2 vCPU? That’s not always the case.

  • Hive,  Hue,  Qubole

    Operation Timed Out for Hive Query in Hue and Qubole

    Sometimes when you execute a very simple query in Hive you can get “The operation timed out. Do you want to retry?” error in Hue or “Error: java.io.IOException: Time limit exceeded for local fetch task with SimpleFetchOptimization” error in Qubole.

    TL;DR: Set this before your query to resolve this:

    set hive.fetch.task.conversion = none;

    Consider the following query that times out in my case:

    SELECT * 
    FROM clicks
    WHERE event_dt >= '2018-09-17' AND event_name = 'WEB_EVENT' AND 
      application = 'CLOUD_APP' AND env = 'PRD' AND id = 5
    LIMIT 100;

    Instead of executing a MapReduce or Tez job Hive just decides that it can read the data directly from the storage, it takes too long time so a time out happens.