
AWS EC2 vCPU and YARN vCores – M4, C4, R4 Instances

Let’s review how EC2 vCPUs correspond to YARN vCores in Amazon EMR and Qubole Hadoop clusters. As examples, I will use the m4.4xlarge, r4.4xlarge and c4.4xlarge EC2 instance types.

An EC2 vCPU is a thread of a CPU core (typically, there are two threads per core). Does this mean that the number of YARN vCores should equal the number of EC2 vCPUs? That’s not always the case.

Amazon EMR

When you select an instance type for an instance group of an Amazon EMR cluster, the AWS Management Console shows vCores, i.e. the number of YARN vCores, not the number of EC2 vCPUs for that instance type.

m4.4xlarge gets 32 vCores, while r4.4xlarge and c4.4xlarge get only 16 vCores, even though all three EC2 instance types have the same 16 vCPUs and 8 CPU cores.
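To make the mismatch concrete, here is a small Python sketch that computes the ratio of YARN vCores to EC2 vCPUs from the figures quoted above. The dictionary and function names are my own illustration and are not part of any AWS API.

```python
# vCPU counts and default YARN vCore counts as quoted in this post
# (values read from the EMR console, not fetched from an API).
EMR_DEFAULTS = {
    "m4.4xlarge": {"vcpus": 16, "yarn_vcores": 32},
    "r4.4xlarge": {"vcpus": 16, "yarn_vcores": 16},
    "c4.4xlarge": {"vcpus": 16, "yarn_vcores": 16},
}


def vcores_per_vcpu(instance_type: str) -> float:
    """Return how many YARN vCores EMR advertises per EC2 vCPU."""
    spec = EMR_DEFAULTS[instance_type]
    return spec["yarn_vcores"] / spec["vcpus"]


if __name__ == "__main__":
    for name in EMR_DEFAULTS:
        print(f"{name}: {vcores_per_vcpu(name):.1f} YARN vCores per vCPU")
```

For these three instance types the ratio is 2.0 for m4.4xlarge and 1.0 for the other two.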

Also note that although the AWS Console shows the vCores available to YARN, it still shows the memory of the EC2 instance, not the memory available to YARN. By default, the memory available to YARN is:

Instance type    YARN memory
m4.4xlarge       56 GiB
r4.4xlarge       114 GiB
c4.4xlarge       22.5 GiB

As Amazon explains, different EC2 instance types are recommended for different workload characteristics; they are not just different CPU/memory configurations.

That’s why the more powerful C4 reports fewer vCores to YARN than the less powerful M4: a single YARN container on C4 may run more compute-intensive operations.

For M4, EMR wants to ensure that YARN runs enough containers to fully utilize the CPU. This is only relevant if the YARN scheduler considers CPU as a resource; for example, the YARN Capacity Scheduler with DefaultResourceCalculator uses only memory.

Note that the M5 instance type m5.4xlarge shows 16 vCores for YARN, not 32 like m4.4xlarge.
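The effect of the vCore count on container packing can be sketched as follows. This is a deliberately simplified model, not the scheduler's actual code: it ignores the ApplicationMaster container, allocation rounding and queue limits, and the 2 GiB / 1 vCore container request is a hypothetical value I picked for illustration. The `cpu_aware` flag stands in for a calculator that also accounts for CPU (in YARN, DominantResourceCalculator).

```python
import math


def max_containers(node_mem_gib, node_vcores, req_mem_gib, req_vcores, cpu_aware):
    """Estimate how many identical containers fit on one node.

    cpu_aware=False models a memory-only calculator (DefaultResourceCalculator);
    cpu_aware=True also caps the count by the node's vCores.
    """
    by_memory = math.floor(node_mem_gib / req_mem_gib)
    if not cpu_aware:
        return by_memory
    by_cpu = node_vcores // req_vcores
    return min(by_memory, by_cpu)


if __name__ == "__main__":
    # m4.4xlarge on EMR: 56 GiB of YARN memory (figure from this post).
    # Hypothetical request: 2 GiB and 1 vCore per container.
    print(max_containers(56, 32, 2, 1, cpu_aware=False))  # memory-only
    print(max_containers(56, 32, 2, 1, cpu_aware=True))   # 32 vCores advertised
    print(max_containers(56, 16, 2, 1, cpu_aware=True))   # if only 16 were advertised
```

Under these assumptions, advertising 32 vCores lets memory remain the limiting resource (28 containers), whereas advertising 16 vCores would cap the node at 16 containers when CPU is taken into account.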

Qubole

When you create a cluster and select instances in Qubole, it shows EC2 vCPUs and memory.

However, Qubole sets the number of YARN vCores to twice the number of EC2 vCPUs regardless of the EC2 instance type (you can see this in the YARN Resource Manager view):

Instance type    YARN vCores    Memory
m4.4xlarge       32             60 GiB
r4.4xlarge       32             112 GiB
c4.4xlarge       32             26 GiB
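The doubling rule is simple enough to check against the values above. The sketch below assumes the rule is exactly "two YARN vCores per EC2 vCPU", which is what I observed for these instance types; the function name is hypothetical and is not a Qubole API.

```python
def qubole_yarn_vcores(ec2_vcpus: int) -> int:
    """Apply the observed Qubole default: two YARN vCores per EC2 vCPU."""
    return 2 * ec2_vcpus


# YARN vCores seen in the Resource Manager view, as listed in this post.
OBSERVED_VCORES = {"m4.4xlarge": 32, "r4.4xlarge": 32, "c4.4xlarge": 32}
EC2_VCPUS = 16  # all three instance types have 16 vCPUs

if __name__ == "__main__":
    for name, observed in OBSERVED_VCORES.items():
        expected = qubole_yarn_vcores(EC2_VCPUS)
        print(f"{name}: expected {expected}, observed {observed}")
```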

Override Defaults

To override the default number of YARN vCores, consider the following settings in the yarn-site.xml configuration file:

yarn.nodemanager.resource.cpu-vcores
yarn.nodemanager.resource.detect-hardware-capabilities
yarn.nodemanager.resource.count-logical-processors-as-cores
yarn.nodemanager.resource.pcores-vcores-multiplier
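How these four settings interact can be sketched in Python. This is my simplified reading of the property descriptions in yarn-default.xml for recent Hadoop releases, not the NodeManager's actual code: it ignores yarn.nodemanager.resource.percentage-physical-cpu-limit and platform-specific details. The function and parameter names are hypothetical, and the default of 8 physical cores and 16 logical processors matches the 4xlarge instances discussed in this post.

```python
def effective_vcores(
    cpu_vcores=-1,                 # yarn.nodemanager.resource.cpu-vcores
    detect_hardware=False,         # ...detect-hardware-capabilities
    count_logical_as_cores=False,  # ...count-logical-processors-as-cores
    multiplier=1.0,                # ...pcores-vcores-multiplier
    physical_cores=8,              # assumed hardware: 8 cores
    logical_processors=16,         # assumed hardware: 16 hyperthreads
):
    """Approximate the vCores a NodeManager would advertise."""
    # An explicit cpu-vcores value always wins.
    if cpu_vcores != -1:
        return cpu_vcores
    # Without hardware detection, YARN falls back to a fixed value of 8.
    if not detect_hardware:
        return 8
    # With detection, count either hyperthreads or physical cores,
    # then scale by the multiplier.
    base = logical_processors if count_logical_as_cores else physical_cores
    return max(1, int(base * multiplier))


if __name__ == "__main__":
    print(effective_vcores(cpu_vcores=32))
    print(effective_vcores(detect_hardware=True, count_logical_as_cores=True))
    print(effective_vcores(detect_hardware=True, count_logical_as_cores=True,
                           multiplier=2.0))
```

In this model, setting cpu-vcores explicitly is the most direct override. The other three properties only take effect when cpu-vcores is left at -1 and hardware detection is enabled.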