Let’s review how EC2 vCPUs correspond to YARN vCores in Amazon EMR and Qubole Hadoop clusters. As an example, I will use the m4.4xlarge, r4.4xlarge and c4.4xlarge EC2 instance types.
An EC2 vCPU is a thread of a CPU core (typically, there are two threads per core). Does this mean that the number of YARN vCores should equal the number of EC2 vCPUs? That’s not always the case.
When you select an instance type for an instance group of an Amazon EMR cluster, the AWS Management Console shows vCores, i.e. the number of YARN vCores, not the number of EC2 vCPUs for that instance type:
You can see that m4.4xlarge gets 32 vCores while c4.4xlarge gets only 16 vCores, even though all three EC2 instance types have the same 16 vCPUs and 8 CPU cores.
Also note that although the AWS Console shows the vCores available to YARN, it still shows the total memory of the EC2 instance, not the memory available to YARN. By default, the memory available to YARN is:
| m4.4xlarge | r4.4xlarge | c4.4xlarge |
|---|---|---|
| 56 GiB YARN memory | 114 GiB YARN memory | 22.5 GiB YARN memory |
As Amazon explains, different EC2 instance types are recommended for different workload characteristics; they are not just different CPU/memory configurations.
That’s why the more powerful C4 reports fewer vCores to YARN than the less powerful M4: a single C4 YARN container may run more compute-intensive operations.
For M4, EMR wants to ensure that YARN runs enough containers to fully utilize the CPU. This matters only if the YARN scheduler treats CPU as a resource; for example, the YARN Capacity Scheduler with DefaultResourceCalculator considers only memory.
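To make the scheduler point concrete, here is a minimal sketch of how the vCore count can cap the number of containers per node. It is a simplified model, not YARN's actual allocation logic: the function `containers_per_node` and the 1 GiB / 1 vCore container request are assumptions for illustration, while the node sizes are the EMR figures quoted above.

```python
# Simplified model: how many identical containers fit on one node.
# The container request below is hypothetical; node sizes come from this post.

def containers_per_node(node_mem_mb, node_vcores, req_mem_mb, req_vcores, count_cpu):
    """Return how many identical containers fit on a single node.

    count_cpu=False mimics a memory-only calculator (DefaultResourceCalculator).
    count_cpu=True mimics a calculator that also accounts for vCores.
    """
    by_memory = node_mem_mb // req_mem_mb
    if not count_cpu:
        return by_memory
    by_cpu = node_vcores // req_vcores
    return min(by_memory, by_cpu)


# c4.4xlarge on EMR: 22.5 GiB of YARN memory and 16 vCores.
C4_MEM_MB = int(22.5 * 1024)
C4_VCORES = 16

# Hypothetical request: 1 GiB of memory and 1 vCore per container.
memory_only = containers_per_node(C4_MEM_MB, C4_VCORES, 1024, 1, count_cpu=False)
with_cpu = containers_per_node(C4_MEM_MB, C4_VCORES, 1024, 1, count_cpu=True)

print(memory_only, with_cpu)  # prints: 22 16
```

With a memory-only calculator the vCore count has no effect on placement; once CPU is counted, the 16 vCores become the limit for this small request.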
Note that the M5 instance type m5.4xlarge shows 16 vCores for YARN, not 32 as M4 does.
When you create a cluster and select instances in Qubole, it shows EC2 vCPUs and memory:
But Qubole sets the number of YARN vCores to twice the number of EC2 vCPUs regardless of the EC2 instance type (you can see this in the YARN Resource Manager view):
| m4.4xlarge | r4.4xlarge | c4.4xlarge |
|---|---|---|
| 32 YARN vCores, 60 GiB memory | 32 YARN vCores, 112 GiB memory | 32 YARN vCores, 26 GiB memory |
You can consider the following settings to override the default number of vCores for YARN in the yarn-site.xml configuration file:
yarn.nodemanager.resource.cpu-vcores
yarn.nodemanager.resource.detect-hardware-capabilities
yarn.nodemanager.resource.count-logical-processors-as-cores
yarn.nodemanager.resource.pcores-vcores-multiplier
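As a rough sketch of how these four settings interact, the Python below approximates the vCore count a NodeManager would advertise and prints a matching yarn-site.xml fragment. It follows the property descriptions in Hadoop's yarn-default.xml as I understand them, ignores other settings that can affect the result, and is not the NodeManager's actual code. The helper names `effective_vcores` and `yarn_site_fragment` and the example values are assumptions for illustration.

```python
def effective_vcores(cpu_vcores, detect_hardware, logical_processors,
                     physical_cores, count_logical_as_cores, multiplier):
    """Approximate the vCores a NodeManager advertises (simplified)."""
    if cpu_vcores != -1:
        return cpu_vcores  # an explicit value is used as-is
    if not detect_hardware:
        return 8  # documented default when hardware detection is off
    cores = logical_processors if count_logical_as_cores else physical_cores
    return int(cores * multiplier)


def yarn_site_fragment(props):
    """Render a dict of property names and values as yarn-site.xml text."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines += [
            "  <property>",
            f"    <name>{name}</name>",
            f"    <value>{value}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines)


# Example values (hypothetical): auto-detect, count hyperthreads, double them.
props = {
    "yarn.nodemanager.resource.cpu-vcores": "-1",
    "yarn.nodemanager.resource.detect-hardware-capabilities": "true",
    "yarn.nodemanager.resource.count-logical-processors-as-cores": "true",
    "yarn.nodemanager.resource.pcores-vcores-multiplier": "2.0",
}

# A 4xlarge instance from this post: 16 vCPUs (threads) on 8 physical cores.
vcores = effective_vcores(-1, True, 16, 8, True, 2.0)
print(vcores)  # prints: 32
print(yarn_site_fragment(props))
```

Setting yarn.nodemanager.resource.cpu-vcores to an explicit number is the simplest override; the other three properties only come into play when it is left at -1.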