Spark - YARN

Updated: 2018-12-11

yarn-cluster vs yarn-client

  • yarn-cluster: the Spark driver runs inside the application master process, which YARN manages on the cluster, so the client can exit after submitting the application.

  • yarn-client: the driver runs in the client process, and the application master is only used for requesting resources from YARN.
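
The two modes map onto the --deploy-mode flag of spark-submit. A minimal sketch, assuming Spark 2.x or later, where "--master yarn --deploy-mode cluster|client" replaces the older yarn-cluster / yarn-client master values. The class name and jar path are placeholders, and the helper only prints the command so it can be checked before running:

```shell
# Placeholders, not taken from the notes above: adjust to your application.
APP_CLASS="org.apache.spark.examples.SparkPi"
APP_JAR="/path/to/spark-examples.jar"

# Print (rather than run) the spark-submit command line.
# $1 is the deploy mode: "cluster" or "client".
submit_cmd() {
  printf 'spark-submit --master yarn --deploy-mode %s --class %s %s\n' \
    "$1" "$APP_CLASS" "$APP_JAR"
}

submit_cmd cluster   # driver runs inside the YARN application master
submit_cmd client    # driver runs in the local client process
```

To actually submit, run the printed line in a shell where spark-submit is on the PATH.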

--master

  • Spark standalone and Mesos modes: --master <master’s address>
  • YARN: --master yarn. The ResourceManager address is picked up from the Hadoop configuration (the directory pointed to by HADOOP_CONF_DIR or YARN_CONF_DIR).
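
As an illustration of the difference, the sketch below builds the --master value for each cluster manager. The helper name and host names are hypothetical, and the ports are assumed to be the usual defaults (7077 for a standalone master, 5050 for a Mesos master):

```shell
# Hypothetical helper: print the --master value for a cluster manager.
# $1 = standalone | mesos | yarn, $2 = master host (ignored for yarn).
master_url() {
  case "$1" in
    standalone) echo "spark://$2:7077" ;;   # assumed default standalone port
    mesos)      echo "mesos://$2:5050" ;;   # assumed default Mesos port
    yarn)       echo "yarn" ;;              # address comes from Hadoop configs
    *)          echo "unknown cluster manager: $1" >&2; return 1 ;;
  esac
}

master_url standalone master.example.com
master_url mesos master.example.com
master_url yarn
```

Only the YARN case carries no address, which is why HADOOP_CONF_DIR (or YARN_CONF_DIR) has to be set in the environment of the submitting process.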

YARN Commands

Show the logs of an application

$ yarn logs -applicationId <applicationId>
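
A small wrapper can catch a mistyped ID before calling YARN. This is a sketch that assumes application IDs have the form application_<cluster timestamp>_<sequence number>; the function names and the ID in the usage comment are made up for illustration:

```shell
# Return success if $1 looks like a YARN application ID.
is_app_id() {
  printf '%s\n' "$1" | grep -Eq '^application_[0-9]+_[0-9]+$'
}

# Fetch logs only when the argument has the expected shape.
fetch_logs() {
  if is_app_id "$1"; then
    yarn logs -applicationId "$1"
  else
    echo "not a YARN application id: $1" >&2
    return 2
  fi
}

# Usage (ID is illustrative):
#   fetch_logs application_1544500000000_0001 > app.log
```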

List running nodes (only nodes whose Node-State is RUNNING)

$ yarn node -list

List all nodes (not only RUNNING nodes, but also LOST, DECOMMISSIONED, etc.)

$ yarn node -list -all
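
To summarize the full listing, the output can be tallied per state. A rough sketch, assuming each node row starts with a Node-Id of the form host:port followed by the Node-State column; the function name is hypothetical:

```shell
# Count nodes per Node-State from `yarn node -list -all` output on stdin.
count_by_state() {
  # Keep rows whose first column looks like host:port (skips the
  # "Total Nodes" and header lines), then tally the second column.
  awk '$1 ~ /:[0-9]+$/ { n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage:
#   yarn node -list -all | count_by_state
```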

Check queue status

$ yarn queue -status default
Queue Name : default
State : RUNNING
Capacity : 4.9%
Current Capacity : 91.2%
Maximum Capacity : 50.0%
Default Node Label expression :
Accessible Node Labels : *
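
For scripting, a single field can be pulled out of this output. A minimal sketch, assuming the "Name : value" layout shown above (possibly with leading whitespace); the function name is hypothetical:

```shell
# Print the value of one field from `yarn queue -status` output on stdin.
# $1 is the field name exactly as printed, e.g. "Current Capacity".
queue_field() {
  awk -F' : ' -v key="$1" '
    { gsub(/^[ \t]+|[ \t]+$/, "", $1) }   # trim indentation around the name
    $1 == key { print $2 }
  '
}

# Usage:
#   yarn queue -status default | queue_field "Current Capacity"
```

The match is on the whole field name, so "Capacity" and "Current Capacity" are not confused with each other.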