Spark - Configuration

Last Updated: 2021-11-19

This page lists ways to set and read Spark-related configs.

Spark Configs

Spark-related configs are held in a SparkConf object. There are three ways to set them:

Option 1: set when calling spark-submit

Use --conf KEY=VALUE

Option 2: set in code
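
A minimal sketch of setting configs in code, assuming Spark 2.x or later with spark-sql on the classpath and run as a standalone script. The app name, master, and values are illustrative, not taken from any real deployment. Properties set directly on the SparkConf take precedence over flags passed to spark-submit, which in turn take precedence over spark-defaults.conf.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Build a SparkConf and set properties on it before the session is created.
val conf = new SparkConf()
  .setAppName("config-demo")                  // illustrative name
  .setMaster("local[2]")                      // assumption: local run for the example
  .set("spark.executor.memory", "2g")         // illustrative value
  .set("spark.sql.shuffle.partitions", "10")  // illustrative value

// The session (and its SparkContext) picks up every entry in conf.
val spark = SparkSession.builder().config(conf).getOrCreate()
val sc = spark.sparkContext
```

Most properties have to be set before the SparkContext starts; changing the SparkConf afterwards has no effect on the running application.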

Option 3: set in file (defaults)

Ensure that SPARK_HOME and SPARK_CONF_DIR are correctly set.

  • $SPARK_CONF_DIR can be set to $SPARK_HOME/conf
  • or copy $SPARK_HOME/conf to a location outside the installation. Multiple Spark installations (versions) can then share the same conf folder, and nothing needs to change when upgrading to a new version.

Then the config files can be found in:

  • $SPARK_CONF_DIR/spark-defaults.conf
  • $SPARK_CONF_DIR/spark-env.sh

(They are not there by default. The distribution ships spark-defaults.conf.template and spark-env.sh.template instead; copy them and drop the .template suffix.)

In spark-env.sh, HADOOP_CONF_DIR should be defined if you want to run Spark in YARN mode:

HADOOP_CONF_DIR=/path/to/hadoop/conf

Hadoop/YARN/HDFS Configs

Ensure that HADOOP_HOME, HADOOP_CONF_DIR and/or YARN_CONF_DIR are correctly set.

Hadoop configs:

  • $HADOOP_CONF_DIR/core-site.xml
  • $HADOOP_CONF_DIR/hdfs-site.xml

To print the Hadoop configs from code:

val hadoopConf = sc.hadoopConfiguration.iterator()
while (hadoopConf.hasNext) {
    println(hadoopConf.next().toString())
}
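
Individual Hadoop settings can also be read and overridden from the Spark side. The sketch below assumes a local SparkContext and relies on Spark copying any property prefixed with spark.hadoop. into the Hadoop configuration; the app name and the dfs.replication value are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Properties prefixed with "spark.hadoop." are copied into the Hadoop
// configuration with the prefix stripped.
val conf = new SparkConf()
  .setAppName("hadoop-conf-demo")            // illustrative name
  .setMaster("local[1]")                     // assumption: local run for the example
  .set("spark.hadoop.dfs.replication", "2")  // illustrative override

val sc = SparkContext.getOrCreate(conf)

// Read a single key; get returns null when the key is unset,
// so the two-argument form with a default is often safer.
val replication = sc.hadoopConfiguration.get("dfs.replication")
val defaultFs = sc.hadoopConfiguration.get("fs.defaultFS", "file:///")
println(s"dfs.replication = $replication, fs.defaultFS = $defaultFs")
```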

Print Configs

To print the SparkConf:

println(sc.getConf.toDebugString)
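
To look up a single key rather than dump everything, SparkConf offers get, getOption, and a get overload that takes a default. A minimal sketch on a standalone SparkConf (in a running application the same calls work on sc.getConf); the key values are illustrative.

```scala
import org.apache.spark.SparkConf

// loadDefaults = false: ignore spark.* JVM system properties,
// so the example only contains what is set here.
val conf = new SparkConf(false)
  .set("spark.app.name", "config-demo") // illustrative value

val appName = conf.get("spark.app.name")                  // throws NoSuchElementException if unset
val executorMem = conf.getOption("spark.executor.memory") // None when the key is unset
val withDefault = conf.get("spark.executor.memory", "1g") // falls back to the given default

// One key=value pair per line
println(conf.toDebugString)
```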

Spark SQL Configs

Set SQL Configs: SET key=value;

sqlContext.sql("SET spark.sql.shuffle.partitions=10")

View SQL Configs:

val sqlConf = sqlContext.getAllConfs
// println puts each key/value pair on its own line
sqlConf.foreach { case (key, value) => println(s"$key : $value") }
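
On Spark 2.0 and later, the same settings are reachable through the SparkSession runtime config, without going through a SQL string. A minimal sketch, assuming a local session; the app name and the partition count are illustrative.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")          // assumption: local run for the example
  .appName("sql-conf-demo")    // illustrative name
  .getOrCreate()

// Set and read a SQL config at runtime.
spark.conf.set("spark.sql.shuffle.partitions", "10")
val partitions = spark.conf.get("spark.sql.shuffle.partitions")

// List every config visible to the session, sorted by key.
spark.conf.getAll.toSeq.sortBy(_._1).foreach { case (key, value) =>
  println(s"$key : $value")
}
```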

Extra Classpath

  • run spark-submit with --driver-class-path to augment the driver classpath
  • set spark.executor.extraClassPath to augment the executor classpath
  • or copy the jars to the $SPARK_HOME/jars folder

Logging Configs

Logging can be set in $SPARK_CONF_DIR/log4j.properties

Enable DEBUG logging level for org.apache.spark.SparkEnv:

log4j.logger.org.apache.spark.SparkEnv=DEBUG
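
The log level can also be changed at runtime from code. A minimal sketch, assuming a local SparkContext with an illustrative app name. Note that sc.setLogLevel overrides the level for the whole application rather than for a single logger such as org.apache.spark.SparkEnv, so it is a coarser tool than the properties file.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf()
    .setMaster("local[1]")          // assumption: local run for the example
    .setAppName("log-level-demo"))  // illustrative name

// Valid levels: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
sc.setLogLevel("DEBUG")
```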

Other Configs

In spark-defaults.conf:

spark.yarn.dist.files           $SPARK_HOME/conf/metrics.properties