Spark - Configuration
Here are the ways to set and get Spark configs.
Spark Configs
Spark-related configs are set on a SparkConf object. There are three options:
Option 1: set when calling spark-submit
Use --conf KEY=VALUE
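For example (the application class, jar, and config values below are only placeholders):
./spark-submit --master yarn --conf spark.executor.memory=2g --conf spark.shuffle.compress=true --class com.example.MyApp my-app.jar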
Option 2: set in code
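A minimal sketch of building a SparkConf programmatically (the app name, master, and values are illustrative):
import org.apache.spark.{SparkConf, SparkContext}
// app name and master are placeholders; values set in code take precedence over spark-defaults.conf
val conf = new SparkConf()
  .setAppName("my-app")
  .setMaster("local[*]")
  .set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)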
Option 3: set in the defaults file
Ensure that SPARK_HOME and SPARK_CONF_DIR are correctly set.
- $SPARK_CONF_DIR can be set to $SPARK_HOME/conf
- or make a copy of $SPARK_HOME/conf to somewhere else; the benefit is that multiple Spark installations (versions) can use the same conf folder, and no changes are needed when upgrading to a new version
Then the config files can be found in:
$SPARK_CONF_DIR/spark-defaults.conf
$SPARK_CONF_DIR/spark-env.sh
(they are not there by default; instead they are called spark-defaults.conf.template and spark-env.sh.template, so just copy and rename them)
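A typical spark-defaults.conf holds whitespace-separated key/value pairs; the entries below are just examples:
spark.master yarn
spark.executor.memory 2g
spark.serializer org.apache.spark.serializer.KryoSerializer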
In spark-env.sh, HADOOP_CONF_DIR should be defined if you want to run Spark in YARN mode:
HADOOP_CONF_DIR=/path/to/hadoop/conf
Hadoop/YARN/HDFS Configs
Ensure that HADOOP_HOME, HADOOP_CONF_DIR and/or YARN_CONF_DIR are correctly set.
Hadoop configs:
$HADOOP_CONF_DIR/core-site.xml
$HADOOP_CONF_DIR/hdfs-site.xml
To print the Hadoop configuration from code:
// iterate over all Hadoop config entries (key=value)
val hadoopConf = sc.hadoopConfiguration.iterator()
while (hadoopConf.hasNext) {
  println(hadoopConf.next().toString())
}
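To look up a single Hadoop key instead of dumping everything (fs.defaultFS is just one example):
sc.hadoopConfiguration.get("fs.defaultFS")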
Print Configs
To print the SparkConf:
sc.getConf.toDebugString
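To read a single Spark config value, something like this should work (the key is just an example; getOption returns None if the key is unset):
sc.getConf.getOption("spark.executor.memory")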
Spark SQL Configs
Set SQL Configs: SET key=value;
sqlContext.sql("SET spark.sql.shuffle.partitions=10;")
View SQL Configs:
val sqlConf = sqlContext.getAllConfs
sqlConf.foreach(x => println(x._1 + " : " + x._2))
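To read a single SQL config value (the key shown is just an example):
sqlContext.getConf("spark.sql.shuffle.partitions")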
Extra Classpath
- ./spark-submit with --driver-class-path to augment the driver classpath
- spark.executor.extraClassPath to augment the executor classpath
- or copy the jars to the $SPARK_HOME/jars folder
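For example, combining the first two on the command line (the jar paths are placeholders):
./spark-submit --driver-class-path /path/to/extra.jar --conf spark.executor.extraClassPath=/path/to/extra.jar ...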
Logging Configs
Logging can be configured in $SPARK_CONF_DIR/log4j.properties
Enable DEBUG logging level for org.apache.spark.SparkEnv:
log4j.logger.org.apache.spark.SparkEnv=DEBUG
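The log level can also be changed at runtime from code, e.g.:
sc.setLogLevel("DEBUG")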
Other Configs
In spark-defaults.conf, e.g. to distribute metrics.properties to the YARN containers:
spark.yarn.dist.files $SPARK_HOME/conf/metrics.properties