Spark
    Overview
    Getting Started
    Configuration
    DataFrame
    Aggregation
    Utilities
    YARN
    IO

Hadoop - Scala API

Updated: 2022-02-06

Scala code can be executed as scripts, without compiling first. Just make sure hadoop and all other required jars are added to classpath

$ scala -cp `hadoop classpath` script.scala

where hadoop classpath can conveniently generate a string with all the jars.

A simple example script to print the path names:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

val conf = new Configuration()
val fs = FileSystem.get(conf)

fs.listStatus(new Path("path")).map(s => println(s.getPath().getName))