Spark - IO

Updated: 2018-12-11

Read

In Spark 2.0, SparkSession should be used instead of SQLContext.

Read DataFrame with schema

val df = spark.read.schema(schema).option("sep","\u0007").option("inferSchema",false").csv("/path/to/data")

Infer schema:

val df = spark.read.option("sep","\u0007").option("inferSchema", "true").csv("/path/to/data")

Read From HDFS

/** Reads the entire file at `path` from HDFS into a single String.
 *
 *  Fix: the original never closed the input stream, leaking an HDFS
 *  connection per call; the stream is now closed in a `finally` block
 *  even if reading fails.
 *
 *  @param path HDFS path of the file to read
 *  @param sc   SparkContext supplying the Hadoop configuration
 *  @return the file contents as one String
 */
def read(path: String)(implicit sc: SparkContext): String = {
  val conf = sc.hadoopConfiguration
  val fs = FileSystem.get(conf)
  val in = fs.open(new Path(path))
  try scala.io.Source.fromInputStream(in).mkString
  finally in.close()
}

/** Reads the file at `path` and splits its trimmed contents into header
 *  column names using `delimiter`. The split limit of -1 preserves
 *  trailing empty columns; each resulting name is trimmed.
 *
 *  @param path      HDFS path of the header file
 *  @param delimiter column separator (defaults to ",")
 *  @param sc        SparkContext forwarded to `read`
 *  @return the header column names, in file order
 */
def readHeader(path: String, delimiter: String = ",")(implicit sc: SparkContext): Array[String] = {
  val headerLine = read(path).trim
  val rawColumns = headerLine.split(delimiter, -1)
  rawColumns.map(_.trim)
}

Write

Write to local

// Write the collected rows to a local file, one element per line.
// NOTE(review): `df` here must be a local collection (e.g. the result of
// collect()); DataFrame itself has no mkString — confirm at call site.
// Fix: Files.write(Path, byte[], OpenOption...) has no Charset parameter —
// the original passed StandardCharsets.UTF_8 where an OpenOption was
// expected, which does not compile; the charset belongs in getBytes.
// TRUNCATE_EXISTING added so re-running over an existing, longer file does
// not leave stale trailing bytes (CREATE alone does not truncate).
Files.write(
  Paths.get(path),
  df.mkString("\n").getBytes(StandardCharsets.UTF_8),
  StandardOpenOption.CREATE,
  StandardOpenOption.TRUNCATE_EXISTING)

Write to HDFS

Save in one file (use repartition)

// Coalesce to a single partition so the output directory contains one CSV
// part-file (plus the _SUCCESS marker). Avoid on large data: every row is
// shuffled to a single task.
// Since Spark 2.0 (stated in this doc), CSV support is built in: the
// short name "csv" replaces the external com.databricks.spark.csv package.
df.repartition(1).write
  .format("csv")
  .option("header", "true")
  .save(path)

Append

df.write.mode(SaveMode.Append).save(path)

Overwrite

df.write.mode(SaveMode.Overwrite).save("output/")

with partition

df.write.partitionBy("zipcode").format("json").save(path)}