CSV Cheatsheet

Read

Read from .csv file:

import csv
# Python 3: open in text mode with newline='' (Python 2 used 'rb')
reader = csv.reader(open('input.csv', newline=''), delimiter=',')

or

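import csv
# Python 3: text mode with newline=''; iterate over reader inside this block, while the file is open
with open('input.csv', newline='') as f:
    reader = csv.reader(f, delimiter=',')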

To skip the first line (header):

next(reader, None)

Print all rows:

for row in reader:
    print(row)

Read each row as a list of key-value pairs:

with open('foo.csv') as f:
    reader = csv.reader(f, delimiter='|')
    header = next(reader)
    for row in reader:
        print(list(zip(header,row)))

or as a dict:

with open('foo.csv') as f:
    reader = csv.reader(f, delimiter='|')
    header = next(reader)
    for row in reader:
        print(dict(zip(header,row)))
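
The standard library's csv.DictReader does the same header-to-row pairing in one step. A minimal sketch, assuming a pipe-delimited file with a header row; the in-memory sample below is made up and stands in for foo.csv:

```python
import csv
import io

# Hypothetical sample data standing in for foo.csv (pipe-delimited, header first)
sample = io.StringIO("id|name\n1|alice\n2|bob\n")

# DictReader uses the first row as field names and yields one dict per data row
rows = list(csv.DictReader(sample, delimiter='|'))
for row in rows:
    print(row['id'], row['name'])
```

With a real file, pass the open file object instead of the StringIO buffer.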

Write

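import csv
# Python 3: text mode with newline='' so the csv module controls line endings (Python 2 used 'wb')
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerow(['a', 'b', 'c'])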
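
To write several rows at once, the writer also has writerows. The sketch below writes to an in-memory buffer (an assumption made so the result can be inspected without touching disk) and shows that fields containing the delimiter are quoted automatically; the sample values are made up:

```python
import csv
import io

# In-memory buffer standing in for output.csv; with a real file use
# open('output.csv', 'w', newline='')
buf = io.StringIO(newline='')
writer = csv.writer(buf, delimiter=',')
writer.writerow(['name', 'note'])                        # header row
writer.writerows([['a', 'plain'], ['b', 'has, comma']])  # several rows at once

output = buf.getvalue()
print(output)
```

By default the writer ends each row with '\r\n' and quotes only the fields that need it.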

Troubleshooting

Error message:

_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

Solution:

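Open the file in universal-newline mode: use 'rU' instead of 'rb' (Python 2). On Python 3, open it in text mode with newline='' instead.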
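
On Python 3, passing newline='' to open() plays the role of universal-newline mode for the csv module. A minimal sketch, assuming the problem file ends its lines with bare carriage returns; the file name cr_only.csv and its contents are made up for illustration:

```python
import csv
import os
import tempfile

# Hypothetical sample: a file whose lines end with a bare '\r'
path = os.path.join(tempfile.mkdtemp(), 'cr_only.csv')
with open(path, 'wb') as f:
    f.write(b'a,b\r1,2\r')

# newline='' leaves line endings untranslated but still splits on '\r', '\n' and '\r\n'
with open(path, newline='') as f:
    rows = list(csv.reader(f))
print(rows)
```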

Read CSV From HDFS

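import csv
import subprocess

# `hadoop fs -text` prints the matching files to stdout; check=True raises if the command fails
result = subprocess.run(['hadoop', 'fs', '-text', '/path/to/data/part*'], stdout=subprocess.PIPE, check=True)
lines = result.stdout.decode().splitlines()
reader = csv.reader(lines)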
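
For large outputs it may be preferable to parse rows as they arrive instead of holding the whole result in memory. The following is only a sketch: read_csv_from_command is a hypothetical helper name, UTF-8 output is assumed, and the demo command is a stand-in child Python process so the example runs without a Hadoop install.

```python
import csv
import io
import subprocess
import sys

def read_csv_from_command(cmd):
    """Yield parsed CSV rows from the stdout of cmd, one row at a time."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        # newline='' lets the csv module handle line endings itself
        text = io.TextIOWrapper(proc.stdout, encoding='utf-8', newline='')
        yield from csv.reader(text)
    # Leaving the with-block waits for the process, so returncode is set here
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)

# Stand-in demo: a child Python process prints two CSV lines
demo_cmd = [sys.executable, '-c', 'print("a,b"); print("1,2")']
demo_rows = list(read_csv_from_command(demo_cmd))
print(demo_rows)

# With Hadoop available, the call would look like:
# for row in read_csv_from_command(['hadoop', 'fs', '-text', '/path/to/data/part*']):
#     print(row)
```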