Pandas - Index

Get the index

df.index

Get the columns

df.columns
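A minimal sketch of both attributes on a small, made-up DataFrame (the column names and index labels are illustrative only):

```python
import pandas as pd

# Hypothetical example data, not from train.csv
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# df.index holds the row labels, df.columns the column labels
index_labels = df.index.tolist()
column_labels = df.columns.tolist()

print(index_labels)
print(column_labels)
```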

Read As Pandas DataFrame

http://pandas.pydata.org/pandas-docs/stable/io.html

df = pd.read_csv("train.csv")

Then convert the DataFrame to a NumPy array:

data = pd.read_csv("train.csv").values

Skip the first column and convert data to float

X = df.values[:, 1:].astype(float)

Extract first column as Y

Y = df.values[:, 0]
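The steps above can be sketched end to end without a file on disk. This assumes the first column is the label and the remaining columns are numeric; the CSV text and column names are made up for illustration, and iloc with to_numpy is used here as one alternative to slicing df.values:

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv: label first, then numeric features
csv_text = "label,f1,f2\n0,1.5,2\n1,3,4.5\n"
df = pd.read_csv(io.StringIO(csv_text))

# First column as Y, everything after it as a float array X
Y = df.iloc[:, 0].to_numpy()
X = df.iloc[:, 1:].to_numpy(dtype=float)

print(X.shape, X.dtype)
print(Y)
```

Selecting columns with iloc before converting keeps each column's own dtype, whereas df.values on mixed columns first upcasts everything to a common type.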

Other methods:

pd.read_csv, pd.read_excel, pd.read_hdf, pd.read_sql, pd.read_json, pd.read_msgpack (experimental), pd.read_html, pd.read_gbq (experimental), pd.read_stata, pd.read_sas, pd.read_clipboard, pd.read_pickle

Write

Write From Pandas DataFrame

Write to CSV

df.to_csv("data.csv")
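By default to_csv also writes the row index as the first column. A small sketch, assuming you want to leave the index out and read the data back unchanged (the DataFrame contents are made up, and an in-memory buffer stands in for data.csv):

```python
import io

import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# index=False omits the row labels from the output
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Reading the text back yields the same columns and values
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```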

Other methods:

df.to_csv, df.to_excel, df.to_hdf, df.to_sql, df.to_json, df.to_msgpack (experimental), df.to_html, df.to_gbq (experimental), df.to_stata, df.to_clipboard, df.to_pickle

Write as JSON

As with NumPy arrays, json.dumps cannot serialize a pandas Series (or its underlying array) directly:

>>> json.dumps(pd.Series([1,2,3]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: 0    1
1    2
2    3
dtype: int64 is not JSON serializable
>>> json.dumps(pd.Series([1,2,3]).values)
Traceback (most recent call last):
  ...
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: array([1, 2, 3]) is not JSON serializable

Converting to a list first solves the problem:

>>> json.dumps(pd.Series([1,2,3]).values.tolist())
'[1, 2, 3]'
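The same workaround as a script, next to pandas' own to_json serializer as a possible alternative. This is a sketch; the choice of orient="records" is an assumption made to get a plain JSON array from a Series:

```python
import json

import pandas as pd

s = pd.Series([1, 2, 3])

# tolist() converts NumPy integers to plain Python ints, which json can handle
via_list = json.dumps(s.tolist())

# Alternative: let pandas serialize the Series itself
via_pandas = s.to_json(orient="records")

print(via_list)
print(json.loads(via_pandas))
```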