Pandas - Index

Updated: 2021-11-19

Get index

df.index

Get columns

df.columns
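
A minimal sketch of the two attributes above, assuming a small made-up DataFrame (the column names and row labels are hypothetical, chosen only for illustration). Both attributes return pandas Index objects, which can be turned into plain lists with tolist():

```python
import pandas as pd

# Hypothetical toy frame, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# df.index holds the row labels, df.columns holds the column labels.
index_labels = df.index.tolist()
column_labels = df.columns.tolist()

print(index_labels)
print(column_labels)
```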

Read As Pandas DataFrame

http://pandas.pydata.org/pandas-docs/stable/io.html

df = pd.read_csv("train.csv")

Then convert the DataFrame to a NumPy array:

data = pd.read_csv("train.csv").values

Skip the first column and convert the remaining data to float

X = df.values[:, 1:].astype(float)

Extract the first column as Y

Y = df.values[:, 0]
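
The same split can also be sketched with positional indexing via iloc, which avoids converting the whole frame to one array first. This is only an illustration: the in-memory CSV text below is a hypothetical stand-in for train.csv, assuming the label sits in the first column and the features follow.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for train.csv:
# label in the first column, numeric features after it.
csv_text = "label,f1,f2\n0,1.5,2\n1,3.5,4\n"
df = pd.read_csv(io.StringIO(csv_text))

# All columns except the first, converted to a float array.
X = df.iloc[:, 1:].to_numpy(dtype=float)

# First column only, as a one-dimensional array.
Y = df.iloc[:, 0].to_numpy()

print(X.shape, Y.shape)
```

Selecting columns before converting keeps the label column's own dtype, whereas df.values on a frame with mixed column types may fall back to a common dtype for every column.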

Other methods:

pd.read_csv, pd.read_excel, pd.read_hdf, pd.read_sql, pd.read_json, pd.read_msgpack (experimental; removed in pandas 1.0), pd.read_html, pd.read_gbq (experimental), pd.read_stata, pd.read_sas, pd.read_clipboard, pd.read_pickle

Write

Write From Pandas DataFrame

Write to csv

df.to_csv("data.csv")
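
By default to_csv also writes the row index as an extra first column. A small sketch of a round trip without it, assuming a made-up frame and an in-memory buffer in place of data.csv so the example does not touch the disk:

```python
import io

import pandas as pd

# Hypothetical toy frame, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Write to an in-memory buffer; index=False leaves out the row labels.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read the same text back into a new DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)

print(csv_text)
```

Without index=False, reading the file back would produce an additional unnamed column holding the old index values.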

Other methods:

df.to_csv, df.to_excel, df.to_hdf, df.to_sql, df.to_json, df.to_msgpack (experimental; removed in pandas 1.0), df.to_html, df.to_gbq (experimental), df.to_stata, df.to_clipboard, df.to_pickle

Write as JSON

As with NumPy arrays, json.dumps cannot serialize a pandas Series or its underlying array directly:

>>> json.dumps(pd.Series([1,2,3]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: 0    1
1    2
2    3
dtype: int64 is not JSON serializable
>>> json.dumps(pd.Series([1,2,3]).values)
Traceback (most recent call last):
  ...
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: array([1, 2, 3]) is not JSON serializable

Converting to a list first solves the problem:

>>> json.dumps(pd.Series([1,2,3]).values.tolist())
'[1, 2, 3]'
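
pandas objects also have their own to_json method, which may be simpler than going through json.dumps. A minimal sketch, assuming made-up data and the "records" orientation (other orient values produce different layouts):

```python
import json

import pandas as pd

# Hypothetical toy data, used only for illustration.
s = pd.Series([1, 2, 3])
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Series.tolist() returns plain Python values that json.dumps accepts.
list_json = json.dumps(s.tolist())

# to_json serializes directly; "records" gives a JSON array.
series_json = s.to_json(orient="records")
records_json = df.to_json(orient="records")

print(list_json)
print(series_json)
print(records_json)
```

For a DataFrame, orient="records" yields one JSON object per row, which is often the easiest shape to consume from other tools.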