Pandas - Index
Last Updated: 2021-11-19
Get index
df.index
Get columns
df.columns
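A minimal sketch of both attributes on a small DataFrame. The column names "a" and "b" and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical two-column frame with the default integer row index.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Both attributes return Index objects; tolist() gives plain Python lists.
index_labels = df.index.tolist()
column_labels = df.columns.tolist()

print(df.index)
print(index_labels, column_labels)
```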
Read As Pandas DataFrame
http://pandas.pydata.org/pandas-docs/stable/io.html
df = pd.read_csv("train.csv")
Then convert the DataFrame to a NumPy array:
data = pd.read_csv("train.csv").values
Skip the first column and convert data to float
X = df.values[:, 1:].astype(float)
Extract first column as Y
Y = df.values[:, 0]
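The read-and-split steps above can be sketched end to end. This is an illustration under assumptions: the CSV text below is a made-up stand-in for train.csv with the label in the first column, and it uses iloc with to_numpy() as an alternative to slicing df.values.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv: label first, then two features.
csv_text = "label,f1,f2\n1,0.5,2\n0,1.5,3\n"
df = pd.read_csv(io.StringIO(csv_text))

# First column as the target vector.
Y = df.iloc[:, 0].to_numpy()

# Remaining columns as a float feature matrix.
X = df.iloc[:, 1:].to_numpy(dtype=float)

print(X.shape, Y.shape)
```

Selecting columns before converting avoids building one mixed-type array for the whole frame, which can matter when the label column is not numeric.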
Other read functions:
pd.read_csv
pd.read_excel
pd.read_hdf
pd.read_sql
pd.read_json
pd.read_msgpack (experimental; removed in pandas 1.0)
pd.read_html
pd.read_gbq (experimental)
pd.read_stata
pd.read_sas
pd.read_clipboard
pd.read_pickle
Write
Write From Pandas DataFrame
Write to csv
df.to_csv("data.csv")
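By default to_csv also writes the row index as an unnamed first column. A small sketch of writing without it and reading the result back; it writes to an in-memory buffer instead of data.csv so it has no side effects, and the frame contents are made up.

```python
import io

import pandas as pd

# Hypothetical frame to round-trip through CSV.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# index=False drops the row index from the output.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Reading the text back yields an equal DataFrame.
restored = pd.read_csv(io.StringIO(csv_text))
print(csv_text)
```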
Other write methods:
df.to_csv
df.to_excel
df.to_hdf
df.to_sql
df.to_json
df.to_msgpack (experimental; removed in pandas 1.0)
df.to_html
df.to_gbq (experimental)
df.to_stata
df.to_clipboard
df.to_pickle
Write as JSON
json.dumps cannot serialize a pandas Series directly, which is the same problem as dumping NumPy arrays to JSON:
>>> import json
>>> import pandas as pd
>>> json.dumps(pd.Series([1,2,3]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: 0    1
1    2
2    3
dtype: int64 is not JSON serializable
>>> json.dumps(pd.Series([1,2,3]).values)
Traceback (most recent call last):
...
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: array([1, 2, 3]) is not JSON serializable
Converting to a list first solves the problem:
>>> json.dumps(pd.Series([1,2,3]).values.tolist())
'[1, 2, 3]'
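As a sketch of two ways around the error: Series.tolist() returns plain Python values that json.dumps accepts, and pandas also has its own Series.to_json(), which by default keys each value by its index label. Variable names here are illustrative.

```python
import json

import pandas as pd

s = pd.Series([1, 2, 3])

# tolist() converts to native Python ints, which the json module can handle.
as_list_json = json.dumps(s.tolist())

# to_json() serializes without the json module; the default layout for a
# Series is an object mapping index labels (as strings) to values.
as_index_json = s.to_json()

print(as_list_json)
print(as_index_json)
```

Which one fits depends on whether the index labels need to survive the round trip.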