Python Tips & Memos by Physical Oceanography & Climate Laboratory in Hokkaido University

read_csv

Read a CSV file into a pandas DataFrame with a call such as df = pd.read_csv('./NAO_djfm.csv', comment='#', sep=r'\s+').
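A minimal sketch of the call above. The real NAO_djfm.csv is not reproduced here, so the file is replaced by an inline string of hypothetical dummy values (not real NAO data) read through io.StringIO. Lines starting with # are skipped, and any run of spaces counts as one separator.

```python
import io

import pandas as pd

# Hypothetical stand-in for NAO_djfm.csv: '#' lines are comments and the
# columns are separated by whitespace. The numbers are dummy values.
text = """# year  nao
# dummy values for illustration only
year  nao
2001  -0.5
2002   0.25
2003   1.5
"""

# comment='#' drops the comment lines; sep=r'\s+' splits on runs of whitespace.
df = pd.read_csv(io.StringIO(text), comment="#", sep=r"\s+")
print(df)
print(df.dtypes)
```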

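This example reads a file in which lines beginning with # are comments and fields are separated by whitespace.
header=0 means that the header line is line 0 counting from the top (line 1 in ordinary counting); fully commented lines and blank lines are not counted. If the file has no header line, specify header=None.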
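To show why header=None matters, here is a small sketch with hypothetical header-less data. Leaving the default header handling in place turns the first data line into column labels; header=None keeps it as data, and names= (with labels chosen here purely for illustration) supplies the column labels instead.

```python
import io

import pandas as pd

# Hypothetical data with no header line: every line is data.
text = """2001 -0.5
2002 0.25
2003 1.5
"""

# Wrong for this file: header=0 consumes "2001 -0.5" as the column labels.
df_wrong = pd.read_csv(io.StringIO(text), sep=r"\s+", header=0)
print(df_wrong.columns.tolist())  # labels taken from the first data line

# header=None keeps all three lines as data, with integer labels 0 and 1.
df_noheader = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)
print(df_noheader.columns.tolist())

# names= supplies labels yourself (the names are hypothetical).
df_named = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None,
                       names=["year", "nao"])
print(df_named)
```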

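na_values='NaN' means that missing values are written as 'NaN' in the file. Note that a value containing a space, such as ' NaN', does not seem to be recognized as missing (I have not checked this thoroughly, so this may be wrong).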
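A short sketch of missing-value handling on hypothetical data. 'NaN' is already one of the strings pandas treats as missing by default, so na_values='NaN' is harmless but not strictly required; na_values is mainly useful for extra markers. The -999 fill value below is a hypothetical example, not something from the original files.

```python
import io

import pandas as pd

# Hypothetical whitespace-separated data with a missing value written as NaN.
text = """year nino34
2001 26.5
2002 NaN
2003 27.25
"""

# No na_values given: 'NaN' is still recognized through pandas' default list.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")
print(df["nino34"].isna().tolist())

# A hypothetical fill value -999 must be declared explicitly via na_values.
text2 = text.replace("NaN", "-999")
df2 = pd.read_csv(io.StringIO(text2), sep=r"\s+", na_values=["-999"])
print(df2["nino34"].isna().tolist())
```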

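Specifying delim_whitespace=True for whitespace-delimited files is the old method; it has been deprecated since pandas 2.2 in favor of sep=r'\s+'.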
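A quick check, on a made-up table, that sep=r'\s+' treats single spaces, runs of spaces and tabs alike as one separator, which is the behavior delim_whitespace=True used to provide.

```python
import io

import pandas as pd

# Columns separated by a mix of single spaces, runs of spaces and tabs.
text = "a  b\tc\n1 2    3\n4\t5 6\n"

# sep=r'\s+' replaces the deprecated delim_whitespace=True.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")
print(df)
```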

Example

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('/data/INDICES/Nino_SST_SOI_CPC/nino34_san.csv',
                 comment='%', header=0, na_values='NaN')
print(df.columns)  # column labels read from the header line

data_np = df.values  # convert the DataFrame to a 2-D NumPy array
# data_np = np.array(df) gives the same result as the line above

# plot columns 1 and 4 against column 0
plt.plot(data_np[:, 0], data_np[:, 1], data_np[:, 0], data_np[:, 4])
plt.show()
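The comment in the example says np.array(df) and df.values give the same array. A small sketch on a hypothetical DataFrame confirms this and also shows df.to_numpy(), which the pandas documentation recommends over .values. Columns of different dtypes (here int and float) are upcast to one common dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame standing in for the file read above.
df = pd.DataFrame({"year": [2001, 2002, 2003],
                   "nino34": [26.5, np.nan, 27.25]})

a = df.values        # attribute used in the example
b = np.array(df)     # equivalent, as the example's comment says
c = df.to_numpy()    # method recommended by current pandas docs

# int and float columns are upcast to a common float64 array.
print(a.dtype, a.shape)
# NaN entries are treated as equal for this comparison.
print(np.array_equal(a, b, equal_nan=True), np.array_equal(a, c, equal_nan=True))
```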

read_csv can read text files whose fields are separated by commas (',') or by whitespace.
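As a sketch, the same hypothetical table written once with commas and once with spaces gives identical DataFrames; only the sep argument differs (',' is the default).

```python
import io

import pandas as pd

# The same hypothetical table written two ways.
comma_text = "year,nino34\n2001,26.5\n2002,27.25\n"
space_text = "year nino34\n2001 26.5\n2002 27.25\n"

df_comma = pd.read_csv(io.StringIO(comma_text))              # sep=',' is the default
df_space = pd.read_csv(io.StringIO(space_text), sep=r"\s+")  # any run of whitespace

print(df_comma.equals(df_space))
```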

Example

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('/data/INDICES/Nino_SST_SOI_CPC/nino34_san.csv',
                 comment='%', header=0, na_values='NaN')

data_np = df.values  # convert the DataFrame to a 2-D NumPy array
# data_np = np.array(df) also gives the same result as the line above

print(df.columns)  # print column labels read from the header

# plot columns 1 and 4 against column 0
plt.plot(data_np[:, 0], data_np[:, 1], data_np[:, 0], data_np[:, 4])
plt.show()

comment='%' means that a line beginning with % is a comment line and is skipped.
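A sketch on hypothetical data of what comment='%' does. According to the pandas documentation, a line that starts with the comment character is ignored entirely, and a comment character appearing later in a line drops the rest of that line.

```python
import io

import pandas as pd

# Hypothetical data: a full comment line and a trailing comment.
text = """% file description
year,nino34
2001,26.5 % trailing note
2002,27.25
"""

# The first line is ignored; on the third line, everything from '%' is dropped.
df = pd.read_csv(io.StringIO(text), comment="%")
print(df)
```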

header=0 means that the header line is line 0 counting from the top, that is, the first line. Fully commented lines and blank lines are not counted.
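A minimal sketch, with hypothetical content, of how header=0 interacts with comment lines: because fully commented lines are not counted, header=0 picks the first non-comment line rather than the first line of the file.

```python
import io

import pandas as pd

# Two comment lines precede the header (hypothetical content).
text = """% source: hypothetical
% columns: year, nino34
year,nino34
2001,26.5
"""

# header=0 refers to "year,nino34", the first line that is not a comment.
df = pd.read_csv(io.StringIO(text), comment="%", header=0)
print(df.columns.tolist())
```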

na_values='NaN' means that missing values are written as 'NaN' in the file. The field apparently has to be exactly 'NaN', not ' NaN' (no space should be included), although my check was not thorough, so I may have missed something.
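The leading-space behavior described above is not verified here. Instead, this sketch shows one possible workaround, assuming the stray space comes right after a comma: skipinitialspace=True strips spaces following the delimiter, so the field becomes 'NaN' and the header label becomes 'nino34'. The data are hypothetical.

```python
import io

import pandas as pd

# Hypothetical comma-separated data with a space after each comma.
text = "year, nino34\n2001, 26.5\n2002, NaN\n"

# skipinitialspace=True removes the spaces after each comma, in the header too.
df = pd.read_csv(io.StringIO(text), skipinitialspace=True, na_values="NaN")
print(df.columns.tolist())
print(df["nino34"].isna().tolist())
```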

If the delimiters are whitespace, use sep=r'\s+'. The older delim_whitespace=True is deprecated since pandas 2.2.
