Python Tips & Memos by Physical Oceanography & Climate Laboratory in Hokkaido University

to_netcdf


to_netcdf writes an xarray Dataset or DataArray to a netCDF file. When writing large datasets, it is important to save disk space by specifying the encoding, which sets the output dtype and compression. For data analysis, output data should be stored as float32 (float64 is still appropriate for cases such as restart files of numerical models). The default float type in Python is float64, so float32 needs only half the disk space. Enabling compression shrinks the file further: for my test data, the compressed file was about 42% of the uncompressed size. The compression level can be chosen from 1 to 9, but it barely changes the file size. Compression level 1 reduces the file to 43.0% of its uncompressed size, and compression level 9 reduces it to 41.7%, only 1.3 percentage points smaller than level 1. In other words, whether you compress makes a big difference, but the compression level does not. Higher compression levels take longer to compress, so I recommend a compression level between 3 and 6.
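
The size figures above can be checked with a little arithmetic. The sketch below assumes the quoted percentages are relative to the same data written as uncompressed float32 (the text implies this but does not state it outright); the helper names are hypothetical.

```python
# Worked arithmetic for the file sizes quoted above.
# Assumption: the compression percentages are relative to uncompressed float32.
BYTES_FLOAT64 = 8  # IEEE 754 double precision (Python's default float)
BYTES_FLOAT32 = 4  # IEEE 754 single precision


def size_ratio_float32_vs_float64():
    """Fraction of disk space float32 needs compared with float64."""
    return BYTES_FLOAT32 / BYTES_FLOAT64


def level_gain(level1_pct, level9_pct):
    """Return (difference in percentage points, relative reduction in %)."""
    points = level1_pct - level9_pct
    return points, 100.0 * points / level1_pct


def combined_pct(compressed_pct):
    """Compressed float32 size as a percentage of the uncompressed float64 size."""
    return compressed_pct * size_ratio_float32_vs_float64()


points, relative = level_gain(43.0, 41.7)
print(f"level 9 vs level 1: {points:.1f} points smaller ({relative:.1f}% relative)")
print(f"float32 + compression: about {combined_pct(42.0):.0f}% of the float64 file")
```

Under that assumption, going from level 1 to level 9 saves only about 3% of the level-1 file, while float32 plus compression together bring the file to roughly a fifth of an uncompressed float64 file.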

#example

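# xr_data: xarray Dataset to write; var: name of a (time, y, x) variable in it; fn_out_fl: output path
# ysz, xsz: chunk lengths along y and x (e.g. the full grid, so each chunk holds one time step)
encoding = {var: {'chunksizes': (1, ysz, xsz), 'dtype': 'float32', 'complevel': 3, 'zlib': True}}  # 'zlib': True is needed for compression

xr_data.to_netcdf(path=fn_out_fl, mode='w', encoding=encoding)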
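
For a dataset with several variables, the same encoding can be built for all of them at once. Below is a minimal sketch of a hypothetical helper, make_encoding, assuming every data variable is a floating-point field and the time dimension is named "time". It uses only the encoding keys from the example above ('dtype', 'zlib', 'complevel', 'chunksizes') and skips scalar (0-d) variables, which are left with the default encoding.

```python
def make_encoding(sizes_by_var, complevel=3, dtype="float32", time_dim="time"):
    """Build a to_netcdf encoding dict for several variables (hypothetical helper).

    sizes_by_var maps each variable name to an ordered {dimension: length}
    mapping, e.g. dict(xr_data[name].sizes). Assumes every variable is a float field.
    """
    if not 1 <= complevel <= 9:
        raise ValueError("complevel must be between 1 and 9")
    encoding = {}
    for name, sizes in sizes_by_var.items():
        if not sizes:
            # Scalar (0-d) variable: leave it with the default encoding
            continue
        encoding[name] = {
            "dtype": dtype,
            "zlib": True,  # turns compression on; complevel only sets its strength
            "complevel": complevel,
            # One chunk per time step: length 1 along time, full length elsewhere
            "chunksizes": tuple(
                1 if dim == time_dim else length for dim, length in sizes.items()
            ),
        }
    return encoding


# A plain dict stands in here for the sizes of an xarray Dataset's variables
print(make_encoding({"sst": {"time": 12, "lat": 180, "lon": 360}}))
```

With a real dataset, this would be called as encoding = make_encoding({name: dict(xr_data[name].sizes) for name in xr_data.data_vars}) and passed to to_netcdf exactly as above. Chunking one time step per chunk suits reading individual time slices; other access patterns may favour different chunk shapes.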
