Import a large file into Python using pandas

Post date: Aug 11, 2014 4:02:34 AM

Here is how:

# This notebook is for big file import on 14001
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Import a text file into Python
# In this case we have two separate text files: a header file and a content file.
# First, import the header file.
header_row = pd.read_csv(filepath_or_buffer='na_alldata_header.csv', sep=',')
header_row

# Second, import the content file, incorporating the header file from the
# previous step. This step can take quite a long time if the file is huge:
# a 10K x 1300 text file (71 MB) takes less than 3 seconds to import.
# I'm testing on 13M x 1300 (91 GB)...

## For a small file you can do this, but it will not work for a large file.
df = pd.read_csv(filepath_or_buffer='na_alldata.csv', sep=',')

# For a large file, you might want to use an iterator instead.
## This returns a TextFileReader, which is iterable in chunks of 1000 rows.
tp = pd.read_csv(filepath_or_buffer='na_alldata.csv', sep=',', na_values='.',
                 header=None, names=header_row.columns, iterator=True,
                 chunksize=1000)
## df is a DataFrame. If pd.concat raises an error, wrap the reader in list(tp).
df = pd.concat(list(tp), ignore_index=True)
## If on version 3.4, you can pass tp directly:
df = pd.concat(tp, ignore_index=True)
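One caveat: pd.concat(list(tp), ...) still builds the entire DataFrame in memory, so it will not help with something like the 91 GB file above. If you only need an aggregate, you can process each chunk as it streams in and discard it. Here is a minimal sketch, assuming a numeric column; the column name 'score' is a hypothetical placeholder, so substitute one from your own header file:

import pandas as pd

header_row = pd.read_csv(filepath_or_buffer='na_alldata_header.csv', sep=',')

# chunksize makes read_csv return an iterator of DataFrames,
# so only one chunk of rows is in memory at a time.
reader = pd.read_csv(filepath_or_buffer='na_alldata.csv', sep=',',
                     na_values='.', header=None, names=header_row.columns,
                     chunksize=100000)

total = 0.0
count = 0
for chunk in reader:
    # 'score' is a hypothetical column name; use one from your header file.
    total += chunk['score'].sum()
    count += chunk['score'].count()  # count() skips NaN values

print('mean of score:', total / count)

With this pattern the peak memory use is roughly one chunk, regardless of how big the file is, at the cost of only being able to compute quantities you can accumulate chunk by chunk.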