A better way to import data from a .csv file
Post date: May 9, 2014 6:21:04 AM
First, I read only a few hundred rows of the data, then work out the class of each column, and finally import the full data set using those classes.
setwd('~/research/RCS_data/')

filename <- "out"
RData.input <- paste(filename, '.RData', sep="")
csv.input <- paste(filename, '.csv', sep="")

## ------- routine to import RCS MDS data ----------
if ( !file.exists( RData.input ) ) {
  cat(sprintf('%s does not exist, so importing it...\n', RData.input))

  # Read a small sample and infer the class of each column
  sampleData <- read.csv(csv.input, header = TRUE, nrows = 300)
  classes <- sapply(sampleData, class)
  write.table(x=classes, file='RCS_mds_variables.csv', sep=',', append=FALSE, col.names=TRUE)

  # Force selected columns to be read as character
  classes[names(classes) %in% c("seller_id","status_final","workflow_instance_id")] <- "character"

  # Import the full file using the inferred classes, then cache it as .RData
  df <- read.delim(csv.input, sep = ",", stringsAsFactors = FALSE, header = TRUE,
                   na.strings = c("","---","9999"), colClasses = classes)
  save(df, file=RData.input)
} else {
  load(RData.input)
}

## ------ preprocess ------
df[1:5,]
table(df$isFraudulent, useNA='ifany')
table(df$isBlocked, useNA='ifany')
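The same sample-then-import idea can be tried on a tiny throwaway file. What follows is only a minimal sketch, not the routine above: the helper name `read_csv_with_classes`, its arguments, and the demo data are all made up for illustration. It differs from the code above in two ways: it applies `na.strings` to the sample read as well, and it reads columns that were entirely NA in the sample as character, since R would otherwise guess "logical" for them.

```r
# Hypothetical helper (not from the original post): infer column classes from
# the first `n_sample` rows, then read the whole file with those classes.
read_csv_with_classes <- function(path, n_sample = 300,
                                  force_character = character(0),
                                  na_strings = c("", "---", "9999")) {
  sample_data <- read.csv(path, header = TRUE, nrows = n_sample,
                          stringsAsFactors = FALSE, na.strings = na_strings)
  classes <- sapply(sample_data, class)

  # Columns named in `force_character` are kept as text (e.g. IDs with leading zeros).
  classes[names(classes) %in% force_character] <- "character"

  # A column that is all NA in the sample is guessed as "logical";
  # read it as character so later non-NA values do not break the import.
  all_na <- vapply(sample_data, function(x) all(is.na(x)), logical(1))
  classes[all_na] <- "character"

  read.csv(path, header = TRUE, stringsAsFactors = FALSE,
           na.strings = na_strings, colClasses = classes)
}

# Made-up demo data written to a temporary file.
demo_path <- tempfile(fileext = ".csv")
demo <- data.frame(
  seller_id    = c("00123", "00456", "00789", "01011"),
  amount       = c(10.5, 20, 9999, 7.25),
  isFraudulent = c(0L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)
write.csv(demo, demo_path, row.names = FALSE)

# Infer classes from the first two rows only, then read everything.
df_demo <- read_csv_with_classes(demo_path, n_sample = 2,
                                 force_character = "seller_id")
str(df_demo)
table(df_demo$isFraudulent, useNA = "ifany")
```

One caveat with this approach: a class inferred from the sample can be wrong for later rows (for example, a column that looks like integer in the first 300 rows but contains decimals further down), in which case the full read stops with an error. Raising `n_sample` or forcing that column's class by hand are the obvious workarounds.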