Post date: Mar 10, 2015 1:24:14 AM
This method works, but it may be inefficient for a large file, because it writes a second copy of the data (minus the header) to a temporary file:
    # Randomly select 1M rows from a big file and return them with the header.

    # Get the header
    head -n 1 input.csv > downsampled.csv

    # Remove the header
    tail -n +2 input.csv > tmp.csv

    # Downsample the contents and append them after the header
    shuf -n 1000000 tmp.csv >> downsampled.csv

    # Remove the temp file
    rm tmp.csv

Further resources:
http://stackoverflow.com/questions/9245638/select-random-lines-from-a-file-in-bash
http://stackoverflow.com/questions/339483/how-can-i-remove-the-first-line-of-a-text-file-using-bash-sed-script
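
One possible way to avoid the temporary file in the script above is to pipe the header-less rows straight into shuf. This is only a sketch: the function name downsample_csv and its argument order are made up for illustration, and it assumes GNU coreutils (shuf) is installed.

```shell
# Hypothetical helper (not from the original post): sample rows from a CSV
# while keeping the header, without writing a temporary file.
#   $1 = input file, $2 = number of rows to sample, $3 = output file
downsample_csv() {
    {
        # Emit the header line first
        head -n 1 "$1"
        # Skip the header, then pick $2 random rows from the rest
        tail -n +2 "$1" | shuf -n "$2"
    } > "$3"
}

# Example call, using the file names from the script above:
# downsample_csv input.csv 1000000 downsampled.csv
```

The braces group both commands so their combined output goes to the output file in one redirection, header first. If the file has fewer data rows than requested, shuf simply returns all of them in random order.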