The Linux or Cygwin command line is very powerful and lets you perform sophisticated tasks without leaving the shell. A list of common tasks that I find useful is given below.
Viewing Contents of a Text File
- Find line count of file without opening it in a viewer
- wc <fileName>
- This returns the number of lines, words and bytes within the file
- View the first part of a large file
- View the end of a file
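As a minimal sketch of the three viewing tasks above: the last two items do not name a command, so this assumes the standard head and tail utilities are what they call for. The sample file is hypothetical and is created here only so the commands have something to read.

```shell
# Build a hypothetical 100-line sample file to experiment on.
sample=$(mktemp)
seq 1 100 > "$sample"

# Count lines only; reading from stdin keeps the file name out of the output.
wc -l < "$sample"

# View the first 5 lines (head prints 10 lines when -n is omitted).
head -n 5 "$sample"

# View the last 5 lines (tail also prints 10 lines when -n is omitted).
tail -n 5 "$sample"
```

Running tail -f on a file keeps it open and prints new lines as they are appended, which is handy for watching a log file grow.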
Useful Commands for Data Analysis
- Extract a column of data (where , is a delimiter)
- cut -d "," -f 2 <fileName>
- Producing a list of unique values of a column (output as a set of lines)
- cut -d "," -f 2 <fileName> | sort | uniq
- Note that the output of the cut command has to be sent through sort before it reaches uniq, because uniq only removes adjacent duplicate lines
- Producing a list of unique values of a column across a set of instances, output as a comma-delimited list
- cut -d "," -f 3 <fileName> | sort | uniq | sed -e ':a; N; s/\n/,/; ta'
- The final 'sed ....' stage joins all the lines output by the uniq command into a single comma-separated line
- Finding a file by name, recursing into subdirectories
- find . -name file_name -print
- Search for something within all files, recursively going into directories
- grep -r 'search_string' .
- Loop through a set of files within a directory
- for i in $(ls .); do (action using $i as filename);done
- E.g. for i in $(ls .); do (a2ps --center-title=$i-Syntax_constraints --output=$i/syntax.ps $i/constraints-syntax.cl); done
- In the example, the current directory only has directories. The loop body calls a2ps on the .cl file within each directory.
- Find a class in a set of jar archives in a directory
- for i in $(ls .); do (echo $i;jar tf $i | grep class_name );done
- Display file names without extension. This is very useful in conjunction with the for loop, to perform some operation on each file and give the output file a different extension. This command can be used to replace the 'ls .' within the for statement above.
- ls -1 | sed -e 's/\.[a-zA-Z]*$//'
- Clear contents of console
- The clear command clears the console window. However, Cygwin does not have clear installed by default.
- Ctrl+L clears the screen in Bash, which works in Cygwin as well
- Produce a printout of nicely formatted code.
- Dump file contents in octal format
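- Remove a character (char) from a file
- tr -d 'char' < file_name > file_name.tmp && mv file_name.tmp file_name
- Redirecting the output straight back to file_name would truncate the file before it is read, so the result goes to a temporary file first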
- Printing 2 pages on one physical page
- cat file.ps | psnup -pa4 -2 | lpr
- Using tar
- Creating a tar archive
- tar -cvf tar_filename.tar source_dir
- Creating a gzip-compressed tar archive
- tar -czvf tar_filename.tar.gz source_dir
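- Extracting a tar archive
- tar -xvf tar_filename.tar
- Extracting a gzip-compressed tar archive
- tar -xzvf tar_filename.tar.gz
- When extracting, a trailing name such as source_dir is optional and selects which archive member to extract; it is not a destination directory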