Very basic UNIX

1) Remove files and directories recursively within the present working directory

rm -rf *

2) Sed command to remove first line of file and save it in another file

sed '1d' test.txt > test2.txt

3) awk command to select a column from qstat and sort (to sort by name)

qstat | awk '{print $4}' | sort -k2n

k2 stands for column 2 and n is numeric ordering

4) Shell script to iterate through directories (from Shurane at stackoverflow)

#!/bin/bash

# Shell script to iterate through directories in pwd

for dir in */

do

echo $dir

done

5) Shell script to iterate and cd into directories (handles directory names with spaces) (inspired by stackoverflow)

#!/bin/bash

# Shell script to iterate through directories in pwd

for dir in */

do

echo $dir

# go to directory

cd "$dir"

# run mcc to compile matlab code

pwd

# go back to parent directory

cd ..

done

6) Shell script to iterate through directories (handles directories with spaces) and compiles matlab code in each of these directories

#!/bin/bash

# Shell script to iterate through directories in pwd

for dir in */

do

echo $dir

# go to directory

cd "$dir"

# run mcc to compile matlab code

pwd

/usr/local/MATLAB/R2011a/bin/mcc -mv unixfile.m

# /opt/matlab/bin/mcc -mv unixfile.m

# go back to parent directory

cd ..

done

7) Shell script to launch multiple commands (generate "swarm" of jobs on cluster)

8) Shell script to call cellprofiler on cluster

#!/bin/sh

#S -S /bin/sh

# Shell script to call cellprofiler file

nohup /work/software/bin/cp -c -p timelapse_analysis.cp -i ~/cellprofiler_work/concatenated_and_stabilized_tifs -o ~/cellprofiler_work/cellprofiler_output_demo

# nohup /work/software/bin/cp -c -p yourpipeline_name .cp -i your_path_to_your_images -o your_output_folder_for_this_job

9) SSH login without password using ssh-keygen and ssh-copy-id (article by Ramesh Natarajan and anorther article. also see solution by DEinspanjer in an article)

Briefly, do the following

From localhost

> ssh-keygen -t rsa -b 2048

Then

> ssh-copy-id login@remoteserver

Finally

> ssh login@remoteserver

10) Executing a sequence of commands after ssh-ing into a remote server

ssh login@remoterserver "cd codedir/matlab; ./run_script.sh; date"

Inspired by a solution by DEinspanjer in an article on copying ssh-key from a localhost that does not have ssh-copy-id to a remote server

11) Copy a directory

mkdir <new_directory_name>

cp -r <old_directory_name>/* <new_directory_name>

12) How to concatenate within shell script

echo $line >> "shell_cp$iCount.sh"

where $iCount is a variable

AND

cp $1 ${file_name_only}_${iCount}.tif

where $file_name_only and $iCount are variables

13) Store output of unix command in shell variable

present_dir=`pwd`

14) Look for substring within directory name

for dir in */

do

echo $dir

# cat file in directory into new file

case $dir in

*44*) gamma_val=44;;

*20*) gamma_val=20;;

*10*) gamma_val=10;;

*) echo "this is the default case ERROR";;

esac

# USEFUL WORK

done

15) Shell script to copy files from all sub-directories (within the current directory) and concatenate them to create

different concatenate files (based on some substring)

for dir in */

do

echo $dir

# cat file in directory into new file

case $dir in

*44*) gamma_val=44;;

*20*) gamma_val=20;;

*10*) gamma_val=10;;

*) echo "this is the default case ERROR";;

esac

cat "$dir"\results_jv_optimized.csv >> combined_results_jv_optimized_gamma${gamma_val}_sept252013.csv

pwd

done

16) use nice command

if you cannot set priority (due to access restriction)

nice +19 program.sh

if you can set priority

nice -19 program.sh

17) Shell script to compile matlab code, wait/delay and then execute compile code

#!/bin/sh

# compile code

/usr/local/MATLAB/R2011a/bin/mcc -mv unixfile.m

# delay: verify if compilation worked

sleep 10

# execute compiled code

nohup ./run_unixfile.sh /usr/local/MATLAB/R2011a

18) Shell script to iterate through directories in pwd and refer to file in each directory and concatenate and create a single file

#!/bin/bash

# Shell script to iterate through directories in pwd

# and refer to file in each directory and concatenate and

# and create a single file

# delete combined file if it exists

if [ -f combined_initialguess_tcl.csv ]

then

rm combined_initialguess_tcl.csv

else

echo "file not found"

fi

for dir in */

do

echo $dir

# cat file in directory into new file

cat "$dir"\initialguess_tcl.csv >> combined_initialguess_tcl.csv

pwd

# /opt/matlab/bin/mcc -mv unixfile.m

# go back to parent directory

done

19) Shell script to copy file(s) in present directory into each sub-directory

#!/bin/bash

# Shell script to iterate through directories in pwd

# and copies some matlab files into each directory

for dir in */

do

present_dir=`pwd`

# copy files into each directory

echo ${present_dir}/${dir}

cp odecall_eclipse_tcl_jv_local_fileptr.m "${present_dir}/${dir}"

done

20) Command to store output of ls command in a variable

var=`find *BFP*nuclei*`

echo "$var"

21) Command to delete swap file

# -A shows all all files

ls -A

rm -f <name_of_file>

22) Command to check if directory exists before creating a new directory

# make a temporary directory if it does not exist already

if [ ! -d agg ]

then

mkdir agg

fi

23) Command to check if directory (using regular expression) exists before creating a new directory

# make a temporary directory if it does not exist already

if [ ! -d position_* ]

then

mkdir agg

fi

24) Command to issue verbal alert and message by shell script on Mac OS X

say "Execution of Combined Image Analyzer successfully completed"

25) Command to check if file (regular file not directory or device file) exists

if [ -f mKate2.TIF ] && [ -f DAPI.TIF ]

then

echo "exists"

else

echo "does not exist"

fi

26) Unzip tgz files in Unix

tar xzvf filename.tgz

27) Unzip tar.bz2 file

tar jxf module.tar.bz2

28) Unzip tar.gz file

gunzip file.tar.gz

tar -xvf file.tar

OR

tar -zxvf file.tar.gz

# -z does gunzip

Adapted from response by Bigwebmaster at this link

29a) Unzip a gz file in Unix

gunzip filename.gz

29b) Unzip a zip file in Unix

unzip hmdb_metabolites.zip

30) Secure copy into a remote server

scp -rp dir_name/* user_name@remote_server:/home/dir_name

AND

# secure from remote server to localhost

scp user_name@remote_server:/home/dir_name ./

31) Download a file(s) from a website in a shell script

wget website_url

32) watch (execute a program periodically)

watch -d ls -l

33) Extract first few lines of a file

# extract first 3 lines

head -n 3 predictions.tab > test.tab

# Select a range of lines from a file (sed, stackoverflow)

sed -n '2,50p' command_file.txt > command_file_1.txt

34) Print starting from the 3rd line of a file (using tail)

tail -n+3 test.txt

# CAUTION tail -n3 will print LAST 3 lines!

# Select a range of lines from a file (sed, stackoverflow)

sed -n '2,50p' command_file.txt > command_file_1.txt

35) Increment counter

iCount=$(( iCount+1 ))

OR

counter=`expr $counter + 1`

36) Concatenate a number to a filename (also write something to a file)

echo "#!/bin/bash" > "shell_cp$iCount.sh"

37) Cut columns 1 to 9 using cut command (default is tab delimited)

cut -f1-9 test.txt # tab delimited file

other examples see the following link

for example

cut -d':' -f3

will cut the second column from a colon (:) delimited file

# cut all columns from the 1st to the last

cut -d ',' -f 1- combined_hlty_prism_shotgun.csv

# cut all columns from 1st to 3rd

cut -d ',' -f -3 combined_hlty_prism_shotgun.csv

38) Do a dictionary sort on a file based on 2nd column

sort -dk2 test.tab

# numerical sort based on 2nd column

sort -nk2 test.tab

# other examples (inspired by Unix concepts and applications: Soumitabha Das)

sort -t ',' -k 2 combined_hlty_prism_shotgun.csv # sort on 2nd field

sort -t ',' -k 1,1 -k 3,3 combined_hlty_prism_shotgun.csv # sort on primary key 1st field and secondary key 3rd field

sort -t ',' -k 1.5,1.6 combined_hlty_prism_shotgun.csv # sort on 6th character of 1st field (useful if mmddyy)

# removes repeated lines (unique)

sort -u

Get unique values in UNIX using uniq (link)

cat file.csv | uniq | wc -l

39) Shell script to sort on a particular column and then take that column out

#!/bin/sh

# shell script to sort compound predictions file based on column name (kegg compounds ids)

# and take out that column

# this will sort on kegg compound ids and then take out that column

# and produce a list of files of compound abundances (without the compound id columns)

# but that have been sorted on compound ids

# the output are files named temp_sorted_3.txt, etc

# then you use paste to paste together all these files like so -

# paste temp_sorted_3.txt temp_sorted_4.txt .... > combined_pasted_sorted.txt

# it also produces a file named kegg_compounds_SORTE.txt

# this has kegg compound ids and names sorted on UNIX

tail -n+2 predicted_compounds_out_3.txt | sort -dk2 | cut -f1 > kegg_compounds_SORTE.txt

iCount=1

for file_name in ls pred*.txt

do

cat "$file_name" | sort -dk2 | cut -f2 > "temp_sorted_$iCount.txt"

iCount=$(( iCount+1 ))

done

40) paste together files by column

paste test1.txt test2.txt

where test1.txt has

1 1

2 3

and test2.txt has

10 19

12 34

will produce

1 1 10 19

2 3 12 34

Also paste with a specified column delimiter

paste -d' ' $prev_filename $file_name > temp_file

41) copy files from one source directory to the present directory (without going to the source directory)

cp ./../normalize/module.txt module.txt

# . is current folder

## .. means go back one folder

42) sudo apt-get

and

sudo

and

pip

For example

sudo pip install pandas

43) tr (transliterate)

# converts ^M on Mac OS X and makes it UNIX compatible

tr '\r' '\n' < ko_ids.txt > ko_ids_modunix.txt

# using tr, convert upper case to lower case

tr '[A-Z]' '[a-z]' < combined_hlty_prism_shotgun.csv

44) join (link)

join input_json.txt input_gz.txt -1 1 -2 4 --nocheck-order -v 1

45) pr

46) curl

# simple example

curl -O http://bugs.python.org/file32324/patch_readline_issue_18458.sh

# an advanced example

curl -X POST -d '{"disease":["EFO_0000253"]}' --header 'Content-Type: application/json' https://platform-api.opentargets.io/v3/platform/public/evidence/filter\?target\=ENSG00000157764 -k > data.json

47) inverse grep

grep -v

# courtesy Galeb Abu-Ali

48) seq command (sequence)

seq --format="0_%g.dat" 0 6 60

# will print

0_0.dat

0_6.dat

0_12.dat

0_18.dat

0_24.dat

0_30.dat

0_36.dat

0_42.dat

0_48.dat

0_54.dat

0_60.dat

49) Use paste and seq together to paste together a large number of files (courtesy stackoverflow)

paste $(seq --format="temp_sorted_%g.txt" 3 355) > combined_pasted_sorted.txt

50) xargs

51) Output tab delimited file (using echo)

iCount=1

for file_name in ls pred*.txt

do

#echo $file_name

cat "$file_name" | tail -n+2 | sort -dk2 | cut -f2 > "temp_sorted_$iCount.txt"

#echo "temp_sorted_$iCount.txt"

# output mapping file

echo "$file_name\ttemp_sorted_$iCount.txt" >> predcompounds_tempsorted.txt

iCount=$(( iCount+1 ))

done

52) Great compilation of common UNIX commands used by a data scientist (link mazamascience.com)

53) String comparison in the shell

# Note: have to use single quotes: 'tex' and not "tex"

if [ $file_extension = 'tex' ]

then

echo “good”

else

echo "$0: Error, incorrect file extension (expects .tex file)"

exit 1

fi

54) awk command to find file extension and test it

> echo test.tex | awk -F . '{print $NF}'

tex

# -F tells where to split and NF is number of fields in current line

# Here $NF will be 2, so print $2 would print tex

# Full usage in program

file_extension=`echo $1 | awk -F . '{print $NF}' `

if [ $file_extension = 'tex' ]

then

echo 'File extension correct'

else

echo "$0: Error, incorrect file extension (expects .tex file)"

exit 1

fi

55) awk command to select from a database

# From "Unix concepts and applications: Soumitabha Das"

awk -F"|" '/sales/ { print $2, $3, $5 } ' file.txt

# selects all rows with sales and the 2nd, 3rd and 5th column

56) uniq command (inspired by Unix concepts and applications: Soumitabha Das)

# select count(*), field_name from table; group by field_name

cut -d ',' -f 1 combined_hlty_prism_shotgun.csv | sort | uniq -c

57) Find name of file only (without file extension) using basename

full_name="paper.tex"

file_extension=".tex"

basename $full_name $file_extension

> paper

58) Compare output of one command to a variable/string/number (using command substitution)

if [ ` echo $1 | awk -F . '{print $NF}' ` = '.tex' ]

then

echo 'Correct format'

else

echo 'Incorrect format'

fi

59) Command tee (splits input into two components: one to file and another to standard output)

# display number of users

who | wc -l

> 2

# this however does not display the users who sends output to standard output (in this case it gets piped to wc -l)

# if you want to display intermediate results on the terminal AND pipe them to another command, use tee and /dev/tty

# display users and the number of users

who | tee /dev/tty | wc -l

> user1

user2

2

# display users, display number of users and save it to file

who | tee list_users.txt | tee /dev/tty | wc -l >> list_users.txt

# even better

who | tee list_users.txt | tee /dev/tty | wc -l | tee /dev/tty >> list_users.txt

60) Difference between double quotes (") and single quotes (')

Double quotes allow special meta-characters like $ and ` to be evaluated by the shell

echo "The users online are `who` in $PATH"

> The users online are user1 user2 in /home/dir

# Contrast this with using single quotes

echo 'The users online are `who` in $PATH'

> The users online are `who` in $PATH

61) "Iterators" in shell

for temp_file_name in `ls`

do

echo $temp_file_name

end

62) (TRICK) You want to store say file size or some other output from a command in a variable. Say

file_size=`wc -c file.txt`

echo $file_size

> 1071 file.txt

# this would also store the name of the file in $file_size

# so instead of having wc take the filename as an argument give it the filename from standard input

file_size=`wc -c < file.txt`

echo $file_size

> 1071

# now it saves just the file size in the variable $file_size

# inspired by Unix concepts and applications: Sumitabha Das (Chapter 9, The Shell)

63) Display processes with their nice value

ps -o nice

64) Rename directory

mv old_name new_name

65) ls option to ignore some files

ls --ignore *.sh

# does not work on Mac OS X

66) Integer testing in the shell

if [ $iCount = 1 ]

then

echo “success”

else

echo “fail”

fi

67) Sleep/wait in shell

sleep 10 # sleep for 10 seconds

68) Splitting a string in shell based on a delimiter (adapted from stackoverflow) CONCEPT - parameter expansion

# Extract the 4 digit number before the underscore (_)

prev_filename=‘1003_signalp_pos’

str_partition_array=(${prev_filename//_/ })

echo ${str_partition_array[0]} # access like an array 0th element is the 4 digit number

69) Echo tab in shell script (-e option) (inspired by stackoverflow)

echo -e 'Partition_number\tDthreshold_neg\tSignalp_neg\tDthreshold_pos\tSignalp_pos\tSignalp_output_FINAL' > FINAL_parsed_file_secreted

70) elif command

if [ $iCount = 1 ]

then

archaea_filename=$file_name

elif [ $iCount = 2 ]

then

negative_filename=$file_name

fi

71) A great collection of Unix commands for bioinformatics (link) (pdf)

72) View hidden files

ls -al

# can view for example .bashrc file

73) Redirect to null device

> /dev/null

74) Remove non-empty sub-directory

rm -rf <directoryname>

OR

rm -r <directoryname>

75) Command to reboot

sudo reboot

76) Universal install script (xkcd)

77) Find operating system

uname

78) temporary directory

/scratch (no write access for user)

/tmp

/var/tmp (has write access for user)

79) grep to search for pattern in file

grep -Ein '<= SB' combined_protein_sequence.fsa.myout

see link for more grep options

80) Compress files

gzip combined_protein_sequence.fsa

81) Tracking memory and disk usage

du -h

df -h

82) Kill process

kill -9 <processid>

83) Path where program or application is installed

which perl

/usr/bin/perl

84) find command in UNIX (more on link)

# Find in root

find / -name call_analyze_mcmc_v122allcall4_pnpi.m -type f -print

# Find in specific directories

find /media/banerjees/SAMSUNG/ -name call_analyze_mcmc_v122allcall4_pnpi.m -type f -print

85) Select a range of lines from a file (sed, stackoverflow)

sed -n '2,50p' command_file.txt > command_file_1.txt

86) Generate ssh key and clone repo

man ssh-keygen

ssh-keygen -b 2048 -t rsa

less ~/.ssh/id_rsa.pub

git clone git@github.com:xlab/tenx.git

87) Create a symbolic link (link)

ln -s <file_path_to_simlinkto> <new_name(optional)>

# will simlink to current directory

88) UNIX commands for bioinformatics (from link)

grep -­‐C2 -­‐Ein 'error|exception|warning' log.txt

find . -­‐type f -­‐not -­‐name "Trinity.*" -­‐print | xargs -­‐I{} rm -­‐vfr {}

gunzip -­‐c myreads.fastq.gz | head

89) How to add a program to PATH variable

If MATLAB is installed on ~/MATLAB/bin

then at UNIX command prompt

export PATH=$PATH:~/MATLAB/bin


90) How to append to PATH (link)


PATH=$PATH:~/soumya/local


91) Global replace in vi

:%s/foo/bar/g

to replace all instances of foo with bar globally in a document using vi