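Anything you can run in the shell can be placed in a bash (shell) script.
Shell scripts usually have the suffix .sh
To run a script directly (./script.sh), it must be executable: chmod 755 script.sh or chmod +x script.sh (without execute permission it can still be run as bash script.sh)
Comments can be written in scripts with a #; everything after it on that line is ignored
Variables can be used to shorten long paths
Shell loops can be used to process many files
A \ at the end of a line lets a long command continue on the next line
#!/bin/bash (the "shebang") must be the very first line; it specifies the interpreter that runs the script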
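The points above can be tried out in one go. The sketch below writes a tiny script into a temporary directory, makes it executable with chmod +x and runs it directly, so the shebang decides which interpreter is used. The script name hello.sh and the variable name are arbitrary choices for illustration.

```shell
#!/bin/bash
# Create a temporary directory to hold the demo script
tmpdir=$(mktemp -d)
script="$tmpdir/hello.sh"

# A quoted heredoc ('EOF') writes the lines verbatim, without expanding $name here
cat > "$script" <<'EOF'
#!/bin/bash
# Comments start with #
name="genomics"
# A trailing backslash continues a long command on the next line
echo "Hello" \
     "$name"
EOF

chmod +x "$script"      # make the script executable
output=$("$script")     # run it directly; the shebang selects /bin/bash
echo "$output"          # Hello genomics

rm -r "$tmpdir"         # clean up
```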
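We can save variables under almost any name made of letters, digits and underscores, as long as it does not start with a digit. There must be no spaces around the = sign.
Variables can be string type:
evomics="Workshop_on_genomics_2024"
data="genome_assembly_file.fasta"
path="/home/genomics/workshop_materials/unix"
Integer type:
num=5
or float type:
pi=3.14
Note that bash actually stores every variable as text. Integers can be used in arithmetic with $(( )), but bash itself cannot do arithmetic on decimal numbers such as 3.14.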
We can refer to the variables using a dollar sign:
$evomics
${evomics}
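A minimal sketch of assigning and reading variables, assuming the example values above. It shows why the braces form is useful when text follows the name, that $(( )) handles integer arithmetic, and one common workaround (awk) for decimal arithmetic; awk is just one option, bc would also work.

```shell
#!/bin/bash
# No spaces are allowed around = when assigning
evomics="Workshop_on_genomics_2024"
num=5
pi=3.14     # stored as the text "3.14"

# Braces mark exactly where the variable name ends
echo "$evomics"            # Workshop_on_genomics_2024
echo "${evomics}_notes"    # Workshop_on_genomics_2024_notes

# Bash arithmetic $(( )) works on integers only
double=$((num * 2))
echo "$double"             # 10

# $((pi * 2)) would be an error, so hand decimals to awk instead
area=$(awk -v r="$num" -v p="$pi" 'BEGIN { print p * r * r }')
echo "$area"               # 78.5
```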
Loop over files inside a directory:
for file in ./unix/working_directory/*fastq
do
    commands "$file"   # placeholder: replace with your own command
done
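One detail worth knowing about the directory loop: if the pattern matches no files, bash keeps the pattern itself as a single "file name", so the loop runs once on a name that does not exist. The sketch below creates a few empty demo files in a temporary directory (standing in for ./unix/working_directory) and uses shopt -s nullglob so that an unmatched pattern expands to nothing; the .bam pattern is just an example of something that matches no files.

```shell
#!/bin/bash
# Demo files in a temporary directory (assumed layout)
tmpdir=$(mktemp -d)
touch "$tmpdir/sample1.fastq" "$tmpdir/sample2.fastq" "$tmpdir/notes.txt"

count=0
for file in "$tmpdir"/*fastq
do
    echo "Processing $file"
    count=$((count + 1))
done
echo "$count files processed"    # 2 files processed

# With nullglob an unmatched pattern expands to nothing, so the loop body never runs
shopt -s nullglob
empty=0
for file in "$tmpdir"/*.bam
do
    empty=$((empty + 1))
done
shopt -u nullglob
echo "$empty .bam files found"   # 0 .bam files found

rm -r "$tmpdir"
```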
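Loop over file names that we stored inside a variable (the names must not contain spaces):
files="file1
file2
file3
file4"
for file in $files     # left unquoted on purpose so the list is split into names
do
    commands "$file"   # placeholder: replace with your own command
done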
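The string-in-a-variable approach relies on bash splitting the text at spaces and newlines, which breaks names that contain a space. A bash array is a common alternative; the sketch below contrasts the two, using the made-up name "file 3" to show the problem.

```shell
#!/bin/bash
# String list: unquoted $list is split at every space, tab and newline
list="file1
file2
file 3"
m=0
for file in $list
do
    m=$((m + 1))
done
echo "$m items"      # 4 items: "file 3" was split in two

# Array: "${files[@]}" keeps every element intact
files=("file1" "file2" "file 3")
n=0
for file in "${files[@]}"
do
    echo "Processing: $file"
    n=$((n + 1))
done
echo "$n items"      # 3 items
```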
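Loop over the lines of an input, one line at a time (-r keeps backslashes as they are, IFS= keeps leading and trailing spaces):
while IFS= read -r line
do
    command1 "$line"   # placeholder: replace with your own command
done
We can pipe the output of ls into this loop (or into a script containing it) to run the command on each of the listed files. Use plain ls rather than ls -l: ls -l prints a long listing (permissions, owner, size, date) on each line, so $line would not be just a file name.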
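A runnable sketch of both ways of feeding a while read loop, using empty demo files in a temporary directory. When piped, ls prints one name per line. A loop on the right of a pipe runs in a subshell, so variables changed inside it are lost afterwards; reading a file with < after done avoids that, which is why the counter is kept in the second loop. The file list.txt is a made-up example.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
touch "$tmpdir/a.fastq" "$tmpdir/b.fastq"

# Piping ls into the loop: one file name per line
ls "$tmpdir" | while IFS= read -r line
do
    echo "Found: $line"
done

# Reading names from a text file instead; the counter survives the loop
printf '%s\n' a.fastq b.fastq > "$tmpdir/list.txt"
lines_read=0
while IFS= read -r line
do
    lines_read=$((lines_read + 1))
done < "$tmpdir/list.txt"
echo "$lines_read names read"   # 2 names read

rm -r "$tmpdir"
```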
If we want to print messages to standard output while the script is running, we can use the echo command. This is especially useful when running a long pipeline of multiple commands, because it shows which stage is currently running.
for file in ./unix/working_directory/*fastq
do
    echo "Command 1 is running on $file"
    command1 "$file"   # placeholder
    echo "Command 2 is running on $file"
    command2 "$file"   # placeholder
done
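For long runs it can also help to see when each stage started. The sketch below uses a small helper function, log (a hypothetical name, not a standard command), that prefixes each message with the current time from date; the file names are examples only and command1/command2 remain placeholders.

```shell
#!/bin/bash
# Hypothetical helper: print a message prefixed with the time (HH:MM:SS)
log() {
    echo "[$(date +%H:%M:%S)] $*"
}

for file in sample1.fastq sample2.fastq   # example names, not real files
do
    log "Command 1 is running on $file"
    # command1 "$file"   (placeholder)
    log "Command 2 is running on $file"
    # command2 "$file"   (placeholder)
done
```

Adding >&2 to the echo inside log would send the messages to standard error, which keeps them separate if the script's standard output is redirected to a results file.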
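Sometimes when we run a for loop on multiple files, we do not want each output name to carry the input file's extension (e.g. sample1.fastq_command1.fastq). basename removes the directory part of a path and, when given a suffix such as ".fastq", removes that suffix too: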
for file in ./unix/working_directory/*fastq
do
    file_name=$(basename "$file" ".fastq")
    command1 "$file" -out "${file_name}_command1.fastq"
done
It is the same as:
for file in ./unix/working_directory/*fastq
do
    file_name=$(basename "$file" ".fastq")
    command1 "$file" -out "$file_name"_command1.fastq
done
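A small sketch of what basename produces for one example path, next to bash parameter expansion, which gives the same result without starting an external program: ${var##*/} removes everything up to the last /, and ${var%.fastq} removes a trailing .fastq. The path is an example only.

```shell
#!/bin/bash
file="./unix/working_directory/sample1.fastq"   # example path

# basename drops the directories and the given suffix
name1=$(basename "$file" ".fastq")
echo "$name1"        # sample1

# Parameter expansion: strip the directories, then the suffix
tmp=${file##*/}
name2=${tmp%.fastq}
echo "$name2"        # sample1

echo "${name1}_command1.fastq"   # sample1_command1.fastq
```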
When using the variable $file_name, we need to mark where the variable name ends and where the rest of the output name begins.
If we just write $file_name_command1.fastq, bash looks for a variable called file_name_command1, because underscores are valid in variable names. That variable does not exist, so it expands to an empty string and the output name becomes just .fastq.
A good habit is to always put variable names within curly braces after the dollar sign: ${file_name}
Alternatively, we have to remember to put quotes around the variable when we combine it with other text: "$file_name"_command1.fastq
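The difference can be checked directly. In this sketch the unbraced form silently expands an unset variable to nothing, while the braced and quoted forms give the intended name. The last lines show set -u, a bash option that makes using an unset variable an error; it is run in a subshell here so the rest of the sketch is not affected.

```shell
#!/bin/bash
file_name="sample1"

# Bash reads the longest valid name, file_name_command1, which is unset
wrong="$file_name_command1.fastq"
# Braces, or closing the quotes, end the name at file_name
right1="${file_name}_command1.fastq"
right2="$file_name"_command1.fastq

echo "$wrong"    # .fastq
echo "$right1"   # sample1_command1.fastq
echo "$right2"   # sample1_command1.fastq

# With set -u, expanding the unset variable fails instead of giving ""
if ( set -u; : "$file_name_command1" ) 2>/dev/null
then
    caught=no
else
    caught=yes
fi
echo "set -u caught it: $caught"   # set -u caught it: yes
```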