UNIX

Introduction

The terminal

Make it comfortable to work in.

File organization

In UNIX, all files and folders are organized in a single hierarchy that starts at the root directory, /. Where we are and where we want to go are both written as paths, which turn the top-to-bottom hierarchy into a left-to-right sequence, with each level separated by the symbol /.

In the image below, the path to the Genomics folder within our user folder (Merce) is /home/Merce/Genomics.

Most of the time you spend using UNIX you will be inside the folders of your /home/user folder, but it is important to understand the organization of the whole system in order to understand where programs are stored and how they are run.

For example, why are there three bin folders in the representation below?

All these folders contain executable files. 

/bin contains essential commands that can be used by anyone, and that are available even when other parts of the filesystem are not yet mounted, for example in repair mode or while booting. It also contains most of the basic commands we will see at the beginning of our UNIX workshop.

/usr/bin contains most of the applications and executables available to all users of the system. Unlike those in /bin, they are not needed to boot or repair the system.

/home/Merce/bin is a personal folder for scripts, code and programs that only this user runs, and less often. Scripts and programs can actually be stored in any location; we then run them by typing their full path.
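A minimal sketch of running a program by its full path, assuming a temporary folder created with mktemp -d stands in for a personal bin folder. The script name hello.sh is hypothetical, and chmod +x (which marks a file as executable) is not covered in this section; it is only needed so the script can be run directly.

```shell
# A temporary folder plays the role of a personal bin folder
mybin="$(mktemp -d)"

# Write a tiny (hypothetical) script called hello.sh into it
printf '#!/bin/sh\necho "Hello from my script"\n' > "$mybin/hello.sh"

# Mark the script as executable, so it can be run directly
chmod +x "$mybin/hello.sh"

# Run it from anywhere by typing its full path
"$mybin/hello.sh"
```

The same works for a script saved in /home/Merce/bin: typing /home/Merce/bin/hello.sh would run it from any location.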

Absolute vs Relative paths

An absolute path gives the entire path to a location, starting from the root /, for example /home/Merce/Genomics. However, we are usually already somewhere inside the hierarchy, and we want to avoid typing the full path again and again. For that we use relative paths, which start from our current location. A key element of relative paths is the dot symbol (.):

. refers to our current location

.. refers to the location above us

For example, if we are already in Genomics, ./ refers to Genomics itself (/home/Merce/Genomics).

If we want to refer to the Merce folder from our Genomics location, the relative path would be ../

Imagine we are in the Genomics folder and we want to go to the bin folder inside the Merce folder. We can just type ../bin to go there, instead of the absolute path /home/Merce/bin.
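The moves described above can be tried safely with a small sketch that rebuilds the Merce/Genomics/bin example inside a temporary directory (the real /home/Merce is only the example from the image). It assumes mkdir -p, which creates any missing parent folders along the way; that flag is not covered in this section.

```shell
# Recreate a small copy of the example tree inside a temporary directory
root="$(mktemp -d)"
mkdir -p "$root/Merce/Genomics" "$root/Merce/bin"

# Start inside Genomics, using an absolute path
cd "$root/Merce/Genomics"
pwd        # prints .../Merce/Genomics

# Go to the bin folder next to Genomics with a relative path
cd ../bin
pwd        # prints .../Merce/bin

# Go up one level, to Merce
cd ..
pwd        # prints .../Merce
```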

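Are you typing everything in full? You are doing it wrong! Use the Tab key on your keyboard to autocomplete commands, file names and paths.

Press Tab once to complete a name when there is only one possible match

Press Tab twice to see all possible completions

Press the up arrow to scroll through previous commands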

File system navigation

Where am I?

The following command prints the absolute path of your current location (pwd stands for print working directory):

pwd

Changing locations

Moving from one directory to another can be done by using the following command:

cd /path/to/directory

The command cd stands for change directory, and will allow us to move around the file system.

Relative paths can also be used with cd, starting from wherever we are located at the moment.

If the directory is inside our current location (.):

cd ./directory

which is the same as:

cd directory

If the directory is inside the parent of our current location (..), one level above us:

cd ../directory

If we want to go back to the parent directory of our current directory, we just need to type:

cd ..

Home, Sweet home

The command cd can always easily take us back home (/home/Merce) with either option:

cd

cd ~

Trick! If we want to go back to the directory we were in before the last cd, we type:

cd -

man command

Most commands have a wide range of options, which are described in their manual.

We can display the manual of a command on our screen using the command man followed by the name of the command (press q to quit the manual).

Options can be specified as "flags" after the command:

command -flag1 -flag2

File system visualization

What is inside this folder?

You can list the contents of the folder you are in with the following command:

ls

Without a path, ls always refers to the directory you are in.

Alternatively, one could have typed ls . or ls ./ 

If we want to look at the contents of a different folder:

ls path/to/folder

Or use relative paths to, for example, look at the contents of the parent directory of our location:

ls ..

The command ls has several useful options; you can read about them in its manual:

man ls

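The manual of ls will be displayed on your screen. It shows that, by adding different flags after ls, we can get different types of information.

For example, if we want a long listing, with details such as permissions, owner, size and modification date for each file:

ls -l

Or if we also want to see the hidden files in the folder (those whose names start with a dot):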

ls -a

Sometimes we can combine flags, such as:

ls -lh

which lists the files in long format, showing file sizes in human-readable units (for example 2.3M instead of a number of bytes).

Key shortcuts

Ctrl + C interrupts (stops) the current command

Ctrl + Shift + C copy (Linux) - Cmd + C (Mac)

Ctrl + Shift + V paste (Linux) - Cmd + V (Mac)

Ctrl + W erases the word before the cursor

Ctrl + U erases everything before the cursor (the whole line if the cursor is at the end)

Ctrl + A go to the beginning of the line

Ctrl + E go to the end of the line

Type exit to log out of the current session

Create, copy, move, and remove files and folders

New directory

To create a new directory, we type the following command:

mkdir new_directory

new_directory is the name we chose for the new directory.

Notice that we used an underscore instead of a space. Try to avoid spaces in both file and folder names, as some programs will not correctly interpret a name containing spaces as a single name. You can choose the alternative that suits you best: newdirectory, NewDirectory, etc.

Copy files

We can copy a file with the following command:

cp file file2

Now there are two files with the same contents in the same folder, but with different names, and we can modify one of them without changing the original.

We can also copy a file to a specified directory:

cp file new_directory/

Now our file is duplicated: it exists both in the current location and in new_directory. Since we did not specify a name, the copy in new_directory is also named file. To give the copy a different name:

cp file new_directory/file2

Copy directories

We can also copy entire directories, but if you try typing cp new_directory new_directory2, you will probably get an error message like this:

cp: new_directory is a directory (not copied).

To copy a directory recursively, together with everything it contains, we use the flag -r:

cp -r new_directory new_directory2
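A small sketch contrasting the copies described above, run inside a temporary directory so nothing else is touched. The file notes.txt is a hypothetical example; the exact wording of the error message depends on your system.

```shell
# Work in a temporary directory
cd "$(mktemp -d)"

# A file, and a directory that contains another file
echo "some text" > file
mkdir new_directory
echo "inside" > new_directory/notes.txt

# Copy a file under a new name
cp file file2

# Copying a directory without -r fails with an error message
cp new_directory new_directory2 || echo "cp refused to copy a directory without -r"

# With -r, the directory and everything in it is copied
cp -r new_directory new_directory2
ls new_directory2    # lists notes.txt
```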

Move files and directories

We can move a file into a new directory with the following command:

mv file new_directory/

In this case, file will only be located in new_directory, and will be gone from the original location.

We can move a directory into a different parent directory with the following command:

mv directory path/to/other_directory/

More than just a move!

Move is not only used for moving, but also for renaming files and directories!

mv file file2

The file stays in the same directory, but its name changes from file to file2. We can use the same command to rename directories without changing their location:

mv new_directory new_directory2

Remove files and directories

To remove a file, we use the following command:

rm file

To remove a directory and everything it contains, we again need the -r flag; without it, rm refuses to remove the directory and prints an error message:

rm -r new_directory

Notice that, by default, rm does not ask for confirmation (there is no "are you sure you want to do this?" message): it directly removes the files you specify, and they do not go to a trash bin from which they could be recovered.

Always work with caution before running your commands!

"Unix was not designed to stop its users from doing stupid things, as that would also stop them from doing clever things."  - Doug Gwyn

Symbolic links

We work with large amounts of data and often with very large files, and we definitely do not want to have duplicated files occupying a lot of space in our computer system. At the same time, we do not want files in far-away folders from the place where we are currently doing our analyses, as we then have to use longer paths for each command/analysis we do on them. 

A good solution for that is to create symbolic links (also called symlinks or soft links). A symlink is a special file that points to another file or directory by storing its path. The symlink only contains that path, not the data itself, so it takes up almost no space, no matter how large the original file is.

To create a symbolic link:

ln -s /path/to/storing_directory/file /path/to/working_directory/

If we are already inside working_directory, we can simply use . (our current location) as the destination, which creates the symbolic link in our current directory:

ln -s /path/to/storing_directory/file .

To check that it worked, we can run the ls command with the -l flag:

ls -l path/to/working_directory

or, from inside working_directory, just:

ls -l

The -> symbol after the file name shows the path the symlink points to.  
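A minimal sketch of the symlink workflow above, assuming a temporary directory with two hypothetical folders named after the placeholders storing_directory and working_directory. The $(cd ../storing_directory && pwd) part is only a trick to obtain the absolute path of the storage folder.

```shell
# Work in a temporary directory with a storage folder and a working folder
cd "$(mktemp -d)"
mkdir storing_directory working_directory
echo "large data" > storing_directory/file

# From inside working_directory, link to the file using its absolute path
cd working_directory
ln -s "$(cd ../storing_directory && pwd)/file" .

# The long listing shows something like: file -> /tmp/.../storing_directory/file
ls -l

# Reading through the link gives the contents of the original file
cat file    # prints: large data
```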

Inputs and outputs

stdin stands for standard input: the stream from which a command reads its input (by default, what we type on the keyboard).

stdout stands for standard output: the stream to which a command writes its normal output (by default, the terminal).

stderr stands for standard error: the stream to which a command writes its error messages (by default, also the terminal).
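To see that stdout and stderr really are separate streams, here is a small sketch run in a temporary directory. It assumes the 2> notation, which sends stderr to a file and is not covered in this section, and uses a pipe (|, explained later) to feed text into stdin. The file names are hypothetical.

```shell
# Work in a temporary directory
cd "$(mktemp -d)"
echo "hello" > existing.txt

# ls lists the file it finds on stdout and complains about the missing one on stderr:
# > sends stdout to out.txt, 2> sends stderr to err.txt
ls existing.txt missing.txt > out.txt 2> err.txt

cat out.txt    # prints: existing.txt
cat err.txt    # an error message mentioning missing.txt

# stdin: with no file name, wc reads the text coming from standard input
printf 'one two three\n' | wc -w    # counts 3 words
```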

Explore file content

wc

We can get quantitative information on the content of a file with the command wc (word count).

To know how many lines a file has:

wc -l file

To know how many words a file has:

wc -w file

To know how many bytes a file has (for plain ASCII text, this is the same as the number of characters):

wc -c file
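A quick sketch to check the three counts above on a tiny file whose numbers can be verified by hand, created in a temporary directory. Note that wc prints the file name after each count, and some systems pad the number with spaces.

```shell
# Work in a temporary directory
cd "$(mktemp -d)"

# A file with 2 lines and 3 words; it has 14 bytes:
# "one two" + newline (8) and "three" + newline (6)
printf 'one two\nthree\n' > file

wc -l file    # 2 lines
wc -w file    # 3 words
wc -c file    # 14 bytes (newline characters are counted too)
```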


less

To view the full contents of a file we can use the command less. less is a read-only program that allows us to read the contents of a file without making any changes to it:

less file

less is a more complete version of the older command more, offering many more ways to navigate through the file. A trick to remember it: "less is more". For the full set of options in less, check its Wikipedia page or its manual in the command line (man less).

To quit less, we just press q.


cat

Another command that will show us the complete contents of the file is cat. In this case, it will be printed as standard output directly in our terminal. 

cat file

Printing short files directly to stdout is fine, but longer files will flood our terminal with all their lines.

Later on we will see how this command can be very useful to redirect file contents to other commands! :) 


head and tail

When working with large files, we often just want to take a quick look at their contents instead of opening the entire file. For that, there are specific commands that print the beginning or the end of a file:

head file

tail file

By default, head prints the first 10 lines of a file, while tail prints the last 10 lines. Of course, we can change the number of lines with the flag -n <number>, for example head -n 5 file.
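A small sketch of head and tail with and without -n, using a hypothetical file numbers.txt that contains the numbers 1 to 100 (generated with seq) in a temporary directory.

```shell
# Work in a temporary directory
cd "$(mktemp -d)"

# A file with the numbers 1 to 100, one per line
seq 1 100 > numbers.txt

head numbers.txt        # lines 1 to 10 (the default)
head -n 3 numbers.txt   # 1, 2 and 3
tail -n 2 numbers.txt   # 99 and 100
```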


Pipes

The character | (pipe) is used to chain commands: the standard output of the first command is passed directly as the standard input of the second one, so we avoid creating intermediate files.

command1 input | command2 > output

instead of:

command1 input > output1

command2 output1 > output

With the pipe, the output of running command1 on input goes straight into command2, and only the final result is written to a file (output); the intermediate file output1 is never created.
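A concrete version of the comparison above, as a sketch in a temporary directory with hypothetical file names: extracting lines 16 to 20 of a file, first in two steps with an intermediate file, then in one step with a pipe.

```shell
# Work in a temporary directory
cd "$(mktemp -d)"
seq 1 100 > numbers.txt

# Two steps, with an intermediate file (first20.txt)
head -n 20 numbers.txt > first20.txt
tail -n 5 first20.txt > lines16to20.txt

# The same result in one step with a pipe, no intermediate file
head -n 20 numbers.txt | tail -n 5 > lines16to20_piped.txt

cat lines16to20_piped.txt   # 16, 17, 18, 19 and 20
```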

A bit more advanced file-handling commands

Concatenate files

We can concatenate files with the command cat (whose name comes from concatenate):

cat fileA fileB > fileC

In this case, we are redirecting the standard output of cat (which reads and prints fileA and then fileB) into a new file named fileC.

The above command could also be written in two steps:

cat fileA > fileC

cat fileB >> fileC

It is important to always keep in mind the difference between the two output redirection symbols:

> A single redirect symbol overwrites anything that was previously in fileC (and creates fileC if it does not exist). Always use > with caution!

>> Two redirect symbols append the output right after the contents already in fileC.
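A short sketch showing the difference between > and >> on two one-line files in a temporary directory (the file contents are hypothetical).

```shell
# Work in a temporary directory
cd "$(mktemp -d)"
echo "A" > fileA
echo "B" > fileB

cat fileA fileB > fileC    # fileC contains A and B
cat fileA fileB > fileC    # > overwrites: fileC still contains only A and B
cat fileB >> fileC         # >> appends: fileC now contains A, B and B

cat fileC
```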


Sort file contents

To put the lines of a file in a certain order, we can use the command sort. By default, it sorts them alphabetically:

sort fileA

Additional options that can be very useful at times:

sort -r fileA sorts in reverse order

sort -n fileA sorts the lines of fileA numerically

sort -k 2 fileA sorts fileA by column 2 (strictly, by the text from column 2 to the end of the line; use -k 2,2 to consider column 2 only)

We can combine options:

sort -k 2nr fileA sorts fileA by column 2, numerically and in reverse order

You will soon realize that, by default, sort compares text character by character, so numbers that are not padded with a leading 0 (1, 2, 3, ..., 10, 11, 12 instead of 01, 02, 03, ..., 10, 11, 12) end up out of order, for example with 10 placed before 2. This happens often with file or sample names that contain numbers. The flag -V will allow you to order them naturally!

sort -V fileA
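A small sketch comparing the default sort with sort -V, assuming hypothetical sample names without zero padding, in a temporary directory. The default order shown is the usual one; the exact collation can depend on your locale settings.

```shell
# Work in a temporary directory
cd "$(mktemp -d)"

# Hypothetical sample names, not padded with a leading 0
printf 'sample10\nsample2\nsample1\n' > fileA

sort fileA      # sample1, sample10, sample2 (character by character)
sort -V fileA   # sample1, sample2, sample10 (natural order)
```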

sort -u fileA sorts the lines of fileA and removes duplicates

This gives the same result as piping the output of sort into the command uniq, which only removes duplicate lines that are next to each other (which is why the lines are sorted first):

sort fileA | uniq
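A sketch of why the lines need sorting before uniq, using a hypothetical file whose duplicate lines are not adjacent, created in a temporary directory.

```shell
# Work in a temporary directory
cd "$(mktemp -d)"
printf 'b\na\nb\na\n' > fileA

uniq fileA          # b, a, b, a: no duplicates are adjacent, so nothing is removed
sort fileA | uniq   # a, b
sort -u fileA       # a, b (same result in a single command)
```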


Are these two files different?

After modifying files, it is good to check how they differ, so that we can recall which changes were made. The command diff compares two files line by line. The flag -q quickly checks whether there are any differences at all:

diff -q fileA fileC

If they are different, it will print: Files fileA and fileC differ

If they are the same, it won't print anything.

If we want to see the specific changes, we would run the command without a flag:

diff fileA fileC
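A sketch of both forms of diff on two hypothetical three-line files that differ only in their second line, created in a temporary directory. In the full output, 2c2 means "line 2 changed into line 2", lines starting with < come from the first file and lines starting with > from the second.

```shell
# Work in a temporary directory
cd "$(mktemp -d)"
printf 'line 1\nline 2\nline 3\n' > fileA
printf 'line 1\nline two\nline 3\n' > fileC

diff -q fileA fileC    # Files fileA and fileC differ
diff fileA fileC       # prints:
                       # 2c2
                       # < line 2
                       # ---
                       # > line two
```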


Splitting a file

The command split divides a given file into multiple smaller files. By default, each new file contains 1000 lines, but we can choose a different number of lines with the flag -l:

split -l 20 fileA

This command splits fileA into as many files as needed, each containing 20 lines (the last one may contain fewer). By default, the new files are named xaa, xab, xac, and so on.
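A small sketch of split on a hypothetical 50-line file in a temporary directory: 50 lines in pieces of 20 give two full files and one with the remaining 10 lines.

```shell
# Work in a temporary directory
cd "$(mktemp -d)"
seq 1 50 > fileA

# Pieces of 20 lines: xaa (lines 1-20), xab (21-40), xac (41-50)
split -l 20 fileA

ls            # lists fileA, xaa, xab and xac
wc -l xac     # 10 lines
```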


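Extracting parts of a file

The command cut extracts specific parts of each line of a file. It is commonly used to extract specific columns (fields) with the flag -f; by default, cut expects the columns to be separated by tabs:

cut -f 2 fileA

The flag -c extracts characters by position instead of columns; for example, the following prints the second character of each line:

cut -c 2 fileA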
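A sketch comparing cut options on a small hypothetical table (gene names and counts made up for illustration), in a temporary directory: -f selects tab-separated columns, -c selects character positions, and -d sets a different column delimiter, for example a comma.

```shell
# Work in a temporary directory
cd "$(mktemp -d)"

# A small tab-separated table
printf 'geneA\t10\tchr1\ngeneB\t25\tchr2\n' > fileA

cut -f 2 fileA       # second column: 10 and 25
cut -f 1,3 fileA     # first and third columns
cut -c 2 fileA       # second character of each line: e and e

# For comma-separated files, set the delimiter with -d
printf 'geneA,10\ngeneB,25\n' > fileB.csv
cut -d ',' -f 1 fileB.csv   # geneA and geneB
```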

Text editors

There are also ways to edit files directly from the terminal, inside a text editor. There is a range of options, and people's preferences vary a lot depending on what they are used to.

nano

The simplest option is nano. You start it by simply typing:

nano file

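Most commands within the nano text editor are given by holding the Control key while pressing another key; nano represents Control as ^ (so ^X means Ctrl + X). The most important commands are listed here, but you can find the whole list on the nano website.

^S save current file

^O write out: save the file, asking for the file name (keep it, or type a different one to save to a different file)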

^X exit from nano


vim

Vim is a highly configurable text editor built to make creating and changing any kind of text very efficient. You can start it by typing:

vim file

Some of the most important commands are listed below, but you can find all the possible combinations in different online cheatsheets:

i start insert mode (the text you type is inserted before the cursor)

ESC exits insert mode (also Ctrl + C)

:w save file without exiting

:q exit file (if there are unsaved changes, it fails)

:wq save and exit

:q! exit without saving changes


emacs

Emacs is also known for its extensibility and configurability. You can start emacs by typing:

emacs file

Some essential commands start with Ctrl + X, followed by another key pressed while still holding Control. There is also a wide range of other key combinations to move around and edit the text:

Ctrl + X, Ctrl + S save file

Ctrl + X, Ctrl + C exit the editor (if there are unsaved changes, it asks whether you want to save them first)


Interact with your neighbor!

Find someone in the room who uses a different text editor from yours, and explain to each other some of the commands you find most useful! If you have not picked your favourite text editor yet, learn one from someone around you!

Some bioinformaticians feel very strongly about their own "best text editor", and you will sometimes encounter funny comics like the one below. In the end, use whatever feels most comfortable to you!