Protein XRD Protocols - Bare-Bones Linux

Bare-Bones Linux

Roger S. Rowlett

Gordon & Dorothy Kline Professor, Emeritus

Colgate University Department of Chemistry

Because so many X-ray crystallography data analysis, protein modeling, and refinement programs are written for the Unix/Linux platform, some familiarity with the Linux operating system is desirable for doing protein crystallography. Most protein crystallography programs have been ported to Windows and macOS, but in some cases not all program features are available on these operating systems. The following information is not intended to be an exhaustive Linux tutorial; rather, it is the bare minimum required for starting, ending, and navigating a Linux session on a workstation in the Colgate University Department of Chemistry Protein X-ray Crystallography Computing Facility. Some helpful tips and tricks are given along the way. You are encouraged to consult the additional resources for more information.

Starting and Ending a Linux Session

Starting a Linux session

To start a Linux session, type your username and password at the welcome screen. Do not share your password with anyone; otherwise, others could make mischief, intentionally or unintentionally, in your work area. The system administrator will set up an account for you and help you configure your desktop, which will look something like the following figure.

Figure 1. Typical Linux session (Ubuntu) on a workstation in the Colgate Protein X-ray Crystallography Computing Facility. The Firefox browser, PyMOL, CCP4i, and a terminal window are open on the desktop in this figure.

Starting a Linux Shell

Most crystallography programs and utilities run from a shell window, which is basically a text window into which you can type Linux commands. There are many shell environments in Linux, but the preferred one for crystallographic software is tcsh, an enhanced variant of the C-shell. To open a new shell in Linux click on the terminal icon in the toolbar (7th from the left in Figure 1). A new window should open with a command prompt such as ancho%. In this case, ancho indicates the computer to which you are logged in, and % is the shell prompt. Commands that you type will appear after the prompt symbol. Shell windows can be closed by typing exit at the prompt or by clicking the upper right corner of the window.

Ending a Linux session

To end a Linux session (not to be confused with exit from a shell window), right-click on the desktop and select Logout. Always log out of your session when you are away from your terminal for more than a few minutes.

File Storage Structure in Unix/Linux

File directory structures are similar to those of DOS (the precursor to Windows). Indeed, DOS, whose commands live on in the Windows Command Prompt, borrowed its hierarchical directory structure from Unix, and the two share a number of similar commands and functions. When you start a Linux session, you will be located in your home directory, and all commands you type will normally apply to the files in this, your local directory, unless instructed otherwise. Your local directory might be something like /home/jdoe. That is, unless instructed otherwise, all files will be read from and written to the jdoe subdirectory of the home directory on the machine you are logged in to. You can find out where you currently are by typing the pwd command; you can make a new directory in the current directory with the mkdir command; or you can change your local directory to another with the cd command. These and other commands are described below.

The string /home/jdoe/filename describes an absolute path to the file filename, a complete set of instructions to locate the file in question. The leading slash indicates that this is a complete path that starts at the root directory (/), which contains the directory home. The string datafiles/filename is a relative path, which describes how to locate a file starting from the local directory. A relative path does not start with a leading slash. For example, if you were currently in the directory /home/jdoe, the relative path datafiles/filename would point to the absolute location /home/jdoe/datafiles/filename. Relative paths can save a lot of time when typing commands.
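The difference between absolute and relative paths can be tried out safely in a scratch directory. The following is a minimal sketch, not a required procedure: the scratch directory stands in for /home/jdoe, and the variable name home_dir is an assumption made for illustration. The variable assignment on the first line uses sh/bash syntax; the pwd, mkdir, cd, and ls commands themselves are typed the same way in tcsh.

```shell
# Scratch directory standing in for /home/jdoe (hypothetical stand-in).
home_dir=$(mktemp -d)
cd "$home_dir"

mkdir datafiles                     # new subdirectory of the current directory
echo "test" > datafiles/filename    # small file to point at

pwd                                 # prints the absolute path of the current directory

# Relative path: no leading slash, resolved from the current directory.
ls datafiles/filename

# Absolute path: starts at the root directory (/), works from anywhere.
ls "$home_dir/datafiles/filename"

cd datafiles                        # move into the subdirectory
pwd                                 # the printed path now ends in /datafiles
cd ..                               # move back up one level
```

Both ls commands point at the same file; only the starting point of the path differs.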

It is important for new Linux users to know that Linux will not generally protect you from yourself. For example, deleting files or directories in Linux is absolutely, positively, no-turning-back, irretrievably FINAL. You cannot recover files you delete accidentally. Therefore, proceed with care and caution when cleaning up data. A list of commonly used Linux commands follows in the next section.

Commonly Used Linux Commands

The following is an alphabetical list of common Unix commands that you might use for routine crystallography work and file maintenance. Please note that Linux commands, unlike DOS commands, are case-sensitive, so PWD, Pwd, and pwd are three different things. Filenames are also case-sensitive; most users avoid capital letters in filenames for ease of typing and to prevent confusion.

cd directory: change your local directory to a new location. If you issue cd with no argument, it will take you to your home directory.
chmod permission filename: change permissions for files. You must provide a filename and one or more permission arguments with this command. A permission argument consists of an optional class of users (u for user, g for group, o for others, a for all), an operator (+ to add, - to remove), and one or more permissions (r for read, w for write, x for execute). For example, to make a file readable and executable by all users, but not writable, you would type chmod a+rx-w filename. This command is most often used to make scripts you write executable, which is not the default. For example, chmod u+x filename would make a file executable for the user who owns it, in addition to whatever permissions it already had.
cp filename1 filename2: copies a file from one location to another. For example, cp /home/jdoe/yourfile myfile would copy the file yourfile from the /home/jdoe directory into your local directory with the name myfile. Be careful with the cp command: it will not check to see if you are overwriting an existing file with the same name.
df: reports the free space on each disk drive. To force display of free space in intelligible units (such as kB, MB, and GB), use df -h.
kedit filename: KEdit is a very nice KDE text editor that can be used to edit files and scripts. If a filename is supplied, it will open that file. Alternative editors, if installed, are gedit and editpad.
kill PID: halts a process with the indicated PID (process identification number). This command is used to halt a program running in the background. Sometimes kill is not adequate and a more severe variant, kill -9 PID, must be used. The kill -9 command should normally be used only as a last resort.
less filename: displays the contents of a file a page at a time. Tapping space or f scrolls one page forward, b one page backward; g jumps to the beginning of the file; G jumps to the end of the file. Type q to quit.
ls: list directory. This command will list the contents of the current directory. You may add switches for additional functionality. For example, ls -l will make a "long" listing of files with additional file details, including size and permissions. On many Linux systems, ll is predefined as a shortcut for ls -l.
mkdir directory: creates a new subdirectory within the local directory.
mv filename1 filename2: moves a file from one location to another. For example, mv /home/jdoe/yourfile myfile would move the file yourfile from the /home/jdoe directory into your local directory with the name myfile. Be careful with the mv command: it will not check to see if you are overwriting an existing file with the same name. The mv command is often used to rename a file in the local directory. For example, mv oldname newname would rename the file oldname to newname.
ps: identifies the current processes running and their process identification numbers (PID). This command is most often used to obtain PIDs for the kill command.
pwd: print working directory. This command returns the name of the directory you are currently located in.
rm filename: deletes a file from the local directory. A useful but very dangerous variant of remove is rm -r directory. The rm -r command is a recursive remove, which deletes a directory and absolutely everything in it, including any subdirectories and the files within them. Use with extreme caution!
rmdir directory: removes a subdirectory from within the local directory. The subdirectory must be empty in order to remove it.
tail -n number filename: displays the last number lines of the file filename. A variant, tail -f filename, will continuously follow the last lines of a file as it is being written. This command is useful for monitoring the progress of programs that write log files while executing. The tail -f command must be terminated with CTRL-c.
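Several of the commands above can be practiced together in a scratch directory where mistakes cost nothing. This is only an illustrative sketch: the file and directory names (notes.txt, run1.log, project, hello.sh) and the variable name work are assumptions, and the first line uses sh/bash syntax for the variable assignment. The commands that follow it are typed the same way in tcsh.

```shell
# Scratch directory; nothing outside it is touched.
work=$(mktemp -d)
cd "$work"

mkdir project                            # create a subdirectory
printf 'line1\nline2\nline3\n' > notes.txt

cp notes.txt project/notes_copy.txt      # copy; an existing target would be overwritten silently
mv notes.txt run1.log                    # rename within the current directory

# A two-line script, made executable for its owner.
printf '#!/bin/sh\necho hello from script\n' > hello.sh
chmod u+x hello.sh

ls -l                                    # long listing: permissions, size, date

tail -n 2 run1.log                       # last two lines of the file

rm run1.log                              # permanent: there is no trash folder
rm -r project                            # removes the directory and everything in it
```

In the ls -l output, the permission string for hello.sh begins with -rwx once chmod u+x has been applied.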

Special Linux Command Line Characters and Actions

Linux has many special characters that make it easier to type commands. Some of these are listed below, with examples.

The tilde (~) is used to designate your home directory. For example, if your home directory is /home/jdoe, then the command ls ~/datafiles would list the contents of the /home/jdoe/datafiles directory.
The dot (.) is used to designate the current local directory. For example, the command cp /home/jdoe/datafiles/filename . would copy the file filename from the /home/jdoe/datafiles directory into your current directory using the same name.
The double dot (..) is used to designate the directory one level up from your current directory. For example, the command cp ../../datafiles/filename . would copy the file filename from the datafiles directory located two levels up into your current directory.
The star (*) is a wildcard character that can be used to select many similar files. For example the command cp /home/jdoe/datafiles/*.osc . would copy all files in the /home/jdoe/datafiles directory ending with the characters .osc into your current directory. Be careful with wildcards, especially when removing files. You can always test your wildcard selection by doing an ls command first. If the ls command lists the files you thought you selected, you can change ls to rm and remove the correct files with confidence.
The question mark (?) is a wildcard character for a single character in a filename. For example, the command rm abcd? would remove from the current directory all files with names exactly five characters long that start with abcd, whatever the fifth character is.
Brackets ([]) are used to enclose ranges of characters allowable in a single character position for selected files. For example, the command rm mydata.1[0-5][0-9].osc would remove files mydata.100.osc to mydata.159.osc if present in the current directory.
The ampersand (&) is used to instruct Linux to run a program in the background. In this way, you can continue to use the current Linux shell while your program runs. Background jobs will continue to run even if you log off the machine. For example the command myprog & would start the program myprog, display a PID, and return the shell prompt. The program will run in the background until it finishes or is terminated with the kill command.
The up-arrow key (↑) will display the last command typed if your environment is set up appropriately. Repeatedly pressing ↑ will call up additional previous commands. Pressing the ↓ key will bring up successively more recent commands. Commands that are called up this way can be edited, using the → and ← keys to scroll across the line. To execute a command called up and/or edited this way, press enter.
The middle mouse button actually has an important use in Linux as a “paste” command. This is especially useful when editing command lines with long file names. You can select text in virtually any Linux application, including the terminal window, by holding down the left mouse button and dragging. To paste this text into the command line (or another Linux application) move the cursor to the insertion point and click the middle mouse button.
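The wildcard characters described above, and the habit of previewing a selection with ls before handing it to rm, can be tried on empty stand-in files. This is a sketch under stated assumptions: the file names are hypothetical, and the first line uses sh/bash syntax for the variable assignment. The wildcard patterns themselves behave the same way in tcsh.

```shell
# Scratch directory filled with empty stand-in files.
work=$(mktemp -d)
cd "$work"
touch mydata.099.osc mydata.100.osc mydata.159.osc mydata.160.osc abcd1 abcd22

ls *.osc                      # every file ending in .osc
ls abcd?                      # abcd plus exactly one more character, so abcd22 is not matched
ls mydata.1[0-5][0-9].osc     # mydata.100.osc through mydata.159.osc only

# The preview looked right, so change ls to rm.
rm mydata.1[0-5][0-9].osc

ls                            # mydata.099.osc and mydata.160.osc are untouched
```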

Input and Output Redirection

Linux allows the user to redirect information from the keyboard or screen (defaults for input and output) to files or even other programs using redirection commands. A listing of common redirection commands is given below with examples.

The left angle bracket (<) is used to redirect input. For example, the command myprog<input.txt would launch the program myprog and accept input from the text file input.txt rather than the (default) keyboard. Running programs using input scripts rather than the keyboard is a very common way of executing programs in Linux.
The right angle bracket (>) is used to redirect output. For example, the command myprog<input.txt>output.txt & would launch the program myprog in the background, accept input from the text file input.txt rather than the (default) keyboard, and output results to the file output.txt rather than the (default) screen. You could monitor the progress of myprog if desired by issuing the command tail -f output.txt.
The double right angle bracket (>>) is like the single right angle bracket except that it will append data to an output file rather than overwrite it. For example, the command myprog<input.txt>>output.txt & would launch the program myprog in the background, accept input from the text file input.txt rather than the (default) keyboard, and append results to the file output.txt rather than sending them to the (default) screen. If output.txt does not already exist, it will be created.
The pipe (|) is used to feed the output of one program into another. For example, the command myprog<input.txt|tee output.txt would launch the program myprog, accept input from the file input.txt, and send the results to the program tee, which sends output to both the screen and the file output.txt. This example is another way to monitor the progress of an executing program while saving the output to a text file.
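The redirection operators above can be tested with any program that reads from the keyboard and writes to the screen. In the sketch below, the standard sort utility plays the role of myprog; that substitution, the file names, and the variable name work are assumptions made for illustration. The first line uses sh/bash syntax for the variable assignment; the redirection operators are typed the same way in tcsh.

```shell
work=$(mktemp -d)
cd "$work"

printf 'zinc\ncobalt\nargon\n' > input.txt    # stand-in input file

# sort stands in for myprog (an assumption for illustration).
sort < input.txt > output.txt         # read input.txt, overwrite output.txt
sort < input.txt >> output.txt        # same input, appended to output.txt

# Send the output to both the screen and a file.
sort < input.txt | tee tee_output.txt

# Run in the background, then wait for the job to finish before reading the result.
sort < input.txt > background.txt &
wait
cat background.txt
```

After the first two commands, output.txt holds the sorted list twice, which shows the difference between overwriting and appending.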

Customizing Your Linux Environment

It is possible to customize your Linux environment to make it easier to navigate through your directories and projects. To customize your environment, edit the .tcshrc file in your home directory. Commands in this file will be executed each time you open a new shell window. The following types of commands are useful to have in your .tcshrc file:

set history = 100: this setting allows the last 100 commands to be remembered. You can call them up at the prompt by pressing the ↑ key as described previously.
alias name 'command': this command is used to designate a shortcut name for a complicated command. For example, the command alias project10 'cd /projects/project10/refine/ncs/' would allow you to execute the long, complicated directory change in single quotes by simply typing project10 at the prompt. Any valid Linux command can be placed within the single quotes. Be sure to type plain single quotes (') rather than the curly quotes some word processors substitute.
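Put together, a small .tcshrc might look like the sample below. This is a hedged sketch rather than the facility's actual configuration: the ll alias is an added illustration, and the sample is written to a scratch file (tcshrc.example, a hypothetical name) so that your real ~/.tcshrc is left alone. The wrapper commands use sh/bash syntax; the lines between the EOF markers are tcsh syntax and are what would go into the .tcshrc file itself.

```shell
demo=$(mktemp -d)

# Write the sample settings to a scratch file, not to the real ~/.tcshrc.
cat > "$demo/tcshrc.example" <<'EOF'
# Remember the last 100 commands for recall with the up-arrow key
set history = 100

# Jump to a project directory by typing: project10
alias project10 'cd /projects/project10/refine/ncs/'

# Long directory listing by typing: ll (added illustration)
alias ll 'ls -l'
EOF

cat "$demo/tcshrc.example"      # review the sample before copying lines from it
```

Lines copied into ~/.tcshrc take effect the next time a shell window is opened.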