Unix and Modeling - 3. File paths and I/O

Tutorial 1) Part 3: File paths: Absolute vs. relative pathnames

Depending on your operating system (OS) version or settings, the command prompt usually shows the directory you are in. The prompt also shows the hostname of the computer you are logged into. For example, my Macbook terminal prompt is:

teton:~/Music/iTunes % (relative path)

Here, teton is the name of my computer, and the string between the colon (:) and the % sign is the path of the directory I am currently in. The prompt shows this path relative to my home directory (denoted ~), whereas the absolute pathname of the same iTunes folder is:

/Users/rhills/Music/iTunes (absolute path)
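The shell performs this translation itself. Before running a command, it expands a leading ~ into the absolute path stored in the HOME variable. Below is a minimal sketch in POSIX sh (sh, bash, zsh). It assumes HOME is set, as it is in any normal login session:

```sh
#!/bin/sh
# The shell replaces a leading ~ with the absolute path stored in $HOME.
echo ~                     # same output as: echo "$HOME"
echo ~/Music/iTunes        # absolute form of the prompt's ~/Music/iTunes
echo "$HOME/Music/iTunes"  # identical to the line above
```

On the author's Mac, the last two lines would both print /Users/rhills/Music/iTunes.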

The df Unix command shows the disk space used and available on each mounted drive (partition) of your file system, along with its mount point:

df -H displays free disk space with human-friendly unit suffixes (bytes, kilobytes, megabytes, gigabytes, terabytes, petabytes)
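If you pass a path to df, it reports only the partition that contains that path. This is a quick way to find which drive and mount point a folder lives on. The sketch below is in POSIX sh. It assumes the BSD (macOS) or GNU (Linux) version of df, both of which accept -H; other versions may not:

```sh
#!/bin/sh
# Report only the partition holding the current directory (.):
df -H .
# Report every mounted file system and its mount point:
df -H
```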

You can switch to a directory from anywhere on the computer using either its absolute path or its path relative to your current location. Below we illustrate this for day1:

cd / Start at the hard drive's top-level (root) folder (/)...

cd day1 The relative path day1 is not found, because the shell looks for it only in the current directory (the root level, /). Note: filename autocomplete likewise completes names relative to your current directory unless you start typing an absolute path.

cd cd without an argument changes to your home directory

pwd prints absolute path of current working directory

cd day1 this relative path is equivalent to ./day1 (single dot = current directory)
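You can check with pwd that day1 and ./day1 resolve to the same absolute folder. The sketch below is in POSIX sh rather than csh. It works inside a hypothetical scratch directory made with mktemp, an assumption made so that your real day1 folder is untouched. The variable names are illustrative:

```sh
#!/bin/sh
tmp=$(mktemp -d)                 # hypothetical scratch directory
trap 'rm -rf "$tmp"' EXIT        # delete it when the script ends
cd "$tmp" && mkdir day1

a=$(cd day1 && pwd)              # relative path, resolved from the current directory
b=$(cd ./day1 && pwd)            # ./ (single dot) names the current directory explicitly
c=$(cd / && cd "$tmp/day1" && pwd)   # absolute path works from anywhere, even from /
echo "$a"; echo "$b"; echo "$c"  # all three print the same absolute path

# From the root folder, the relative path day1 would mean /day1, which normally doesn't exist:
(cd / && cd day1) 2>/dev/null || echo "cd day1 failed from /"
```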

echo 1space >! "z z.txt"

cat z z.txt why does this result in an error?

cat "z z.txt" quotes pass a single string argument to the cat command

See if your shell will "autocomplete" filenames for you: type cat z and hit the tab key. You should see the command line change to cat z\ z.txt. Hit the enter key and it will print the file. If you have more than one file beginning with 'z', hit tab twice and it should list them. The backslash (\) is an escape character: it tells the shell to treat the following space as part of the filename. Autocomplete is a nice feature that ensures you supply the right filename without spelling errors.
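cat z z.txt fails because the shell splits a command line into arguments at every unquoted space. The sketch below, in POSIX sh, counts the arguments each spelling produces. It uses a hypothetical helper function, count_args, inside a scratch directory. Plain > is used instead of the tutorial's >! because the file does not exist yet:

```sh
#!/bin/sh
tmp=$(mktemp -d)                 # hypothetical scratch directory
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
echo 1space > "z z.txt"

# count_args (hypothetical helper): prints how many arguments it received.
count_args() { echo "$#"; }

count_args z z.txt      # 2: the unquoted space splits the name in two
count_args "z z.txt"    # 1: quotes keep the space inside one argument
count_args z\ z.txt     # 1: the backslash escapes the space, as tab completion does
cat "z z.txt"           # prints: 1space
```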

File completion (set filec) also expands directory names when searching folders. Try typing cd /home or cd /Users and hitting tab to find your home directory:

PC: cd /home/username/day1 type cd /home and hit tab

Mac: cd /Users/username/day1 type cd /Users and hit tab twice

pwd

Relative paths locate a file starting from your current directory, either down into its subfolders (day1/part3) or up to the parent folder with .. (double dot). They are useful as arguments to Unix commands:

mkdir day1/part3

ls day1

ls day1/part3

Relative paths are more convenient for navigating quickly in the terminal, but you need to remember which directory you are in.
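Here is a sketch of the same steps in POSIX sh. It runs in a hypothetical scratch directory, so day1 is created first. It also uses .. to climb back up from day1/part3:

```sh
#!/bin/sh
tmp=$(mktemp -d)                 # hypothetical scratch directory
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

mkdir day1                # relative path: created inside the current directory
mkdir day1/part3          # relative path pointing into a subfolder
ls day1                   # lists: part3
cd day1/part3 || exit 1   # move down two levels
cd ../..                  # .. is the parent directory; this climbs back up two levels
pwd                       # prints the absolute path of the scratch directory again
```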

QUESTION 5: Write the commands using absolute and relative paths to change your current directory from your day1 folder to a standard OS folder (e.g. Desktop, Downloads, or Documents).

Part 4a. Sequencing Commands and Scripting

In scripting, one performs multiple commands in succession, some of which may take a long time. If the next step does not depend on the results of the first, we can run them in parallel rather than serially, which is especially fast on today's multicore CPUs. But the Unix terminal and shell ordinarily make you wait for a command to finish executing before accepting your next command. For example, sleep is a Unix program that won't return to your shell until the specified number of seconds has passed:

sleep 7 subsequent commands will not be processed until 7 seconds have elapsed

Waiting for a process to finish is referred to as executing a command in the foreground. The solution is to execute the command in the background using ampersand (&):

sleep 50 & The ampersand tells the shell to run the sleep program in the background, returning you to a fresh Unix prompt where you can enter more commands into the same shell.

[1] 48143 output specifies this is your 1st job with process ID (PID) 48143

You are free to enter other commands while your job runs in the background:

jobs lists your running jobs (job 1)

[1] + Running sleep 50

ps lists PIDs of running processes or shells

27229 ttys002 0:00.08 -csh

43716 ttys003 0:00.09 -csh

48172 ttys003 0:00.00 sleep 50

The shell will return a message when your background job encounters an error or terminates normally. The following line will print to the screen either in real time or the next time you enter a command:

[1] Done sleep 50

(Then hit return once or twice to get a Unix prompt on a new line.)
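In POSIX sh (bash, zsh, dash), the special parameter $! holds the PID of the most recent background job, and the wait builtin pauses until a given job finishes. The minimal sketch below times how quickly the shell becomes available again. The variable names are illustrative:

```sh
#!/bin/sh
start=$(date +%s)
sleep 3 &                 # & runs sleep in the background
pid=$!                    # $! holds the PID of the most recent background job
echo "started sleep 3 as PID $pid"
ps -p "$pid"              # list just that process while it runs
returned=$(( $(date +%s) - start ))
echo "shell was free again after ${returned}s"   # typically 0, not 3

wait "$pid"               # now block until the background job finishes
echo "sleep 3 is done"
```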

sleep 300 If you forget the ampersand...

Control-C (^C) sends the interrupt signal to the foreground process, which normally terminates it.

sleep 3000 & How do we kill a given background process?

[1] 48192

sleep 4000 &

[2] 48193

ps

PID TTY TIME CMD

27228 ttys000 0:00.03 -csh

48192 ttys003 0:00.00 sleep 3000

48193 ttys003 0:00.00 sleep 4000

Note how jobs 1 and 2 are running in parallel. On a single CPU core, the operating system alternates between processes of equal priority, giving each a short slice of processor time in turn.

kill %2 kill sends the terminate signal (SIGTERM), which usually ends the process (here, job 2).

[2] Terminated sleep 4000

kill -9 <PID> the -9 flag sends the KILL signal (SIGKILL), which a process cannot ignore; use it with the process's PID (here 48192) if a plain kill doesn't work.

[1] Killed sleep 3000
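Job numbers such as %2 require an interactive shell with job control. This POSIX sh sketch therefore kills by PID instead, using $! to capture each PID. It also shows that the shell reports a job killed by a signal with an exit status greater than 128. In bash and dash, that status is 128 plus the signal number. The variable names are illustrative:

```sh
#!/bin/sh
sleep 3000 &
pid1=$!
sleep 4000 &
pid2=$!

kill "$pid2"               # default signal is TERM (15): a polite request to exit
wait "$pid2"; status2=$?
kill -9 "$pid1"            # signal 9 is KILL: cannot be caught or ignored
wait "$pid1"; status1=$?

echo "sleep 4000 exit status: $status2"   # typically 143 (128 + 15)
echo "sleep 3000 exit status: $status1"   # typically 137 (128 + 9)
```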

On a Mac, top shows which processes are currently using the most CPU or memory:

top what's at the top of your list?!

Control-C ^c when you're ready to exit to the shell prompt.

The kill command is equivalent to the Force Quit... function on a Mac. To see just how many background processes your OS runs, enter:

ps -ef

It is good practice to fully Shut Down or Restart your computer once a month so that any hanging processes get terminated and your working memory is cleared. This is different from simply closing the lid of your laptop, which just puts the OS to sleep and keeps all your programs open in RAM.

______________________________________________________________________________

Part 4b. File I/O

A serial calculation can be made with a single command line by stringing subsequent commands together using semicolon (;) to perform them one after another:

sleep 2; abc; sleep 5; echo def & (your syntax may require a space after each semicolon)

The abc command does not exist. The fact that it takes about 2 seconds (the length of the first sleep) before the error message is printed confirms that the shell performed the commands serially rather than in parallel.
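You can also measure the difference between serial and parallel execution instead of judging it by eye. The sketch below is in POSIX sh and uses date +%s (seconds since the epoch) as a coarse stopwatch. The two sleep 2 commands stand in for real work, and the variable names are illustrative:

```sh
#!/bin/sh
# Serial: ; waits for each command to finish before starting the next.
start=$(date +%s)
sleep 2; sleep 2
serial=$(( $(date +%s) - start ))

# Parallel: & starts both at once; wait pauses until all background jobs finish.
start=$(date +%s)
sleep 2 & sleep 2 &
wait
parallel=$(( $(date +%s) - start ))

echo "serial: ${serial}s  parallel: ${parallel}s"   # roughly 4s vs 2s
```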

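Unix is popular on multi-user computer systems (which you typically log into as a remote client) because one can run a job in the background, log out of the machine, and the job will still run to completion. In contrast, a process run in the foreground will stop the moment you close your terminal or log out of your account. (Some shells, such as bash, may also stop background jobs when the terminal closes; starting the job with nohup guards against this.)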

Scripting is writing a useful sequence of commands to automate operations on a set of input files. It is standard practice to write output into a file rather than printing it to the terminal, because you don't want random output popping up on the screen while you work on other things. The greater-than symbol (>) redirects standard output (stdout) into a text file:

echo job done > job.out

cat job.out confirms that the output of the echo command was written to a file

If your job encounters any errors, they are sent to standard error (stderr) rather than stdout, so errors normally get printed to the screen rather than into your output file. To redirect both stdout and stderr to a file, the syntax is >&:

abc >& err.out
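POSIX sh spells csh's >& differently. The form > file 2>&1 sends stdout to the file and then points stderr (file descriptor 2) at the same place. The sketch below runs in a hypothetical scratch directory and reuses the nonexistent abc command:

```sh
#!/bin/sh
tmp=$(mktemp -d)                 # hypothetical scratch directory
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

abc > out.only 2>/dev/null   # only stdout goes to the file; stderr is discarded, so the file stays empty
abc > err.out 2>&1           # sh equivalent of csh's: abc >& err.out
cat err.out                  # the "not found" message, worded differently by each shell
```

The order matters. abc 2>&1 > err.out would point stderr at the screen (where stdout pointed at that moment) and only then move stdout to the file.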

What if we need to rerun a job?

echo job redone > job.out

job.out: File exists.

cat job.out

The shell aborts the command and complains because the noclobber variable is set in this shell, so > redirection refuses to overwrite an existing file. There are two ways around this. One is to force the overwrite of the existing file using >!:

echo REDONE >! job.out

cat job.out

The other alternative is to append the output (>>) to whatever lines are already contained in the file:

echo thrice >> job.out

cat job.out
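POSIX sh has the same overwrite protection under different spellings. set -C (noclobber, like csh's set noclobber) makes > refuse to overwrite a file. >| forces the overwrite, like csh's >!. >> appends in both shells. The sketch below runs in a hypothetical scratch directory:

```sh
#!/bin/sh
tmp=$(mktemp -d)                 # hypothetical scratch directory
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

set -C                              # noclobber: > refuses to overwrite existing files
echo job done > job.out
echo job redone > job.out || echo "refused: job.out already exists"
echo REDONE >| job.out              # >| forces the overwrite (csh spells this >!)
echo thrice >> job.out              # >> appends instead of overwriting
cat job.out                         # prints REDONE, then thrice
```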

It is best practice to give each run of a program its own separately named input and output files:

echo run2.inp > run2.out
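Scripting pays off when the same step is repeated over a set of input files. The sketch below, in POSIX sh, gives every run its own output file named after its input. The run1 to run3 files are hypothetical and are created in a scratch directory. cat stands in for whatever program you would actually run:

```sh
#!/bin/sh
tmp=$(mktemp -d)                 # hypothetical scratch directory
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical input files for three runs.
for run in run1 run2 run3; do
    echo "input for $run" > "$run.inp"
done

# One output file per input file, named after it.
for inp in *.inp; do
    out="${inp%.inp}.out"        # strip .inp and add .out: run1.inp becomes run1.out
    cat "$inp" > "$out"          # stand-in for your real program
done
ls                               # run1.inp run1.out run2.inp run2.out ...
```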

QUESTION 6: Of the output from the following command sequence, which lines are output into s.out by the time exactly 14 seconds has elapsed? Explain the meaning of each punctuation mark to the shell interpreter.

(sleep 10; date; sleep 3; date; sleep 3; date) > s.out & <enter>

tail -f s.out enter this immediately after the first command

tail -f is like the cat command but keeps following the file, printing new data as it is appended. Note that the parentheses above were needed to send the output from all the commands into a single file. When you're done viewing the output, hit:

Control-C (^C) sends the interrupt signal, which exits the tail command and returns you to the Unix prompt.
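You can check the effect of the parentheses by counting lines. The sketch below is in POSIX sh, whose ( ) grouping behaves like csh's here: the grouped commands run in a subshell that shares one redirection. It runs inside a hypothetical scratch directory:

```sh
#!/bin/sh
tmp=$(mktemp -d)                 # hypothetical scratch directory
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Without parentheses, > applies only to the last command; the first date goes to the screen.
date; date > ungrouped.out
# With parentheses, the whole group shares a single redirection.
(date; sleep 1; date) > grouped.out

wc -l < ungrouped.out    # 1 line
wc -l < grouped.out      # 2 lines
```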

If the processes are independent, it is easy to write a script that runs them in parallel. The operating system will typically place the first job on one processor (core) of the central processing unit (CPU), the second job on another, and so on. Depending on the architecture, some or all processes will share the same random access memory (RAM) modules needed to complete the calculation. It is very difficult, however, to write a single program that makes efficient use of multiple processors if its data processing streams have to talk to each other. Two examples of such parallel programming are programs for graphics processing units (GPUs) and molecular dynamics simulations, both specialized for very specific computational tasks.

In fact, most applications you run on your computer use only one CPU core at a time: each result is needed before the next part of the calculation can proceed, so additional cores would just be kept waiting. A multicore CPU is still better because separate jobs can run on different cores rather than crowding onto a single one. Multiple jobs can also share the same core, in which case the CPU takes turns, running each job for a short slice of time before switching to the next.
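To see how many jobs your machine can run truly side by side, ask the OS how many processors are online and launch one independent job per processor. The sketch below is in POSIX sh. It assumes getconf supports the _NPROCESSORS_ONLN variable, which it does on Linux with glibc and on macOS. sleep 1 is a placeholder for a real independent calculation. The OS decides which core runs each job:

```sh
#!/bin/sh
# Number of online processors (cores or hardware threads).
ncpu=$(getconf _NPROCESSORS_ONLN)
echo "online processors: $ncpu"

# Start one independent background job per processor, then wait for all of them.
i=1
while [ "$i" -le "$ncpu" ]; do
    sleep 1 &                    # placeholder for an independent calculation
    i=$((i + 1))
done
wait
echo "all $ncpu jobs finished"
```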

Let's try a resource-intensive task whose speed is limited by how fast your entire hard disk can be searched:

find / -name "zzzz" &

find searches every folder under the root folder (/) for a file named zzzz.

enter Hit return to move past any stderr messages printed when find encounters locked folders. The enter key should return you to the Unix prompt each time, because the ampersand told the shell to run find in the background.

enter
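Unlike csh, POSIX sh can redirect stderr on its own with 2>, so find's permission-denied messages can be discarded instead of waiting for you to hit return. Searching all of / takes a while, so this sketch searches a small hypothetical folder tree in a scratch directory. On a real system the equivalent would be find / -name "zzzz" 2>/dev/null &:

```sh
#!/bin/sh
tmp=$(mktemp -d)                 # hypothetical scratch directory
trap 'rm -rf "$tmp"' EXIT

# Hypothetical folder tree containing one file named zzzz.
mkdir -p "$tmp/a/b" "$tmp/c"
touch "$tmp/a/b/zzzz"

# 2>/dev/null discards error messages (stderr); matches (stdout) still reach the screen.
find "$tmp" -name "zzzz" 2>/dev/null
```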

You can monitor the CPU usage of the find process using top, as above, or ps:

ps -v on a Mac, -v adds virtual-memory information (such as RSS and %MEM) to each process's listing

ps u user-oriented format, which includes %CPU and %MEM columns

When find finally finishes you will see the message:

[1] Exit 1 find / -name zzzz

On a Mac, open Applications > Utilities > Activity Monitor.app to show the system resources being used in terms of CPU usage and RAM. You'll see that Firefox is a more resource-intensive web browser than Chrome and will cause your fan to run hot and drain your battery faster (shown to the right is my quad-core/8-thread MacBook Pro). You can opt to permanently show your CPU history in your Dock.

If the find command still hasn't finished, enter:

jobs

kill %1 (If the prompt is not accepting commands, hit ^c to cancel find in the foreground.)

Next, we'll learn how to create and edit text files to finish our introduction to Unix.

Summary of basic Unix commands (see the [Command Reference] page)

man displays help for specified command

cd changes working directory

ls lists files/directories

cp a b copies file a to a new file named b (the filename can also be a relative pathname)

mv a b renames file a to file b (giving a pathname causes the file to be moved to a new location)

rm remove: deletes file

rmdir deletes an empty directory

mkdir creates new directory

cat prints file(s) contents to the screen

echo prints a given character string to the screen

top updates you with CPU/memory usage info for active processes

kill terminates a process using its PID (or %job number)
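The file commands in the list above can be exercised together in a few lines. The sketch below is in POSIX sh. It runs in a hypothetical scratch directory, and the file names a, b, and c are illustrative:

```sh
#!/bin/sh
tmp=$(mktemp -d)                 # hypothetical scratch directory
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

mkdir day1                    # new directory
echo hello > a                # new file a
cp a b                        # copy a to a new file b
mv b day1/c                   # move b into day1 and rename it c
cat day1/c                    # prints: hello
rm day1/c                     # delete the file
rmdir day1                    # succeeds only because day1 is now empty
ls                            # prints: a
```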


Proceed to: Part 5