cut, paste and translate

Cut

The "cut" command, as its name implies, extracts parts of each input line. Its options let us select, for example, specific characters or specific fields.

$ echo "Testing" | cut -c 1-3

Tes

$ echo "Testing" | cut -c 2

e

The "-c" option selects the characters at the positions given in the next argument. The "1-3" means the characters at positions 1 through 3 of the input line, with positions counted from 1. In the second example we have the command:

echo "Testing" | cut -c 2

Here we ask "cut" for the single character at position 2, which is "e".
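The position list accepts more than a closed range. The sketch below shows an open-ended range, a range from the start of the line, and a comma-separated list. The variable names are only for illustration.

```shell
# Character positions are 1-based; ranges may be open on either side.
word="Testing"

first_three=$(echo "$word" | cut -c 1-3)   # positions 1 through 3
from_fifth=$(echo "$word" | cut -c 5-)     # position 5 to the end of the line
up_to_two=$(echo "$word" | cut -c -2)      # start of the line through position 2
picked=$(echo "$word" | cut -c 1,4,7)      # a list of individual positions

echo "$first_three $from_fifth $up_to_two $picked"
```

Selected characters always come out in input order, no matter what order the positions are listed in.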

printf "field1\tfield2\tfield3\n" | cut -f1

Here the input is a line of words separated by tabs (written as "\t" in the "printf" format string). With the "-f" option, "cut" extracts fields, and the default field separator is a tab. The command above produces the output:

$ ./f1.sh

field1

The first field is printed out.

printf "field1\tfield2\tfield3\n" | cut -f1-2

Output:

$ ./f2.sh

field1 field2

Fields 1 and 2 are printed, as requested by the "-f1-2" option. In the output they are still separated by a tab.

echo "field1:field2:field3" | cut -f1-2

Output:

field1:field2:field3

In the above we have 3 fields that are separated by a ":" instead of a tab. The plain "cut" command does not split the line, because it looks for tab characters and finds none. A line that contains no delimiter is printed unchanged, which is why the whole string "field1:field2:field3" comes back. We have to tell "cut", with the "-d" option, that the delimiter character is ":" instead of the default tab.

echo "field1:field2:field3" | cut -f1-2 -d:

Output:

$ ./f3.sh

field1:field2
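Lines with no delimiter are passed through unchanged, which can be surprising when the input mixes both kinds of lines. The "-s" option tells "cut" to drop such lines instead. A small sketch, where the helper function "sample" and its two input lines are made up for illustration:

```shell
# Hypothetical two-line input: only the first line contains ":".
sample() {
    printf 'field1:field2:field3\nno delimiter here\n'
}

# Without -s, the line lacking ":" is printed as it is.
without_s=$(sample | cut -d: -f1)

# With -s, lines that contain no delimiter are suppressed.
with_s=$(sample | cut -s -d: -f1)

echo "$without_s"
echo "---"
echo "$with_s"
```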

In the example below we print only the first and the third fields and skip the second.

echo "field1:field2:field3" | cut -f1,3 -d:

We can also place quotes around the ":".

echo "field1:field2:field3" | cut -f1,3 -d":"

Output:

field1:field3
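The same options apply to any colon-separated record. As a rough illustration, the sketch below pulls fields out of a line laid out like an entry in a password file. The record itself is made up for the example and is not read from a real system.

```shell
# A made-up record in the style of a password file entry (7 fields).
record="alice:x:1001:1001:Alice Example:/home/alice:/bin/bash"

# Fields 1 and 7: the login name and the shell.
name_and_shell=$(echo "$record" | cut -d: -f1,7)

# Field 6 alone: the home directory.
home_dir=$(echo "$record" | cut -d: -f6)

echo "$name_and_shell"
echo "$home_dir"
```

Note that when more than one field is selected, "cut" joins them in the output with the same delimiter it split on.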

Exercise

echo "field1@field2@field3@field4"

The string above uses "@" as the delimiter. Print the 1st, 3rd and 4th fields, skipping the 2nd.

Solution

echo "field1@field2@field3@field4" | cut -f1,3-4 -d"@"
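If we do not know in advance how many fields a line has, an open-ended range such as "3-" selects everything from field 3 to the end. The following is one possible variation on the solution; the second input string is invented for illustration.

```shell
# "3-" means field 3 through the last field, however many there are.
four=$(echo "field1@field2@field3@field4" | cut -d"@" -f1,3-)

# The same option list works unchanged on a longer, made-up line.
five=$(echo "a@b@c@d@e" | cut -d"@" -f1,3-)

echo "$four"
echo "$five"
```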

Paste

The Unix utility "paste" combines 2 text files line by line: it takes one line from each file, joins them into a single line, and writes that line out, so the output holds the information from both files.

File: "names.txt"

Mark Smith

Bobby Brown

Sue Miller

Jenny Igotit

File: "numbers.txt"

555-1234

555-9876

555-6743

867-5309

$ paste names.txt numbers.txt

Mark Smith 555-1234

Bobby Brown 555-9876

Sue Miller 555-6743

Jenny Igotit 867-5309

The first line was taken from the file "names.txt" and then the first line was taken from the file "numbers.txt". The 2 lines were combined to create the line:

Mark Smith 555-1234

The text of the lines is separated by a tab. We can change the separator by using the "-d" option.

paste -d " " names.txt numbers.txt

Mark Smith 555-1234

Bobby Brown 555-9876

Sue Miller 555-6743

Jenny Igotit 867-5309

In the above we state that the delimiter should be a single space instead of the default separator (tab).
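Any single character can serve as the delimiter. As a sketch, the lines below join the two files with a comma, giving simple comma-separated output. The temporary directory created with "mktemp -d" and the shortened file contents are assumptions made so the example can run on its own.

```shell
# Build two small sample files in a scratch directory (assumes mktemp -d is available).
workdir=$(mktemp -d)
printf 'Mark Smith\nBobby Brown\n' > "$workdir/names.txt"
printf '555-1234\n555-9876\n' > "$workdir/numbers.txt"

# Join corresponding lines with a comma instead of a tab.
csv=$(paste -d, "$workdir/names.txt" "$workdir/numbers.txt")
echo "$csv"

# Clean up the scratch directory.
rm -r "$workdir"
```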

$ paste -s names.txt numbers.txt

Mark Smith Bobby Brown Sue Miller Jenny Igotit

555-1234 555-9876 555-6743 867-5309

In the above we have used the "-s" (serial) option. Instead of taking one line from each file in turn, "paste" writes all the lines of the first file on one line, and then all the lines of the second file on the next line.
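The "-s" and "-d" options can be combined, which is a handy way to turn a column of values into a single delimited line. In this sketch the input comes from standard input, which "paste" reads when the file name is given as "-". The three names are sample data.

```shell
# Join all lines of the input into one line, separated by commas.
# The lone "-" tells paste to read standard input instead of a file.
joined=$(printf 'Mark Smith\nBobby Brown\nSue Miller\n' | paste -s -d, -)

echo "$joined"
```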

What happens when the two files have different numbers of lines? The lines are paired up by line number for as long as both files have lines. After the shorter file runs out, "paste" treats its remaining lines as empty, so the extra lines of the longer file are written out by themselves.

File: "names1.txt"

Mark Smith

Bobby Brown

Sue Miller

File: "numbers.txt"

555-1234

555-9876

555-6743

867-5309

The file "names1.txt" has only 3 lines, while "numbers.txt" has 4 lines.

$ paste names1.txt numbers.txt

Mark Smith 555-1234

Bobby Brown 555-9876

Sue Miller 555-6743

867-5309

The last line has nothing from "names1.txt", only the number from the file "numbers.txt".
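Although the name is missing on that last line, the tab delimiter is still written, so the number stays in the second column. The sketch below checks this with "cut". The scratch directory from "mktemp -d" and the shortened files are assumptions made for the example.

```shell
# Sample files of unequal length in a scratch directory (assumes mktemp -d is available).
workdir=$(mktemp -d)
printf 'Mark Smith\n' > "$workdir/names1.txt"
printf '555-1234\n867-5309\n' > "$workdir/numbers.txt"

# The last output line has an empty first field followed by a tab.
last_line=$(paste "$workdir/names1.txt" "$workdir/numbers.txt" | tail -n 1)

first_field=$(printf '%s\n' "$last_line" | cut -f1)    # empty
second_field=$(printf '%s\n' "$last_line" | cut -f2)   # the phone number

echo "first=[$first_field] second=[$second_field]"

rm -r "$workdir"
```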

Translate

The "tr" (short for translate) command replaces and removes characters in its input.

echo "cat" | tr "abcd" "mnop"

Output:

omt

We have 2 sets of characters of equal length, "abcd" and "mnop". The "tr" command replaces each input character found in the first set with the character at the same position in the second set: "a" becomes "m", "b" becomes "n", and so on.

In "cat", the "c" is replaced by "o" and the "a" is replaced by "m". The "t" is not in the first set, so it is left unchanged.

We can also use ranges instead of typing all the characters in the string.

echo "cat" | tr "a-d" "m-p"

Output:

omt

Ex:

echo "cat" | tr "a-z" "A-Z"

Output:

CAT

In the above the lower case letters are replaced by upper case letters, so the string "cat" is converted to "CAT".
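Besides ranges, "tr" also understands named character classes such as "[:lower:]" and "[:upper:]". As a sketch, the case conversion above could also be written this way:

```shell
# Convert case with character classes instead of explicit ranges.
upper=$(echo "cat" | tr '[:lower:]' '[:upper:]')
lower=$(echo "CAT" | tr '[:upper:]' '[:lower:]')

echo "$upper $lower"
```

The single quotes keep the shell from treating the square brackets as a file name pattern.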

The "-s" option (squeeze) can be used to collapse a run of a repeated character into a single occurrence.

echo "caaaaaaaaaaaat" | tr -s "a"

Output:

cat

In the above we have specified that "tr" should squeeze each run of consecutive "a" characters down to a single "a".

echo "caaaaaaaaaaaata" | tr -s "a"

Output:

cata

Notice that the "a" at the end still gets printed. The "-s" only applies to characters that are right next to each other.
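Squeezing is useful in front of "cut" when columns are padded with a varying number of spaces, because "cut" treats every single space as a separate delimiter. A small sketch, using a made-up input line:

```shell
# A made-up line whose words are separated by runs of spaces.
line="alpha    beta  gamma"

# Without squeezing, field 2 is the empty string between the first two spaces.
unsqueezed=$(echo "$line" | cut -d' ' -f2)

# After squeezing the spaces, field 2 is the second word.
squeezed=$(echo "$line" | tr -s ' ' | cut -d' ' -f2)

echo "unsqueezed=[$unsqueezed] squeezed=[$squeezed]"
```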

echo "caaaaaaaaaaaata" | tr -d "a"

Output:

ct

The "-d" option (delete) removes all occurrences of the characters in the set that follows it.

$ echo "caaaaaaaaaaaata" | tr -d "ca"

t

We can specify more than one character in the set of characters to be deleted.

The "-c" option takes the complement of the set, so the "tr" command works on the characters that are not in the set.

$ echo "c?@#ta" | tr -cd "[:alnum:]"

cta

The character class "[:alnum:]" stands for all alphanumeric characters. The "-c" means complement, so together with "-d" we are asking "tr" to remove every character from the input that is not alphanumeric. This includes the newline at the end of the line.
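Because the newline is not alphanumeric, "-cd" with "[:alnum:]" alone also joins separate input lines together. Adding "\n" to the set keeps the line breaks. A sketch with a made-up two-line input:

```shell
# Newlines are deleted too, so the two lines run together.
joined=$(printf 'c?@#ta\nd!og\n' | tr -cd '[:alnum:]')

# Adding \n to the set protects the line breaks.
kept=$(printf 'c?@#ta\nd!og\n' | tr -cd '[:alnum:]\n')

echo "$joined"
echo "$kept"
```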

The "tr" command reads only from standard input; it does not accept file names as arguments.

File: "data.txt"

caaaaaaaaaaaata

We cannot simply do

tr -d "ca" data.txt

One way is to "cat" the file and pipe it into the "tr" command.

$ cat data.txt | tr -d "ca"
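Since "tr" only needs its standard input connected to the file, shell input redirection with "<" is an alternative that avoids the extra "cat" process. In this sketch the scratch directory from "mktemp -d" is an assumption made so the example can run on its own.

```shell
# Recreate data.txt in a scratch directory (assumes mktemp -d is available).
workdir=$(mktemp -d)
printf 'caaaaaaaaaaaata\n' > "$workdir/data.txt"

# Redirect the file into tr's standard input.
result=$(tr -d "ca" < "$workdir/data.txt")
echo "$result"

rm -r "$workdir"
```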

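We can use "tr" to implement the Caesar cipher, which replaces each letter with the letter a fixed number of positions away in the alphabet. With a shift of 1 (here each letter is replaced by the one before it, and "a" wraps around to "z"):

Plain Text: a b c ..... z

Converted: z a b ..... y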

$ echo "can do" | tr "a-z" "za-y"

bzm cn

To get back the original plain text, swap the two sets:

$ echo "bzm cn" | tr "za-y" "a-z"

can do

Similarly, if we want to do a shift of 3:

$ echo "can dz" | tr "a-z" "x-za-w"

zxk aw

$ echo "zxk aw" | tr "x-za-w" "a-z"

can dz
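A well-known special case, not covered above, is a shift of 13 (often called ROT13). Since the alphabet has 26 letters, applying the same translation twice returns the original text, so one command both encodes and decodes. The function name "rot13" below is our own choice for this sketch.

```shell
# Shift every letter by 13 places, handling lower and upper case separately.
rot13() {
    tr 'a-zA-Z' 'n-za-mN-ZA-M'
}

encoded=$(echo "can do" | rot13)
decoded=$(echo "$encoded" | rot13)   # the same function undoes the shift

echo "$encoded"
echo "$decoded"
```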

Exercise

We have a number. Write a conversion that increments each digit: "0" becomes "1", "1" becomes "2", and so on, with "9" wrapping around to "0". So "134" will become "245" and "189" will become "290".

Solutions

$ echo "134" | tr "0123456789" "1234567890"

245

$ echo "189" | tr "0123456789" "1234567890"

290
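The same conversion can be written more compactly with ranges, in the style of the Caesar cipher examples. The sketch below is one possible alternative, and swapping the two sets gives the reverse conversion, which decrements each digit.

```shell
# "1-90" expands to 1234567890, so each digit maps to the next one and 9 wraps to 0.
incremented=$(echo "189" | tr '0-9' '1-90')

# Swapping the sets reverses the mapping.
decremented=$(echo "$incremented" | tr '1-90' '0-9')

echo "$incremented $decremented"
```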