Sed and Awk

Review

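whoami | sed -r 's/[.]*/Thanks for spending some time with me today: /'

prints

Thanks for spending some time with me today: amittal

Inside a bracket expression "." is a literal period, so "[.]*" means "zero or more periods". The user name contains no periods, so the pattern matches the empty string at the start of the line and the text is inserted in front of the name.

whereas

whoami | sed -r 's/.*/Thanks for spending some time with me today: /'

prints

Thanks for spending some time with me today:

Here "." matches any character, so ".*" matches the whole user name and the replacement overwrites it.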
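To try both commands without depending on who is logged in, the sketch below feeds a fixed sample name ("amittal", from the output above) through hypothetical shell functions, and uses the shorter replacement text "Hi: " as an assumption. The third function adds "&", which sed replaces with the matched text, so the name is kept:

```shell
#!/bin/sh
# Hypothetical helpers; each reads one line on standard input.

# "[.]*" matches zero or more literal periods, i.e. the empty string
# at the start of the line, so the text is inserted before the name.
insert_before() { sed -r 's/[.]*/Hi: /'; }

# ".*" matches the whole line, so the name is replaced.
replace_all() { sed -r 's/.*/Hi: /'; }

# "&" in the replacement stands for the matched text (the whole name).
keep_match() { sed -r 's/.*/Hi: &/'; }

echo "amittal" | insert_before   # Hi: amittal
echo "amittal" | replace_all     # Hi:  (the name is gone)
echo "amittal" | keep_match      # Hi: amittal
```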

Exercises

Ex 1

Write a sed expression to output the first word in a sentence. You can assume there are no spaces at the beginning of the sentence.

A whitespace character, such as a space or a tab, can be represented by the character class "[[:space:]]".

For example, the output of echo "This is a test for the sed utility." should be "This".

Now assume that there are spaces before the first word, as in:

echo " This is a test for the sed utility."

Ex 2

We have a sentence:

This is a test.

Write a sed expression that exchanges the first and last words, so that the output is:

test. is a This

You can assume that the sentence ends with a period.

Ex 3

Suppose we have a license plate and a requirement that every digit in it be incremented. Assume the plate contains only the digits 0, 1, 2 and 3, and that incrementing wraps around: a "1" changes to "2" and a "3" changes to "0". How can we write a script for that?

The plate "3ELB021" changes to "0ELB132".

Ex 4

In the first homework we used "hostname -i" to obtain the IP address. The IP address is also printed by the command "ifconfig":

[amittal@hills awk]$ ifconfig

ens192: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500

inet 147.144.1.2 netmask 255.255.255.0 broadcast 147.144.1.255

inet6 fe80::250:56ff:fe89:a42f prefixlen 64 scopeid 0x20<link>

ether 00:50:56:89:a4:2f txqueuelen 1000 (Ethernet)

RX packets 62039152 bytes 28099069870 (26.1 GiB)

RX errors 0 dropped 177685 overruns 0 frame 0

TX packets 39742749 bytes 35885006370 (33.4 GiB)

TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

The IP address is on the second line of the output. Use the "sed" command to extract the second line and "awk" to print the IP address.

Ex 5

The following command:

echo "123 654 Some"

prints the string to the console. Pipe its output to awk so that the first two numbers and their sum are printed.

Ex 6

What does the following print?

echo "1234" | awk 'BEGIN { print "Testing for division by 2." }
{
    if ( $1 % 2 == 0 )
    {
        print "Number is even."
    }
    else
    {
        print "Number is odd."
    }
}
END { print "End of Test" }'

Ex 7

[11/20/18 06:10:10.722 AM] [SEVERE] [com.on24.appgen.external_stream.stream_relay2] 11259039,17013,999,frame=29665 fps= 30 q=-1.0 size= 107578kB time=00:16:24.98 bitrate= 894.7kbits/s speed= 1x frame=29683 fps= 30 q=-1.0 size= 107638kB time=00:16:25.54 bitrate= 894.7kbits/s speed= 1x frame=29697 fps= 30 q=-1.0 size= 107694kB time=00:16:26.05 bitrate= 894.7kbits/s speed= 1x [NULL @ 0x3d64780] Current profile doesn't provide more RBSP data in PPS, skippingframe=29711 fps= 30 q=-1.0 size= 107756kB time=00:16:26.52 bitrate= 894.8kbits/s speed= 1x

[11/20/18 06:10:10.815 AM] [SEVERE] [com.on24.appgen.external_stream.stream_relay2] 11259009,8893,999,frame=61108 fps= 30 q=-1.0 size= 221888kB time=00:33:53.48 bitrate= 893.9kbits/s speed= 1x frame=61124 fps= 30 q=-1.0 size= 221966kB time=00:33:54.07 bitrate= 893.9kbits/s speed= 1x frame=61140 fps= 30 q=-1.0 size= 221984kB time=00:33:54.58 bitrate= 893.8kbits/s speed= 1x [NULL @ 0x26fe780] Current profile doesn't provide more RBSP data in PPS, skippingframe=61150 fps= 30 q=-1.0 size= 222011kB time=00:33:54.90 bitrate= 893.8kbits/s speed= 1x

The problem is to remove the bracketed class name "[com.on24.appgen.external_stream.stream_relay2]" from each line of the log. The class name could be something else, but it always starts with the package name "com".

Ex 8

We have a "data.txt" file containing the following:

The cat cat eats her food.

The dog chases the the cat.

Milli Vanilli can lip lip sync.

Girl you know it's true true .

Remove the adjacent duplicate words, first with a sed script and then with an awk script. For the awk script, use a loop that goes through each word in a line, compares it with the next word, and skips the next word if it is a duplicate.

Next, write an awk script using an associative array that removes duplicate words even when they are not next to each other and when a word occurs more than twice in the same line. Example "data1.txt":

The cat cat eats her cat food.

The dog chases the the the cat.

Milli Vanilli can lip lip sync lip lip.

Girl you know it's true true .

Ex 9

Given a text file, write an awk script that prints each word and how many times it occurs.

Solutions

Soln 1

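echo "This is a test for the sed utility." | sed -r 's/[[:space:]]+.*//'

The pattern deletes everything from the first run of whitespace to the end of the line, which leaves only the first word.

When there can be spaces before the first word, strip them first with "^[[:space:]]+" (the "+" removes all of them, not just one):

echo " This is a test for the sed utility." | sed -r 's/^[[:space:]]+//' | sed -r 's/[[:space:]]+.*//'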
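The two sed commands can also be run by a single sed process by giving each expression its own "-e" option; the expressions are applied in order to every line. A minimal sketch, wrapped in a hypothetical shell function "first_word":

```shell
#!/bin/sh
# Hypothetical helper: print the first word of each input line.
first_word() {
  # 1st expression: delete all leading whitespace (spaces or tabs).
  # 2nd expression: delete from the first run of whitespace to the end.
  sed -r -e 's/^[[:space:]]+//' -e 's/[[:space:]]+.*//'
}

echo "This is a test for the sed utility." | first_word      # This
echo "   This is a test for the sed utility." | first_word   # This
```

For comparison, awk '{ print $1 }' gives the same result, since awk's default field splitting ignores leading blanks.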

Soln 2

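echo "This is a test." | sed -r 's/([A-Za-z]+)(.*)([[:space:]][a-zA-Z]+\.$)/\3\2 \1/' | sed -r 's/^ //'

Group 1 captures the first word ("This"), group 2 the middle (" is a") and group 3 the last word together with the space in front of it and the final period (" test."). The period is escaped as "\." so that it matches only a literal period. Writing the groups back as "\3\2 \1" gives " test. is a This", and the second sed removes the leading space.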
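The leading-space cleanup done by the second sed can be folded into the pattern itself by matching the space before the last word outside the capture groups. A sketch under the same assumptions (words made only of letters, sentence ending in a period), wrapped in a hypothetical function:

```shell
#!/bin/sh
# Hypothetical helper: swap the first and last word of a sentence that
# ends with a period. Assumes words are made of letters only.
swap_first_last() {
  # \1 = first word, \2 = the middle (with its spaces),
  # the [[:space:]] before the last word is matched but not captured,
  # \3 = last word plus the literal period "\.".
  sed -r 's/^([A-Za-z]+)(.*)[[:space:]]([A-Za-z]+\.)$/\3\2 \1/'
}

echo "This is a test." | swap_first_last   # test. is a This
```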

Soln 3

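Create a file, say "myscript3":

# sed comment - Increment a digit
# Assume the original string does not contain the @ character
s/3/@/g
s/2/3/g
s/1/2/g
s/0/1/g
s/@/0/g

The order of the commands matters. Each "s" command works on the output of the previous one, so if "3" were changed to "0" first, the later "s/0/1/g" would turn it into "1". Changing "3" to a temporary "@" first and turning the "@" into "0" at the end avoids this. For the same reason "2" is handled before "1", and "1" before "0".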

[amittal@hills FileBased]$ echo "3ELB021" | sed -f myscript3

0ELB132
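sed also has a "y" command that transliterates characters: every character in the first list is replaced by the character at the same position in the second list, all in a single pass. Because each character is translated only once, no temporary "@" is needed. A sketch that puts the original five commands (inlined through "-e" options) next to the "y" version; both function names are hypothetical:

```shell
#!/bin/sh
# The five substitutions from "myscript3", given as -e expressions.
increment_with_temp() {
  sed -e 's/3/@/g' -e 's/2/3/g' -e 's/1/2/g' -e 's/0/1/g' -e 's/@/0/g'
}

# y/0123/1230/ maps 0->1, 1->2, 2->3 and 3->0 in one pass.
increment_with_y() {
  sed 'y/0123/1230/'
}

echo "3ELB021" | increment_with_temp   # 0ELB132
echo "3ELB021" | increment_with_y      # 0ELB132
```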

Soln 4

[amittal@hills awk]$ ifconfig | sed -n 2p

inet 147.144.1.2 netmask 255.255.255.0 broadcast 147.144.1.255

[amittal@hills awk]$ ifconfig | sed -n 2p | awk '{print $2}'

147.144.1.2
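awk can also select the line by itself: the built-in variable "NR" holds the current record (line) number. The sketch below runs on a two-line sample copied from the ifconfig output in the exercise (a hypothetical stand-in, so the result does not depend on the machine). The second variant prints from the first line whose first field is "inet", so it does not depend on the line number, although it picks whichever interface with an IPv4 address is listed first.

```shell
#!/bin/sh
# Hypothetical sample: the first two lines of the ifconfig output above.
sample_ifconfig() {
  printf '%s\n' \
    'ens192: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500' \
    '        inet 147.144.1.2  netmask 255.255.255.0  broadcast 147.144.1.255'
}

# NR == 2 selects the second line; $2 is its second field.
ip_from_line2() { awk 'NR == 2 { print $2 }'; }

# Print field 2 of the first line whose first field is "inet", then stop.
ip_from_inet() { awk '$1 == "inet" { print $2; exit }'; }

sample_ifconfig | ip_from_line2   # 147.144.1.2
sample_ifconfig | ip_from_inet    # 147.144.1.2
```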

Soln 5

[amittal@hills awk]$ echo "123 654 Some" | awk '{ print $1, $2 , $1 + $2}'

123 654 777
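awk converts a field to a number when the field is used in arithmetic; a field that does not begin with a number, such as "Some", becomes 0. A small sketch with hypothetical function names, where the second function adds up every field on the line:

```shell
#!/bin/sh
# Print the first two fields and their sum.
sum_first_two() { awk '{ print $1, $2, $1 + $2 }'; }

# Add up every field; a non-numeric field such as "Some" counts as 0.
sum_all_fields() {
  awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s }'
}

echo "123 654 Some" | sum_first_two    # 123 654 777
echo "123 654 Some" | sum_all_fields   # 777
```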


Soln 6

[amittal@hills Review]$ ./awk1.sh

Testing for division by 2.

Number is even.

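End of Test

The BEGIN block runs once before any input is read, the main block runs once for each input line (here the single line "1234", which is divisible by 2), and the END block runs once after all input has been read.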
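To see that the main block runs once per input line while BEGIN and END run only once, the same kind of program can be fed several numbers. A sketch (the function name and the second input value are illustrative, and the messages include the number):

```shell
#!/bin/sh
parity() {
  awk 'BEGIN { print "Testing for division by 2." }
       {
         if ($1 % 2 == 0)
           print $1 " is even."
         else
           print $1 " is odd."
       }
       END { print "End of Test" }'
}

printf '1234\n7\n' | parity
# Testing for division by 2.
# 1234 is even.
# 7 is odd.
# End of Test
```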

Soln 7

Our first attempt is:

sed -r 's/\[com.*\]//'

However, ".*" is greedy: it matches as much as possible, so the match runs all the way to the last "]" on the line, and we end up with:

[11/20/18 06:10:10.722 AM] [SEVERE] Current profile doesn't provide more RBSP data in PPS, skippingframe=29711 fps= 30 q=-1.0 size= 107756kB time=00:16:26.52 bitrate= 894.8kbits/s speed= 1x

[11/20/18 06:10:10.815 AM] [SEVERE] Current profile doesn't provide more RBSP data in PPS, skippingframe=61150 fps= 30 q=-1.0 size= 222011kB time=00:33:54.90 bitrate= 893.8kbits/s speed= 1x

Everything between "[com" and the last "]" (the one that closes "[NULL @ ...]") has been removed. What we need is to remove only the first bracketed string, so instead of ".*" we allow only the characters that can appear in a class name, none of which is "]":

sed -r 's/\[com[a-zA-Z0-9._]+\]//'
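The difference is easy to see on a shortened, made-up line that, like the real log, has another bracketed string after the class name. The sketch (function names are hypothetical) also shows a third common fix, "[^]]*", which means "any run of characters other than ]" and so stops at the first closing bracket whatever characters the class name contains:

```shell
#!/bin/sh
# Hypothetical, shortened log line (not from the real log).
line='[SEVERE] [com.example.Relay] frame=1 [NULL @ 0x1] done'

# Greedy: ".*" extends to the LAST "]" on the line.
strip_greedy() { sed -r 's/\[com.*\]//'; }
# Only letters, digits, "." and "_" allowed, so it stops at the first "]".
strip_class() { sed -r 's/\[com[a-zA-Z0-9._]+\]//'; }
# "[^]]*": any characters except "]"; also stops at the first "]".
strip_not_bracket() { sed -r 's/\[com[^]]*\]//'; }

echo "$line" | strip_greedy        # [SEVERE]  done
echo "$line" | strip_class         # [SEVERE]  frame=1 [NULL @ 0x1] done
echo "$line" | strip_not_bracket   # [SEVERE]  frame=1 [NULL @ 0x1] done
```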

Soln 8

A first attempt with sed gives a surprising result:

echo "The cat cat eats the food" | sed -r 's/(.*) \1/\1/g'

Note the space between ")" and "\1" in the pattern.

This yields:

[amittal@hills Exercises]$ echo "The cat cat eats the food" | sed -r 's/(.*) \1/\1/g'

Thecateatsthefood

The duplicate word was removed, but so were all the spaces. Remember that ".*" matches anything, including the empty string. Wherever sed finds a single space, the pattern matches with "(.*)" capturing nothing, so that space is replaced by nothing:

(empty) space (empty) -> replaced with (empty)

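What we need is a way to match a word rather than ".*". A word is one or more characters that are not whitespace, "[^[:space:]]+". Put the following line in the file "sed1":

s/([^[:space:]]+) \1/\1/g

[amittal@hills Exercises]$ sed -r -f sed1 data.txt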

The cat eats her food.

The dog chases the cat.

Milli Vanilli can lip sync.
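Both patterns can be compared side by side in a sketch with hypothetical function names. The last example shows a limitation of the word pattern: it does not check where a word ends, so "cat cats" is treated as "cat" followed by a repeated "cat":

```shell
#!/bin/sh
# First attempt: "(.*)" can capture the empty string, so every single
# space also matches "(empty) space (empty)" and is deleted.
dedup_dot_star() { sed -r 's/(.*) \1/\1/g'; }

# Fix: a word is one or more non-whitespace characters.
dedup_words() { sed -r 's/([^[:space:]]+) \1/\1/g'; }

echo "The cat cat eats the food" | dedup_dot_star   # Thecateatsthefood
echo "The dog chases the the cat." | dedup_words     # The dog chases the cat.
echo "the cat cats" | dedup_words                    # the cats (not a real duplicate)
```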

The same thing can be done with awk. Save the following program in the file "awk1":

{
    str1 = $1
    for ( i1 = 1 ; i1 < NF ; i1++ )
    {
        if ( $i1 == $(i1+1) )
            continue ;    # skip the duplicate
        else
        {
            str1 = str1 " " $(i1+1)
        }
    } # for
    print str1
}

For each line we start with the first field and then loop through the rest. The built-in variable "NF" holds the number of fields in the current line. If the current field equals the next one, the next one is a duplicate and we skip it; otherwise we append the next field to the line we are building. After the loop we print the line.

[amittal@hills Exercises]$ awk -f awk1 data.txt

The cat eats her food.

The dog chases the cat.

Milli Vanilli can lip sync.

Girl you know it's true .
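The comparison "$i1 == $(i1+1)" in "awk1" is case-sensitive, so "The the" is not treated as a duplicate. A sketch of the same loop with awk's built-in "tolower" applied to both words (the function name is hypothetical; the first spelling of a repeated word is the one kept):

```shell
#!/bin/sh
# Same loop as "awk1", but words are compared in lower case.
dedup_adjacent_ci() {
  awk '{
    str1 = $1
    for (i1 = 1; i1 < NF; i1++)
      if (tolower($i1) != tolower($(i1 + 1)))   # keep only if different
        str1 = str1 " " $(i1 + 1)
    print str1
  }'
}

echo "The the dog chases the the cat." | dedup_adjacent_ci
# The dog chases the cat.
```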

Removing words that occur more than once anywhere in a line can be done with an awk associative array:

{
    # Empty the array so that each line starts fresh
    for ( i1 in myArray )
        delete myArray[i1]
    str1 = ""
    for ( i1 = 1 ; i1 <= NF ; i1++ )
    {
        if ( myArray[$i1] == 1 )
            continue ;    # already seen: skip
        else
        {
            myArray[$i1] = 1
            if ( i1 == 1 )
                str1 = $i1
            else
                str1 = str1 " " $i1
        }
    } # for
    print str1
}

The "for ( i1 in myArray )" loop visits every key in the array and "delete" removes it, so the array is empty at the start of each line; "str1" is also reset so that an empty line does not print the previous line again. For each word, if it is not yet in the array we record it and append it to "str1"; if we have already seen it, we skip it. Finally we print the line.
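A more compact version of the same idea uses a common awk idiom: "!seen[$i]++" is true only the first time a word appears, because the post-increment returns the old count (0, i.e. false, for a new word). "split("", seen)" empties the array at the start of each line. A sketch with hypothetical names:

```shell
#!/bin/sh
# Remove every repeated word in a line, keeping the first occurrence.
dedup_all() {
  awk '{
    split("", seen)            # empty the array for each new line
    out = ""
    for (i = 1; i <= NF; i++)
      if (!seen[$i]++)         # true only the first time $i appears
        out = (out == "" ? $i : out " " $i)
    print out
  }'
}

printf '%s\n' "The cat cat eats her cat food." \
  "Milli Vanilli can lip lip sync lip lip." | dedup_all
# The cat eats her food.
# Milli Vanilli can lip sync lip.
```

Note that "lip." (with the period) counts as a different word from "lip", so it is kept.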

Soln 9

To count the frequency of the words we can use the same approach with the associative array.

# This program prints how many times each word occurs in the input file
BEGIN {
    # A double-quote character, used to put quotes around each word
    DOUBLE_QUOTE = "\""
}
{
    for ( i1 = 1 ; i1 <= NF ; i1++ )
    {
        myArray[$i1] += 1
    } # for
}
END {
    for ( i1 in myArray )
        print "Word " DOUBLE_QUOTE i1 DOUBLE_QUOTE " occurs: " myArray[i1] " times"
}

Example input, "data1.txt":

The cat cat eats her cat food.

The dog chases the the the cat.

Milli Vanilli can lip lip sync lip lip.

Girl you know it's true true .

With the program saved in the file "freq1":

[amittal@hills Exercises]$ awk -f freq1 data1.txt

Word "lip." occurs: 1 times

Word "can" occurs: 1 times

Word "you" occurs: 1 times

Word "Vanilli" occurs: 1 times

Word "The" occurs: 2 times

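...

The order in which a "for ( i1 in myArray )" loop visits the keys is not specified, so the words can be printed in any order.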
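Because awk does not guarantee the order of a "for (key in array)" loop, piping the output through sort makes it easier to read. A sketch with a hypothetical function name that prints the count first and sorts by count (highest first), then alphabetically; "LC_ALL=C" is set as an assumption so the alphabetical order does not depend on the locale:

```shell
#!/bin/sh
# Count words in the given files (or standard input), most frequent first.
word_freq() {
  awk '{ for (i1 = 1; i1 <= NF; i1++) count[$i1]++ }
       END { for (w in count) print count[w], w }' "$@" |
    LC_ALL=C sort -k1,1nr -k2
}

printf '%s\n' "The cat cat eats her cat food." \
  "The dog chases the the the cat." | word_freq
# 3 cat
# 3 the
# 2 The
# ...
```

Run as "word_freq data1.txt", it reads the file instead of standard input.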
