Introduction to AWK

Awk, named after its developers Aho, Weinberger, and Kernighan, is a column-oriented programming language that permits easy manipulation of structured data and the generation of formatted reports. The awk utility is a pattern scanning and processing language: it searches one or more files for lines that match specified patterns and then performs the associated actions, such as writing the line to standard output or incrementing a counter each time it finds a match.
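
The pattern-action model described above can be sketched with a short one-liner. This is only an illustration: the three sample lines and the variable name count are made up for the example. The pattern /error/ selects lines, the action prints each matching line and increments a counter, and an END block reports the count once all input has been read.

```shell
# Hypothetical sample input: three lines, two of which contain "error".
output=$(printf 'error: disk full\nall good\nerror: timeout\n' |
    awk '
        /error/ { print; count++ }          # action runs only on matching lines
        END     { print "matches:", count } # runs once, after all input
    ')
echo "$output"
```

Lines that do not match the pattern are simply skipped, so "all good" never reaches the action.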

Awk breaks each line of input passed to it into fields. By default, a field is a string of consecutive characters separated by whitespace, though the delimiter can be changed with the -F option or the FS variable. Awk parses and operates on each separate field. This makes awk ideal for handling structured text files, especially tables: data organized into consistent chunks, such as rows and columns.
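
As a rough sketch of how field splitting behaves, the example below first splits a line on whitespace and then uses the standard -F option to split on colons instead. The input strings and the shell variable names are invented for illustration.

```shell
# Default splitting: fields are separated by runs of spaces or tabs.
second=$(printf 'alpha   beta\tgamma\n' | awk '{ print $2 }')
echo "$second"

# -F changes the field separator; here the fields are colon-delimited.
# NF is the built-in count of fields on the current line.
colon_out=$(printf 'root:x:0:0\n' | awk -F: '{ print $1, NF }')
echo "$colon_out"
```

The first command picks out the second whitespace-separated field; the second prints the first colon-separated field followed by the number of fields on the line.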

awk '{print $1, $5, $6}' "$filename"

# Prints fields #1, #5, and #6 of each line of file $filename, separated by spaces

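We have just seen the awk print command in action. The only other feature of awk we need to deal with here is variables. Awk handles variables similarly to shell scripts, though a bit more flexibly: a variable needs no declaration and no $ prefix, and it starts out as zero or the empty string the first time it is used.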

{ total += $6 }

This adds the value of column 6 on the current line to the running total held in the variable "total". Finally, to print "total", there is an END command block, executed after the script has processed all its input.

END { print total }

Corresponding to the END, there is a BEGIN, for a code block to be performed before awk starts processing its input.
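
Putting the pieces together, the fragments above could be combined into a single command roughly as follows. This is a sketch under stated assumptions: the three input lines are made-up data with six whitespace-separated fields, and initializing total in BEGIN is optional, since awk treats an unset variable as 0 in a numeric context.

```shell
# Hypothetical input: the sixth field of each line holds a number.
sum=$(printf 'a b c d e 10\na b c d e 20\na b c d e 12\n' |
    awk '
        BEGIN { total = 0 }      # runs once, before any input is read
              { total += $6 }    # runs for every input line
        END   { print total }    # runs once, after the last line
    ')
echo "$sum"
```

The middle rule has no pattern, so its action applies to every line of input.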

Example awk script:

mark.txt

ec john 9
ec joseph 6
ec alex 7
cs mobin 8
cs joby 8
ec amal 7
cs christin 6
cs sandeep 5
ec abin 7

Program to find the total points scored by cs students:

test.awk

#!/usr/bin/awk -f

# Find the total points scored by cs students and store it in totalpoint.txt

BEGIN {
    totalpoint = 0
}

{
    if ($1 == "cs") {
        totalpoint += $3
    }
}

END {
    print "Total points scored by cs students: " totalpoint > "totalpoint.txt"
}

Make the script executable (chmod +x test.awk), then execute the program by running the following command in a terminal:

$ ./test.awk mark.txt

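This writes the total to the file totalpoint.txt, as specified in the END block. With the sample mark.txt above, the file will contain the line "Total points scored by cs students: 27".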
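
For a quick check without a separate script file, the same calculation could be written as a one-liner that uses a pattern in place of the if statement. This is an alternative sketch, not the program above: it prints to standard output instead of totalpoint.txt, uses the shorter variable name total, and feeds the sample rows in through printf so that it can be run on its own.

```shell
# Recreate the rows of mark.txt inline and pipe them to awk.
# The pattern $1 == "cs" selects rows; the action accumulates field 3.
result=$(printf '%s\n' \
    'ec john 9' 'ec joseph 6' 'ec alex 7' \
    'cs mobin 8' 'cs joby 8' 'ec amal 7' \
    'cs christin 6' 'cs sandeep 5' 'ec abin 7' |
    awk '$1 == "cs" { total += $3 }
         END { print "Total points scored by cs students: " total }')
echo "$result"
```

With a real mark.txt on disk, the printf pipeline can be dropped and the file name passed to awk as its last argument.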