Friday, January 20, 2012

AWK-II


8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR

  1. Variables that hold settings you can change, such as the field separator and record separator.
  2. Variables that awk maintains for processing and reports, such as the number of records and the number of fields.

1. Awk FS Example: Input field separator variable.

By default, awk splits each input line on runs of whitespace and assigns the pieces to the variables $1, $2, and so on. The FS variable sets the field separator used for each record. FS can be set to any single character or to a regular expression. You can set the input field separator in one of two ways:
  1. Using the -F command line option.
  2. Assigning to FS like a normal variable.
Syntax:

$ awk -F 'FS' 'commands' inputfilename

(or)

$ awk 'BEGIN{FS="FS";}'
  • FS is any single character or regular expression that you want to use as the input field separator.
  • FS can be changed any number of times and keeps its value until it is explicitly changed. A new value applies to records read after the assignment, so set FS before reading the lines it should affect, typically in a BEGIN block.
Here is an awk FS example that reads the /etc/passwd file, which uses “:” as the field delimiter.
$ cat etc_passwd.awk
BEGIN{
FS=":";
print "Name\tUserID\tGroupID\tHomeDirectory";
}
{
 print $1"\t"$3"\t"$4"\t"$6;
}
END {
 print NR,"Records Processed";
}
$ awk -f etc_passwd.awk /etc/passwd
Name    UserID  GroupID        HomeDirectory
gnats 41 41 /var/lib/gnats
libuuid 100 101 /var/lib/libuuid
syslog 101 102 /home/syslog
hplip 103 7 /var/run/hplip
avahi 105 111 /var/run/avahi-daemon
saned 110 116 /home/saned
pulse 111 117 /var/run/pulse
gdm 112 119 /var/lib/gdm
8 Records Processed
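As a small sketch of the point that FS may be a regular expression as well as a single character, the commands below split a made-up sample line on a comma or semicolon followed by optional spaces. The sample line and the shell variable names are assumptions for illustration only.

```shell
# Hypothetical sample record; not taken from /etc/passwd.
line='alice, 30;  London'

# -F with a regular expression: a comma or semicolon, then any number of spaces.
by_option=$(printf '%s\n' "$line" | awk -F '[,;] *' '{ print $2 }')

# The same separator assigned to FS in a BEGIN block.
by_begin=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[,;] *" } { print $3 }')

echo "$by_option"
echo "$by_begin"
```

Both forms split the line into the same three fields; which one to use is mostly a matter of whether the separator belongs on the command line or inside the script.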

2. Awk OFS Example: Output Field Separator Variable

OFS is the output equivalent of the FS variable. By default, OFS is a single space character. Following is an awk OFS example.
$ awk -F':' '{print $3,$4;}' /etc/passwd
41 41
100 101
101 102
103 7
105 111
110 116
111 117
112 119
The comma in the print statement separates the arguments, and awk writes the value of OFS (a single space by default) between them. Setting OFS therefore changes the text inserted between the fields in the output, as shown below.
$ awk -F':' 'BEGIN{OFS="=";} {print $3,$4;}' /etc/passwd
41=41
100=101
101=102
103=7
105=111
110=116
111=117
112=119
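The following sketch contrasts three ways of printing fields, to show when OFS is actually used. The sample record and the variable names are made up for illustration.

```shell
# Hypothetical colon-separated record.
rec='gnats:x:41:41'

# Comma-separated print arguments are joined with OFS.
with_comma=$(printf '%s\n' "$rec" | awk -F ':' 'BEGIN { OFS = "-" } { print $1, $3 }')

# Arguments written side by side are concatenated; OFS is not used.
juxtaposed=$(printf '%s\n' "$rec" | awk -F ':' 'BEGIN { OFS = "-" } { print $1 $3 }')

# Assigning to any field rebuilds $0 with OFS between all the fields.
rebuilt=$(printf '%s\n' "$rec" | awk -F ':' 'BEGIN { OFS = "-" } { $1 = $1; print }')

echo "$with_comma"   # gnats-41
echo "$juxtaposed"   # gnats41
echo "$rebuilt"      # gnats-x-41-41
```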

3. Awk RS Example: Record Separator variable

RS defines what separates one record from the next. Its default value is a newline, so awk reads its input line by line.
Suppose student marks are stored in a file in which records are separated by a blank line (two consecutive newlines) and fields are separated by a single newline character.
$ cat student.txt
Jones
2143
78
84
77

Gondrol
2321
56
58
45

RinRao
2122
38
37
65

Edwin
2537
78
67
45

Dayan
2415
30
47
20
The Awk script below prints the student name and roll number from the above input file.
$ cat student.awk
BEGIN {
 RS="\n\n";
 FS="\n";

}
{
 print $1,$2;
}

$ awk -f student.awk  student.txt
Jones 2143
Gondrol 2321
RinRao 2122
Edwin 2537
Dayan 2415
The script student.awk reads each student's details as a single record, because RS is set to two consecutive newlines, and treats each line within a record as a field, because FS is a newline.
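A multi-character RS such as "\n\n" is handled as a regular expression by gawk and mawk, but POSIX leaves the behavior unspecified. As a hedged alternative, the sketch below sets RS to the empty string, which switches awk to paragraph mode, where blank lines separate records. The shortened student data is made up for illustration.

```shell
# Two shortened, hypothetical student records separated by a blank line.
students=$(printf 'Jones\n2143\n78\n\nGondrol\n2321\n56\n' |
  awk 'BEGIN { RS = ""; FS = "\n" } { print $1, $2 }')

echo "$students"
```

Each blank-line-delimited block becomes one record, and the first two lines of each block are printed as $1 and $2.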

4. Awk ORS Example: Output Record Separator Variable

ORS is the output equivalent of RS. Each record in the output is followed by this delimiter. Following is an awk ORS example:
$ awk 'BEGIN{ORS="=";} {print;}' student-marks
Jones 2143 78 84 77=Gondrol 2321 56 58 45=RinRao 2122 38 37 65=Edwin 2537 78 67 45=Dayan 2415 30 47 20=
In the above script, each record of the file student-marks is terminated in the output by the character “=”.
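Because ORS is written after every record, the last record also gets a trailing delimiter. The sketch below shows that effect and one possible way around it using printf; the three-line input and the variable name sep are assumptions for illustration.

```shell
# ORS is appended after every record, including the last one.
with_ors=$(printf 'a\nb\nc\n' | awk 'BEGIN { ORS = "," } { print }')

# Alternative: print the separator before every record except the first.
# sep is empty for the first record and "," afterwards.
joined=$(printf 'a\nb\nc\n' |
  awk '{ printf "%s%s", sep, $0; sep = "," } END { print "" }')

echo "$with_ors"   # a,b,c,
echo "$joined"     # a,b,c
```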

5. Awk NR Example: Number of Records Variable

NR holds the number of records read so far, which for line-oriented input is the current line number. In the following awk NR example, NR is the line number inside the main rule, and in the END section it holds the total number of records in the file.
$ awk '{print "Processing Record - ",NR;}END {print NR, "Students Records are processed";}' student-marks
Processing Record -  1
Processing Record -  2
Processing Record -  3
Processing Record -  4
Processing Record -  5
5 Students Records are processed
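NR can also be used in output and in patterns. The sketch below, with made-up input lines, prefixes each line with its number and then selects a range of lines by number.

```shell
# Prefix every line with its record number.
numbered=$(printf 'Jones\nGondrol\nRinRao\n' | awk '{ print NR ": " $0 }')

# Use NR in a pattern to print only records 2 through 3.
range=$(printf 'Jones\nGondrol\nRinRao\nEdwin\n' | awk 'NR >= 2 && NR <= 3')

echo "$numbered"
echo "$range"
```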

6. Awk NF Example: Number of Fields in a record

NF holds the total number of fields in the current record. It is very useful for validating whether all the expected fields exist in a record.
Suppose that in the student-marks file the Test3 score is missing for two students, as shown below.
$ cat student-marks
Jones 2143 78 84 77
Gondrol 2321 56 58 45
RinRao 2122 38 37
Edwin 2537 78 67 45
Dayan 2415 30 47
The following Awk script prints the record (line) number and the number of fields in that record, which makes it easy to spot the records where the Test3 score is missing.
$ awk '{print NR,"->",NF}' student-marks
1 -> 5
2 -> 5
3 -> 4
4 -> 5
5 -> 4
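The same check can be turned into a small validation report. This is only a sketch: the expected field count of 5 and the message wording are assumptions, and $NF is used to reach the last field of a complete record.

```shell
report=$(printf 'Jones 2143 78 84 77\nRinRao 2122 38 37\n' |
  awk 'NF != 5 { print "line " NR ": expected 5 fields, found " NF; next }
       { print $1, "last score:", $NF }')

echo "$report"
```

The next statement skips the second rule for incomplete records, so each input line produces exactly one line of output.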

7. Awk FILENAME Example: Name of the current input file

The FILENAME variable holds the name of the file currently being read. Awk can accept any number of input files to process.
$ awk '{print FILENAME}' student-marks
student-marks
student-marks
student-marks
student-marks
student-marks
The above example prints FILENAME, i.e. student-marks, once for each record of the input file.

8. Awk FNR Example: Number of Records relative to the current input file

When awk reads from multiple input files, NR counts records across all the input files, whereas FNR restarts from 1 for each input file.
$ awk '{print FILENAME, FNR;}' student-marks bookdetails
student-marks 1
student-marks 2
student-marks 3
student-marks 4
student-marks 5
bookdetails 1
bookdetails 2
bookdetails 3
bookdetails 4
bookdetails 5
In the above example, if you use NR instead of FNR, the records of the file bookdetails are numbered 6 to 10.
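To see FILENAME, FNR and NR side by side, the sketch below creates two small throwaway files. The file names a.txt and b.txt and their contents are hypothetical.

```shell
# Work in a temporary directory so the hypothetical files do not clash with real ones.
tmpdir=$(mktemp -d)
printf 'one\ntwo\n' > "$tmpdir/a.txt"
printf 'three\n' > "$tmpdir/b.txt"

# FNR restarts at 1 for b.txt, while NR keeps counting.
counts=$(cd "$tmpdir" && awk '{ print FILENAME, FNR, NR }' a.txt b.txt)

echo "$counts"
rm -r "$tmpdir"
```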

7 Powerful Awk Operators Examples (Unary, Binary, Arithmetic, String, Assignment, Conditional, Reg-Ex Awk Operators)

Like any other programming language, Awk has many operators for number and string operations. This article discusses all the key awk operators. There are two types of operators in Awk.
  1. Unary Operator: an operator that accepts a single operand.
  2. Binary Operator: an operator that accepts two operands.

Awk Unary Operator

Operator    Description
+           Unary plus (keeps the sign of the number)
-           Negate the number
++          Auto-increment
--          Auto-decrement
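The difference between the prefix and postfix forms of the increment operator is easiest to see in a short sketch. The variable names below are arbitrary.

```shell
unary=$(awk 'BEGIN {
  x = 5
  a = x++    # post-increment: a gets the old value 5, x becomes 6
  b = ++x    # pre-increment: x becomes 7, b gets 7
  c = -x     # negation: c is -7, x is unchanged
  x--        # decrement: x becomes 6
  print a, b, c, x
}')

echo "$unary"   # 5 7 -7 6
```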

Awk Binary Operator

Several kinds of binary operators are available in Awk. They are classified by usage.

Awk Arithmetic Operators

The following operators are used for performing arithmetic calculations.
Operator    Description
+           Addition
-           Subtraction
*           Multiplication
/           Division
%           Modulo division
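A quick sketch of the arithmetic operators follows. Note that division in awk is floating point, so the built-in int() function is used here when a whole-number quotient is wanted.

```shell
arith=$(awk 'BEGIN { print 7 + 2, 7 - 2, 7 * 2, 7 / 2, 7 % 2, int(7 / 2) }')

echo "$arith"   # 9 5 14 3.5 1 3
```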

Awk String Operator

Awk has no dedicated symbol for string concatenation. Writing two expressions next to each other, usually separated by a space, concatenates them.
Operator    Description
(space)     String concatenation
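Concatenation by juxtaposition can be sketched as below. The variable names and values are made up, and the parentheses around the arithmetic are a defensive habit rather than a requirement in this case.

```shell
concat=$(awk 'BEGIN {
  name = "gnats"; uid = 41
  label = name "-" uid          # juxtaposition joins the three values
  print label
  print "sum: " (uid + 1)       # parentheses keep the arithmetic visibly separate
}')

echo "$concat"
```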

Awk Assignment Operators

Awk has a plain assignment operator and shortcut assignment operators, as listed below.
Operator    Description
=           Assignment
+=          Shortcut addition assignment
-=          Shortcut subtraction assignment
*=          Shortcut multiplication assignment
/=          Shortcut division assignment
%=          Shortcut modulo division assignment

Awk Conditional Operators

Awk has the following conditional operators, which can be used with control structures and looping statements.
Operator    Description
>           Is greater than
>=          Is greater than or equal to
<           Is less than
<=          Is less than or equal to
==          Is equal to
!=          Is not equal to
&&          Both conditional expressions must be true
||          At least one of the conditional expressions must be true
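One point worth illustrating is that comparisons are numeric for numbers but textual for string constants. The sketch below stores each result (1 for true, 0 for false) in an arbitrarily named variable before printing.

```shell
cmp=$(awk 'BEGIN {
  numeric = (10 < 9)            # numbers compare numerically: false, so 0
  textual = ("10" < "9")        # string constants compare as text: true, so 1
  both    = (1 && 0)            # logical AND
  either  = (1 || 0)            # logical OR
  print numeric, textual, both, either
}')

echo "$cmp"   # 0 1 0 1
```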

Awk Regular Expression Operator

Operator    Description
~           Match operator
!~          No-match operator
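As a brief sketch of both regular expression operators, the commands below drop comment lines with !~ and select a field with ~. The input lines are hypothetical.

```shell
# Hypothetical input mixing comment lines and records.
lines='# comment
root:x:0:0
# another
gdm:x:112:119'

# !~ keeps only the lines that do NOT start with "#".
no_comments=$(printf '%s\n' "$lines" | awk '$0 !~ /^#/')

# ~ tests a single field: print the name when the second field ends in "sh".
shells=$(printf 'a:/bin/sh\nb:/bin/false\n' | awk -F ':' '$2 ~ /sh$/ { print $1 }')

echo "$no_comments"
echo "$shells"
```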

Awk Operator Examples

Now let us review some examples that use awk operators. These examples use the following /etc/passwd file as input.
$ cat /etc/passwd
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/bin/sh
libuuid:x:100:101::/var/lib/libuuid:/bin/sh
syslog:x:101:102::/home/syslog:/bin/false
hplip:x:103:7:HPLIP system user,,,:/var/run/hplip:/bin/false
saned:x:110:116::/home/saned:/bin/false
pulse:x:111:117:PulseAudio daemon,,,:/var/run/pulse:/bin/false
gdm:x:112:119:Gnome Display Manager:/var/lib/gdm:/bin/false

Awk Example 1: Count the total number of fields in a file.

The awk script below matches every line and adds the number of fields in each line to a running total, using the shortcut addition assignment operator. The number of fields seen so far is kept in a variable named ‘total’. Once the input has been processed, the special pattern ‘END {…}’ is executed, which prints the total number of fields.
$ awk -F ':' '{ total += NF }; END { print total }' /etc/passwd
49

Awk Example 2: Count the number of users who use the /bin/sh shell

The awk script below matches every line whose last field ($NF) contains the pattern /bin/sh. A regular expression is enclosed between slashes, so each forward slash (/) inside the pattern has to be escaped with a backslash. Whenever a line matches, the variable ‘n’ is incremented by one. The value of ‘n’ is printed in the END section.
$ awk -F ':' '$NF ~ /\/bin\/sh/ { n++ }; END { print n }' /etc/passwd
2

Awk Example 3: Find the details of the user with the highest USER ID

The awk script below keeps track of the largest value seen in the third field in the variable ‘maxuid’ and stores the corresponding line in the variable ‘maxline’. Once it has looped over all lines, it prints both.
$ awk -F ':'  '$3 > maxuid { maxuid=$3; maxline=$0 }; END { print maxuid, maxline }' /etc/passwd
112 gdm:x:112:119:Gnome Display Manager:/var/lib/gdm:/bin/false

Awk Example 4: Print the even-numbered lines

The awk script below checks NR % 2 == 0 for each line, i.e. whether NR is a multiple of 2. Because the pattern has no action, awk performs the default action for matching lines, which is printing the whole line.
$ awk 'NR % 2 == 0' /etc/passwd
libuuid:x:100:101::/var/lib/libuuid:/bin/sh
hplip:x:103:7:HPLIP system user,,,:/var/run/hplip:/bin/false
pulse:x:111:117:PulseAudio daemon,,,:/var/run/pulse:/bin/false

Awk Example 5: Print every line which has the same USER ID and GROUP ID

The awk script below prints a line only if $3 (USER ID) and $4 (GROUP ID) are equal. It checks this condition for each line of input and, when it holds, prints the whole line.
$ awk -F ':' '$3==$4' passwd.txt
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/bin/sh

Awk Example 6: Print the details of users whose USER ID is greater than or equal to 100 and who use /bin/sh

The Awk statement below combines two conditional expressions: the user ID ($3) must be greater than or equal to 100, and the last field must match /bin/sh. The ‘&&’ operator prints the line only if both conditions are true.
$ awk -F ':' '$3>=100 && $NF ~ /\/bin\/sh/' passwd.txt
libuuid:x:100:101::/var/lib/libuuid:/bin/sh

Awk Example 7: Print the details of users who have no comment in the /etc/passwd file

The Awk script below reads each line and checks whether the fifth field is empty; if it is, it prints the line.
$ awk -F ':' '$5 == "" ' passwd.txt
libuuid:x:100:101::/var/lib/libuuid:/bin/sh
syslog:x:101:102::/home/syslog:/bin/false
saned:x:110:116::/home/saned:/bin/false

4 Awk If Statement Examples ( if, if else, if else if, ?: )

In this awk tutorial, let us review awk conditional if statements with practical examples.
Awk supports several conditional statements to control the flow of the program. Most of the Awk conditional statement syntax looks like the ‘C’ programming language.
A conditional statement checks a condition before performing any action. If the condition is true, the action(s) are performed. Similarly, a different action can be performed if the condition is false.
A conditional statement starts with the keyword ‘if’. Awk supports three kinds of if statement:
  1. Awk Simple If statement
  2. Awk If-Else statement
  3. Awk If-ElseIf-Ladder

Awk Simple If Statement

Single Action: The simple if statement checks a condition and, if the condition is true, performs the corresponding action.
Syntax:
if (conditional-expression)
 action
  • if is a keyword.
  • conditional-expression is the expression that is tested.
  • action is any awk statement to perform.
Multiple Action: If more than one action needs to be performed when the conditional expression is true, the actions must be enclosed in curly braces and separated by newlines or semicolons, as shown below.
Syntax:
if (conditional-expression)
{
 action1;
 action2;
}
If the condition is true, all the actions enclosed in braces are performed in the given order. After that, execution continues with the statements that follow.

Awk If Else Statement

The simple awk if statement above has no set of actions for the case where the condition is false. The awk if-else statement lets you give a list of actions to perform in that case. If the condition is true, action1 is performed; if the condition is false, action2 is performed.
Syntax:
if (conditional-expression)
 action1
else
 action2
Awk also has a conditional operator, i.e. the ternary operator ( ?: ), which works like a compact if-else. It is an expression rather than a statement: if the conditional-expression is true, the result is the value of action1, and if it is false, the result is the value of action2. Both action1 and action2 must therefore be expressions.
Syntax:

conditional-expression ? action1 : action2 ;
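As a sketch of the ternary operator used as an expression, the command below picks a label per input line. The pass mark of 35 mirrors the Pass/Fail example in this article; the two input lines are made up.

```shell
result=$(printf 'Jones 78\nDayan 30\n' |
  awk '{ status = ($2 >= 35) ? "Pass" : "Fail"; print $1, status }')

echo "$result"
```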

Awk If Else If ladder

if(conditional-expression1)
 action1;
else if(conditional-expression2)
 action2;
else if(conditional-expression3)
 action3;
 .
 .
else
 action n;
  • If conditional-expression1 is true, then action1 is performed.
  • If conditional-expression1 is false, then conditional-expression2 is checked; if it is true, action2 is performed, and so on. The last else part is performed if none of the conditional expressions is true.
Now let us create the sample input file which has the student marks.
$ cat student-marks
Jones 2143 78 84 77
Gondrol 2321 56 58 45
RinRao 2122 38 37
Edwin 2537 87 97 95
Dayan 2415 30 47

1. Awk If Example: Check that all the marks exist

$ awk '{
if ($3 == "" || $4 == "" || $5 == "")
 print "Some score for the student",$1,"is missing";
}' student-marks
Some score for the student RinRao is missing
Some score for the student Dayan is missing
$3, $4 and $5 are the test scores of the student. If any test score is empty, the script prints the message. The || operator makes the condition true when any one of the marks is missing.

2. Awk If Else Example: Generate Pass/Fail Report based on Student marks in each subject

$ awk '{
if ($3 >=35 && $4 >= 35 && $5 >= 35)
 print $0,"=>","Pass";
else
 print $0,"=>","Fail";
}' student-marks
Jones 2143 78 84 77 => Pass
Gondrol 2321 56 58 45 => Pass
RinRao 2122 38 37 => Fail
Edwin 2537 87 97 95 => Pass
Dayan 2415 30 47 => Fail
The condition for a Pass is that every test score is greater than or equal to 35. If all three scores meet this condition, the script prints the whole line followed by the string “Pass”; otherwise, i.e. if even one test score fails the condition, it prints the whole line followed by the string “Fail”.

3. Awk If Else If Example: Find the average and grade for every student

$ cat grade.awk
{
total=$3+$4+$5;
avg=total/3;
if ( avg >= 90 ) grade="A";
else if ( avg >= 80) grade ="B";
else if (avg >= 70) grade ="C";
else grade="D";

print $0,"=>",grade;
}
$ awk -f grade.awk student-marks
Jones 2143 78 84 77 => C
Gondrol 2321 56 58 45 => D
RinRao 2122 38 37 => D
Edwin 2537 87 97 95 => A
Dayan 2415 30 47 => D
In the above awk script, the variable ‘avg’ holds the average of the three test scores. If the average is greater than or equal to 90 the grade is A, otherwise if it is greater than or equal to 80 the grade is B, otherwise if it is greater than or equal to 70 the grade is C. In all other cases the grade is D.

4. Awk Ternary ( ?: ) Example: Concatenate every 3 lines of input with a comma.

$ awk 'ORS=NR%3?",":"\n"' student-marks
Jones 2143 78 84 77,Gondrol 2321 56 58 45,RinRao 2122 38 37
Edwin 2537 87 97 95,Dayan 2415 30 47,
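The one-liner above works because the assignment to ORS is itself the pattern: its value is the assigned string, a non-empty string counts as true, and so the default print action runs for every record. A more explicit sketch of the same logic, using made-up single-letter input lines, could look like this:

```shell
joined3=$(printf 'a\nb\nc\nd\ne\n' | awk '{
  if (NR % 3 != 0)
    ORS = ","      # not a multiple of 3: join with a comma
  else
    ORS = "\n"     # every third record: end the output line
  print
}')

echo "$joined3"
```

As in the original output, the final group keeps a trailing comma when the number of records is not a multiple of 3.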

Caught In the Loop? Awk While, Do While, For Loop, Break, Continue, Exit Examples

In this article, let us review awk loop statements (while, do while, for loops, break, continue, and exit statements) along with 7 practical examples. Awk looping statements are used to perform a set of actions repeatedly: a loop executes its statements again and again as long as its condition is true. Awk has a number of looping statements, much like the ‘C’ programming language.

Awk While Loop

Syntax:

while(condition)
 actions
  • while is a keyword.
  • condition is a conditional expression.
  • actions are the body of the while loop, which can have one or more statements. If the body has more than one statement, it has to be enclosed within curly braces.
How it works: The awk while loop checks the condition first. If the condition is true, it executes the list of actions. After the actions have been executed, the condition is checked again, and if it is still true, the actions are performed again. This process repeats until the condition becomes false. If the condition is false on the first check, the actions are never executed.

1. Awk While Loop Example: Create a string of a specific length

$ awk 'BEGIN { while (count++<50) string=string "x"; print string }'
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
The above example uses the ‘BEGIN { }’ special block, which is executed before any input is read. In this block, the awk while loop appends the character ‘x’ to the variable ‘string’ 50 times. The variable count is compared with 50 and then incremented on every test, so the condition becomes false after 50 iterations. After the loop ends, the ‘string’ variable is printed. As this Awk program has no main rules, it quits after executing the BEGIN block.

Awk Do-While Loop

How it works: The awk do-while loop is called an exit-controlled loop, whereas the awk while loop is called an entry-controlled loop. The while loop checks the condition first and then decides whether to execute the body. The do-while loop executes the body once and then repeats it as long as the condition is true.
Syntax:

do
action
while(condition)
Even if the condition is false at the beginning, the action is performed at least once.

2. Awk Do While Loop Example: Print the message at least once

$ awk 'BEGIN{
count=1;
do
print "This gets printed at least once";
while(count!=1)
}'
This gets printed at least once
In the above script, the print statement is executed once even though the condition count!=1 is false from the start, because do-while runs the body before testing the condition. A plain while loop with the same condition would test it first, find it false, and never execute the print statement.
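A case where running the body at least once is genuinely useful is counting the digits of a number, since even 0 has one digit. The shell function name count_digits below is hypothetical and the sketch assumes a non-negative integer argument.

```shell
# count_digits N: print how many decimal digits the non-negative integer N has.
count_digits() {
  awk -v n="$1" 'BEGIN {
    count = 0
    do {
      count++            # the body runs before the first test, so 0 counts as one digit
      n = int(n / 10)    # drop the last digit
    } while (n > 0)
    print count
  }'
}

count_digits 0
count_digits 12345
```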

Awk For Loop Statement

The awk for statement does the same job as the awk while loop, but its syntax is more compact.
Syntax:

for(initialization;condition;increment/decrement)
actions
How it works: The awk for statement starts by executing the initialization, then checks the condition. If the condition is true, it executes the actions and then the increment or decrement. It keeps repeating the actions followed by the increment/decrement as long as the condition remains true.
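To make the relationship between the two loops concrete, the sketch below writes the same loop once with for and once with while. Printing squares is an arbitrary choice for illustration.

```shell
with_for=$(awk 'BEGIN { for (i = 1; i <= 3; i++) print i * i }')

with_while=$(awk 'BEGIN {
  i = 1                 # initialization
  while (i <= 3) {      # condition
    print i * i
    i++                 # increment
  }
}')

echo "$with_for"
echo "$with_while"
```

Both commands print the same three lines; the for form simply gathers the three loop-control parts in one place.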

3. Awk For Loop Example: Print the sum of fields in all lines.

$ awk '{ for (i = 1; i <= NF; i++) total = total+$i }; END { print total }'
12 23 34 45 56
34 56 23 45 23
351
The variable i is initialized to 1 and the loop runs while i is less than or equal to the number of fields, adding each field to the variable total. Because no input file is given, awk reads the two lines typed on standard input. The END block then prints the variable total.

4. Awk For Loop Example: Print the fields in reverse order on every line.

$ awk 'BEGIN{ORS="";}{ for (i=NF; i>0; i--) print $i," "; print "\n"; }' student-marks
77  84  78  2143  Jones
45  58  56  2321  Gondrol
37  38  2122  RinRao
95  97  87  2537  Edwin
47  30  2415  Dayan
We discussed the awk NF built-in variable earlier. After reading each line, Awk sets the NF variable to the number of fields found on that line.
The above script loops in reverse order from NF down to 1 and outputs the fields one by one. It starts with field $NF, then $(NF-1), …, $1. After that it prints a newline character.
Now let us see some other statements that can be used with looping statements.

Awk Break statement

The break statement jumps out of the innermost loop (while, do-while or for) that encloses it.

5. Awk Break Example: Awk script to go through only 10 iterations

$ awk 'BEGIN{while(1) print "forever"}'
The above awk while loop prints the string “forever” indefinitely, because the condition never becomes false. To stop the loop after the first 10 iterations, see the script below.
$ awk 'BEGIN{
x=1;
while(1)
{
print "Iteration";
if ( x==10 )
break;
x++;
}}'
Iteration
Iteration
Iteration
Iteration
Iteration
Iteration
Iteration
Iteration
Iteration
Iteration
The above script checks the value of the variable “x” on each iteration and, once it reaches 10, jumps out of the loop using the break statement.
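A more typical use of break is to stop scanning as soon as something is found. In the sketch below the threshold of 40, the message text and the input lines are all assumptions for illustration.

```shell
first_low=$(printf 'Jones 78 84 77\nGondrol 56 30 20\n' | awk '{
  for (i = 2; i <= NF; i++) {
    if ($i < 40) {
      print $1, "first score below 40 is in field", i
      break          # stop scanning this record after the first hit
    }
  }
}')

echo "$first_low"
```

Only the first low score of a record is reported; the later value 20 on the same line is never examined.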

Awk Continue statement

The continue statement skips the rest of the loop body, so the next cycle of the loop begins immediately.

6. Awk Continue Example: Execute the loop except for the 5th iteration

$ awk 'BEGIN{
x=1;
while(x<=10)
{
if(x==5){
x++;
continue;
}
print "Value of x",x;x++;
}
}'
Value of x 1
Value of x 2
Value of x 3
Value of x 4
Value of x 6
Value of x 7
Value of x 8
Value of x 9
Value of x 10
The above script prints the value of x at each iteration. When x reaches 5, it just increments x and continues with the next iteration without executing the rest of the loop body, so the value 5 is not printed. The continue statement is meaningful only inside a loop.
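In a for loop, continue jumps to the increment step, so the counter does not need to be advanced by hand as it does in the while example. The sketch below sums only the numeric score fields of a made-up record that contains an "n/a" entry.

```shell
sum=$(printf 'Jones 2143 78 n/a 77\n' | awk '{
  total = 0
  for (i = 3; i <= NF; i++) {
    if ($i !~ /^[0-9]+$/)
      continue       # skip non-numeric fields; the for loop still runs i++
    total += $i
  }
  print $1, total
}')

echo "$sum"   # Jones 155
```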

Awk Exit statement

The exit statement makes awk stop processing the current record immediately and ignore all remaining input. If exit is called outside the END block, the END block, when present, still runs before awk terminates.
Exit accepts an optional expression whose integer value becomes the exit status of the awk process. If no argument is supplied, exit returns status zero.

7. Awk Exit Example: Exit from the loop at 5th iteration

$ awk 'BEGIN{
x=1;
while(x<=10)
{
if(x==5){
exit;}
print "Value of x",x;x++;
}
}'
Value of x 1
Value of x 2
Value of x 3
Value of x 4
In the above script, once the value of x reaches 5, it calls exit, which stops the execution of the awk process. So the value of x is printed only up to 4.
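The sketch below illustrates two further points about exit: the status code is visible to the shell, and an END block still runs after exit is called from a main rule. The status value 3 and the input lines are arbitrary.

```shell
# Capture both the output and the exit status of awk.
out=$(printf 'one\ntwo\nthree\n' |
  awk 'NR == 2 { exit 3 } { print } END { print "END block still runs" }') &&
  status=0 || status=$?

echo "$out"
echo "exit status: $status"
```

Only the first line is printed before exit is reached on the second record; the END block then runs, and the shell sees status 3.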