awk(1)  —  Commands

OSF

NAME

awk − Finds lines in files and makes specified changes to them

SYNOPSIS

awk [-Fcharacter] -f program [file ...]

awk [-Fcharacter] statement ...  [file ...]

The awk command is a more powerful pattern-matching command than the grep command.  It can perform limited processing on the input lines, instead of simply displaying lines that match. 

FLAGS

-Fcharacter
Uses character as the field separator character (any white space by default).  The -F flag must precede any other command-line arguments. 

-f program
Searches for the patterns and performs the actions found in the file program. 

DESCRIPTION

The awk command provides a flexible text-manipulation language suitable for simple report generation.  awk is a more powerful tool for text manipulation than either sed or grep. 

The awk command:

       • Performs convenient numeric processing. 

       • Allows variables within actions. 

       • Allows general selection of patterns. 

       • Allows control flow in the actions. 

       • Does not require any compiling of programs. 

Pattern-matching and action statements can be specified either on the command line or in a program file.  In either case, the awk command first reads all matching and action statements, then it reads a line of its input and compares it to each specified pattern.  If the line matches a specified pattern, awk performs the specified actions and writes the result to standard output.  When it has compared the current input line to all patterns, it reads the next line. 
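The read, compare, and act cycle described above can be sketched as follows. This is a minimal illustration, assuming a POSIX shell; the two-line sample input and the shell variable out are invented for the example.

```shell
# Every input line is compared to every pattern, in program order.
# "banana 12" matches both patterns, so both actions run for that line;
# "apple 3" matches neither and produces no output.
out=$(printf 'apple 3\nbanana 12\n' |
  awk '/an/    { print "has an:", $1 }
       $2 > 10 { print "big:", $1 }')
echo "$out"
```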

The awk command reads input files in the order stated on the command line.  If you specify a filename as a - (dash) or do not specify a filename, awk reads standard input. 

Enclose pattern-action statements on the command line in ' ' (single quotes) to protect them from interpretation by the shell.  Consecutive pattern-action statements on the same command line must be separated by a ; (semicolon), within one set of quotes. 
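As a small sketch of the quoting rule, the command below places two pattern-action statements inside one pair of single quotes, separated by a semicolon. The input text and the shell variable out are hypothetical.

```shell
# Two pattern-action statements on one command line, separated by ";"
# and protected from the shell by a single pair of single quotes.
out=$(printf 'alpha\nbeta\n' |
  awk '/alpha/ { print "first" } ; /beta/ { print "second" }')
echo "$out"
```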

You can assign values to variables on the awk command line as follows:

variable=value
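For illustration, the sketch below passes a threshold to the program with a command-line assignment. The variable name limit and the sample numbers are assumptions made for the example; the - operand tells awk to read standard input after the assignment is processed.

```shell
# limit=10 is a command-line assignment, not a filename.
# It takes effect before awk starts reading standard input ("-").
out=$(printf '5\n20\n' |
  awk '$1 > limit { print $1, "exceeds", limit }' limit=10 -)
echo "$out"
```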

The awk command treats input lines as fields separated by spaces, tabs, or a field separator you set with the FS variable.  (With the default separator, any run of consecutive spaces or tabs counts as a single separator.)  Fields are referenced as $1, $2, and so on.  $0 refers to the entire line. 
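A quick way to see the default field splitting is to feed awk a line with uneven white space. The sample line and the shell variable out below are invented for this sketch.

```shell
# The input has three spaces between "one" and "two" and a tab before
# "three"; with the default separator these still yield three fields.
out=$(printf 'one   two\tthree\n' | awk '{ print $1 "|" $2 "|" $3 }')
echo "$out"
```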

Pattern-Action Statements

Pattern-action statements follow the form:

pattern {action}

If a pattern lacks a corresponding action, awk writes the entire line that contains the pattern to standard output.  If an action lacks a corresponding pattern, awk applies it to every line. 
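Both defaults can be seen in a short sketch. The sample input and the shell variables matched and labeled are hypothetical.

```shell
# Pattern with no action: matching lines are written unchanged.
matched=$(printf 'error: disk\nok\nerror: net\n' | awk '/error/')

# Action with no pattern: the action runs for every input line.
labeled=$(printf 'a\nb\n' | awk '{ print "line:", $0 }')

echo "$matched"
echo "$labeled"
```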

Actions

An action is a sequence of statements that follow C language syntax.  These statements can include:

if (expression) statement [ else statement ]

while (expression) statement

for (expression;expression;expression) statement

for (variable in array) statement

break

continue

{ [ statement ... ] }

variable=expression

print [ expression_list ] [ >file | >>file ] [ | command ]

printf format[ ,expression_list ] [ >file | >>file  ] [ | command ]

next

exit [ expression ]

Statements can end with a semicolon, a newline character, or the right brace enclosing the action. 
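The following sketch combines several of these statements (assignment, for, if, and print) to report the largest field on each line. The variable names max and i and the sample input are assumptions for illustration; NF is the field count described under Variables.

```shell
# For each line, scan fields 2..NF and remember the largest value.
# Adding 0 forces each field to be compared as a number.
out=$(printf '3 9 4\n7 2\n' | awk '{
  max = $1 + 0
  for (i = 2; i <= NF; i++)
    if ($i + 0 > max)
      max = $i + 0
  print "max:", max
}')
echo "$out"
```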

Expressions can have string or numeric values and are built using the operators +, -, *, /, %, a space for string concatenation, and the C operators ++, --, +=, -=, *=, /=, %=, ^=, >, >=, <, <=, ==, and !=. 

Because the actions process fields, input white space is not preserved in the output. 

The file and command arguments can be literal names or expressions enclosed in quotation marks ("").  Identical string values in different statements refer to the same open file. 

The print statement writes its arguments to standard output (or to a file if > file or >> file is present), separated by the current output field separator and terminated by the current output record separator. 

The printf statement formats its expression list according to the format of the printf() subroutine and writes the result to standard output.  Unlike print, printf does not add the output field separator or the output record separator, so include any newline character in the format itself.  You can redirect the output into a file using the print ... > file or printf ( ... ) > file statements. 
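The difference between the two output statements can be sketched as below. The sample record, the "-" separator, and the shell variable out are invented for the example; in the awk implementations commonly available today, printf writes only what its format specifies, which is why the format ends with an explicit newline.

```shell
# print joins its arguments with OFS and ends the line with ORS.
# printf writes exactly what the format describes, newline included.
out=$(printf 'widget 2.5\n' | awk '
  BEGIN { OFS = "-" }
  {
    print $1, $2
    printf "%s costs %.2f\n", $1, $2
  }')
echo "$out"
```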

Variables

Variables can be scalars, array elements (denoted x[i]), or fields. 

Variable names can consist of uppercase and lowercase alphabetic letters, the underscore character, and the digits (0 to 9).  Variable names cannot begin with a digit. 

Variables are initialized to the null string.  Array subscripts can be any string; they do not have to be numeric.  This allows for a form of associative memory.  Enclose string constants in expressions in "" (double quotes). 
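As a sketch of string subscripts, the program below counts how often each word appears in the first field. The array name count and the sample input are hypothetical.

```shell
# count is indexed by the text of the first field, not by a number.
# Elements start out empty, so ++ takes them from 0 to 1 on first use.
out=$(printf 'red\nblue\nred\nred\n' | awk '
  { count[$1]++ }
  END {
    print "red", count["red"]
    print "blue", count["blue"]
  }')
echo "$out"
```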

There are several variables with special meaning to awk.  They include:

FS
Input field separator (default is a space).  If it is a space, then any number of spaces and tabs can separate fields. 

NF
The number of fields in the current input line (record), with a limit of 99. 

NR
The number of the current input line (record). 

FILENAME
The name of the current input file.

RS
Input record separator (default is a newline character). 

OFS
The output field separator (default is a space). 

ORS
The output record separator (default is a newline character). 

OFMT
The output format for numbers (default %.6g). 
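A short sketch shows several of these variables together. The sample input, the ":" output separator, and the shell variable out are assumptions for the example; $NF is the last field of the line, because NF holds the field count.

```shell
# For each record, print its number (NR), its field count (NF), and
# its last field ($NF), joined by the output field separator (OFS).
out=$(printf 'a b c\nd e\n' | awk '
  BEGIN { OFS = ":" }
  { print NR, NF, $NF }')
echo "$out"
```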

Functions

There are several built-in functions that can be used in awk actions.  (For information about regular expressions as referred to in this section, see grep.) 

length(argument)
Returns the length, in characters, of argument, or of the entire line if there is no argument. 

exp(number)
Takes the exponential of its argument.

log(number)
Takes the base e logarithm of its argument.

sqrt(number)
Takes the square root of its argument.

int(number)
Takes the integer part of its argument.

substr(string,position,number)
Returns the substring number characters long of string, beginning at position. 

index(string,string2)
Returns the position in string where string2 occurs, or 0 if it does not occur. 

split(string,a,[regular_expression])
Splits string into array elements a[1], a[2], . . ., a[number], and returns number. The separation is done with the specified regular expression or with the FS field separator if regular_expression is not given. 

sprintf(fmt,expression1,expression2, ...)
Formats the expressions according to the printf format string fmt and returns the resulting string. 
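The sketch below exercises several of the functions listed above on a single input line. The sample text, the array name parts, and the shell variable out are invented for illustration.

```shell
# Apply the string and numeric built-ins to the line "hello,world".
out=$(echo 'hello,world' | awk '{
  n = split($0, parts, ",")       # n is the number of pieces
  print length($0), index($0, "world"), substr($0, 1, 5)
  print n, parts[1], parts[2]
  print sprintf("%.2f", sqrt(2)), int(7.9)
}')
echo "$out"
```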

The getline function sets $0 to the next input record from the current input file.  The getline function returns 1 for a successful input and 0 for End-of-File. 
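One possible use of getline is to consume the line that follows a marker line. In this sketch the marker text "name", the sample input, and the shell variable out are all hypothetical.

```shell
# When a line consisting of "name" is seen, getline replaces $0 with
# the next input line; the return value is checked before printing.
out=$(printf 'name\nAlice\nname\nBob\n' | awk '
  /^name$/ {
    if ((getline) > 0)
      print "value:", $0
  }')
echo "$out"
```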

Patterns

Patterns are arbitrary Boolean combinations of regular expressions and relational expressions, built with the !, ||, and && operators and parentheses for grouping.  You must start and end regular expressions with slashes.  You can use regular expressions as described for grep, including the following special characters:

+
One or more occurrences of the pattern. 

?
Zero or one occurrence of the pattern. 

|
Either of two patterns. 

( )
Grouping of expressions. 

Isolated regular expressions in a pattern apply to the entire line.  Regular expressions can occur in relational expressions.  Any string (constant or variable) can be used as a regular expression, except in the position of an isolated regular expression in a pattern. 

If two patterns are separated by a comma, the action is performed on all lines between an occurrence of the first pattern and the next occurrence of the second. 
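A small sketch makes the range boundaries visible: in the awk implementations commonly available today, the lines that match the first and second patterns are themselves included. The sample input and the shell variable out are invented for the example.

```shell
# The range starts at the line matching /start/ and ends at the line
# matching /stop/; both boundary lines are part of the output.
out=$(printf 'a\nstart\nb\nstop\nc\n' | awk '/start/,/stop/')
echo "$out"
```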

There are two types of relational expressions that you can use.  The first type has the form:

expression  match_operator  pattern

where match_operator is either: ~ (for contains) or !~ (for does not contain). 

The second type has the form:

expression  relational_operator  expression

where relational_operator is any of the six C relational operators: <, >, <=, >=, ==, and !=.  A conditional can be an arithmetic expression, a relational expression, or a Boolean combination of these. 
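Both kinds of relational expression can be sketched in one program. The three-column sample data and the shell variable out are hypothetical, and the && used to combine the two tests is the logical AND accepted by current awk implementations.

```shell
# First rule: a match expression combined with a numeric comparison.
# Second rule: !~ selects lines whose third field lacks "user".
out=$(printf 'alice 34 admin\nbob 17 user\ncarol 52 user\n' | awk '
  $3 ~ /user/ && $2 >= 18 { print $1 }
  $3 !~ /user/            { print $1, "is not a user" }')
echo "$out"
```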

You can use the BEGIN and END special patterns to capture control before the first and after the last input line is read, respectively.  BEGIN must be the first pattern; END must be the last.  BEGIN and END do not combine with other patterns. 

You have two ways to designate a character other than white space to separate fields.  You can use the -Fcharacter flag on the command line, or you can start the program with the following statement, where c is the separator written as a string constant (for example, ":"):

BEGIN { FS = c }

Either action changes the field separator to c. 
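The two approaches can be compared side by side. The colon-separated sample line and the shell variables with_flag and with_begin are assumptions for this sketch.

```shell
# Set the separator with the -F flag on the command line.
with_flag=$(printf 'root:x:0\n' | awk -F: '{ print $1 }')

# Set the separator by assigning FS in a BEGIN action.
with_begin=$(printf 'root:x:0\n' | awk 'BEGIN { FS = ":" } { print $3 }')

echo "$with_flag"
echo "$with_begin"
```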

There are no explicit conversions between numbers and strings.  To force an expression to be treated as a number, add 0 to it.  To force it to be treated as a string, append a null string (""). 
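The effect of forcing a type shows up when comparing 10 and 9, which order differently as numbers and as strings. The sample input and the shell variable out are invented for this sketch.

```shell
# Adding 0 forces a numeric comparison: 10 is greater than 9.
# Appending "" forces a string comparison: "10" sorts before "9".
out=$(printf '10 9\n' | awk '{
  if ($1 + 0 > $2 + 0)
    print "numeric: greater"
  if (($1 "") < ($2 ""))
    print "string: less"
}')
echo "$out"
```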

EXAMPLES

     1.  To display the lines of a file that are longer than 72 bytes, enter:

awk 'length > 72' chapter1

This selects each line of the file chapter1 that is longer than 72 bytes.  awk then writes these lines to standard output because no action is specified. 

     2.  To display all lines between the words start and stop, enter:

awk '/start/,/stop/' chapter1

     3.  To run an awk program (sum2.awk) that processes a file (chapter1), enter:

awk  -f  sum2.awk  chapter1

     4.  The following awk program computes the sum and average of the numbers in the second column of the input file:

{
sum += $2
}
END{
print "Sum: ", sum;
print "Average:", sum/NR;
}

The first action adds the value of the second field of each line to the sum variable.  (An uninitialized variable such as sum is treated as 0 (zero) when it is used as a number.)  The keyword END before the second action causes awk to perform that action after all of the input file is read.  The NR variable, which is used to calculate the average, is a special variable containing the number of records (lines) that were read. 

     5.  To print the names of the users who have the C shell as the initial shell, enter:

awk -F: '$7 ~ /csh/ {print $1}' /etc/passwd

     6.  To print the first two fields in reversed order, enter:

awk '{ print $2, $1 }'

     7.  The following awk program prints the first two fields of the input file in reversed order, with input fields separated either by a comma and any following spaces or by one or more spaces, then adds up the first column and prints the sum and average:

BEGIN{ FS = ",[ ]*|[ ]+" }
{ print $2, $1}
{ s += $1 }
END{ print "sum is", s, "average is", s/NR }

RELATED INFORMATION

Commands:  grep(1), sed(1). 

Functions:  printf(3). 

Guide to Programming Support Tools
