TH AWK 1
SH NAME
awk \- pattern-directed scanning and processing language
SH SYNOPSIS
B awk
[
BI -F fs
]
[
BI -v
I var=value
]
[
BI -mr n
]
[
BI -mf n
]
[
B -f
I progfile
|
I prog
]
[
I file ...
]
SH DESCRIPTION
I Awk
scans each input
I file
for lines that match any of a set of patterns specified literally in
IR prog
or in one or more files
specified as
B -f
IR file .
With each pattern
there can be an associated action that will be performed
when a line of a
I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
L -
means the standard input.
Any
IR file
of the form
I var=value
is treated as an assignment, not a file name,
and is executed at the time it would have been opened if it were a file name.
The option
B -v
followed by
I var=value
is an assignment to be done before
I prog
is executed;
any number of
B -v
options may be present.
The
B \-F
IR fs
option defines the input field separator to be the regular expression
IR fs .
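PP
As a minimal sketch of these options, run from a shell
(the input line and the variable name
I tag
are invented for illustration):
EX
echo 'root:x:0' | awk -F: -v tag=user '{ print tag, $1, $3 }'
EE
PP
With the separator set to a colon, this should print the value of
I tag
followed by the first and third fields.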
PP
An input line is normally made up of fields separated by white space,
or by regular expression
BR FS .
The fields are denoted
BR $1 ,
BR $2 ,
\&..., while
B $0
refers to the entire line.
If
BR FS
is null, the input line is split into one field per character.
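PP
A small sketch of the default splitting, using an invented input line
with two blanks before the last word:
EX
echo 'alpha beta  gamma' | awk '{ print NF; print $3; print $0 }'
EE
PP
A run of white space counts as a single separator, so three fields are found,
while
B $0
keeps the line as it was read.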
PP
To compensate for inadequate implementation of storage management,
the
B \-mr
option can be used to set the maximum size of the input record,
and the
B \-mf
option to set the maximum number of fields.
PP
A pattern-action statement has the form
IP
IB pattern " { " action " }
PP
A missing
BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
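PP
The two defaults can be seen in a short sketch
(the three input lines and the counter
I n
are made up for illustration):
EX
{ echo one; echo two; echo three; } | awk '
NR == 2
{ n++ }
END { print n, "lines" }'
EE
PP
The first statement has no action, so the second input line is printed;
the second statement has no pattern, so it runs for every line.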
PP
An action is a sequence of statements.
A statement can be one of the following:
PP
EX
ta \w'\fLdelete array[expression]'u
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP        #\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next    #\fR skip remaining patterns on this input line\fP
nextfile        #\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]  #\fR delete an array element\fP
delete\fI array\fP      #\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP   #\fR exit immediately; status is \fP\fIexpression\fP
EE
DT
PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
I expression-list
stands for
BR $0 .
String constants are quoted \&\fL"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
IB x  [ i ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
BR SUBSEP .
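PP
As an illustration of string subscripts, the following sketch counts the
words on an invented input line and reports those seen more than once
(the names
I count
and
I w
are arbitrary):
EX
echo 'a b a c a' | awk '{
    for (i = 1; i <= NF; i++)
        count[$i]++         # the word itself is the subscript
    for (w in count)
        if (count[w] > 1)
            print w, count[w]
}'
EE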
PP
The
B print
statement prints its arguments on the standard output
(or on a file if
BI > file
or
BI >> file
is present or on a pipe if
BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
I file
and
I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
B printf
statement formats its expression list according to the format
(see
IR fprintf (2)) .
The built-in function
BI close( expr )
closes the file or pipe
IR expr .
The built-in function
BI fflush( expr )
flushes any buffered output for the file or pipe
IR expr .
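PP
A sketch of output to a pipe, assuming a
IR sort (1)
command is available:
EX
awk 'BEGIN {
    print "b" | "sort"      # opens the pipe
    print "a" | "sort"      # same string, same pipe
    close("sort")           # sort writes its output here
    printf "%-5s|%3d\en", "ab", 7
}'
EE
PP
Both
B print
statements name the same string, so they feed a single
I sort
process; closing the pipe lets it finish before the
B printf
line is written.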
PP
The mathematical functions
BR exp ,
BR log ,
BR sqrt ,
BR sin ,
BR cos ,
and
BR atan2
are built in.
Other built-in functions:
TF length
TP
B length
the length of its argument
taken as a string,
or of
B $0
if no argument.
TP
B rand
random number on [0,1)
TP
B srand
sets seed for
B rand
and returns the previous seed.
TP
B int
truncates to an integer value
TP
B utf
converts its numerical argument, a character number, to a
SM UTF
string
TP
BI substr( s , " m" , " n\fL)
the
IR n -character
substring of
I s
that begins at position
IR m
counted from 1.
TP
BI index( s , " t" )
the position in
I s
where the string
I t
occurs, or 0 if it does not.
TP
BI match( s , " r" )
the position in
I s
where the regular expression
I r
occurs, or 0 if it does not.
The variables
B RSTART
and
B RLENGTH
are set to the position and length of the matched string.
TP
BI split( s , " a" , " fs\fL)
splits the string
I s
into array elements
IB a [1]\f1,
IB a [2]\f1,
\&...,
IB a [ n ]\f1,
and returns
IR n .
The separation is done with the regular expression
I fs
or with the field separator
B FS
if
I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
TP
BI sub( r , " t" , " s\fL)
substitutes
I t
for the first occurrence of the regular expression
I r
in the string
IR s .
If
I s
is not given,
B $0
is used.
TP
B gsub
same as
B sub
except that all occurrences of the regular expression
are replaced;
B sub
and
B gsub
return the number of replacements.
TP
BI sprintf( fmt , " expr" , " ...\fL)
the string resulting from formatting
I expr ...
according to the
I printf
format
I fmt
TP
BI system( cmd )
executes
I cmd
and returns its exit status
TP
BI tolower( str )
returns a copy of
I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
TP
BI toupper( str )
returns a copy of
I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
PD
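PP
Several of the string functions can be tried together in a
B BEGIN
action, which needs no input.
This is only a sketch; the sample strings and the names
IR s ,
IR t ,
IR k ,
IR n ,
and
I parts
are invented:
EX
awk 'BEGIN {
    s = "hello, world"
    print length(s), index(s, "world"), substr(s, 1, 5)
    n = split("a:b:c", parts, ":")
    print n, parts[2]
    t = s
    k = gsub(/o/, "0", t)   # k counts the replacements made in t
    print k, t, toupper(substr(s, 8, 5))
    if (match(s, /w[a-z]+/))
        print RSTART, RLENGTH
}'
EE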
PP
The ``function''
B getline
sets
B $0
to the next input record from the current input file;
B getline
BI < file
sets
B $0
to the next record from
IR file .
B getline
I x
sets variable
I x
instead.
Finally,
IB cmd " | getline
pipes the output of
I cmd
into
BR getline ;
each call of
B getline
returns the next line of output from
IR cmd .
In all cases,
B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
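PP
A sketch of reading the output of a command line by line;
the command string is only an example, and the names
IR cmd ,
IR line ,
and
I n
are arbitrary:
EX
awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)    # stop on 0 (end) or -1 (error)
        n++
    close(cmd)
    print n, line
}'
EE
PP
Testing for a result greater than zero, rather than merely non-zero,
keeps the loop from running forever when
B getline
reports an error.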
PP
Patterns are arbitrary Boolean combinations
(with
BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
IR regexp (6).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
BR ~
and
BR !~ .
BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
PP
A relational expression is one of the following:
IP
I expression matchop regular-expression
br
I expression relop expression
br
IB expression " in " array-name
br
BI ( expr , expr,... ") in " array-name
PP
where a
I relop
is any of the six relational operators in C,
and a
I matchop
is either
B ~
(matches)
or
B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
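PP
For instance, a pattern may combine a relational test with a negated match.
The sketch below uses three invented input lines and selects those whose
second field is at least 18 and whose first field does not begin with
IR c :
EX
{ echo 'alice 30'; echo 'bob 17'; echo 'carol 45'; } |
awk '$2 >= 18 && $1 !~ /^c/ { print $1 }'
EE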
PP
The special patterns
B BEGIN
and
B END
may be used to capture control before the first input line is read
and after the last.
B BEGIN
and
B END
do not combine with other patterns.
PP
Variable names with special meanings:
TF FILENAME
TP
B CONVFMT
conversion format used when converting numbers
(default
BR "%.6g" )
TP
B FS
regular expression used to separate fields; also settable
by option
BI \-F fs\f1.
TP
BR NF
number of fields in the current record
TP
B NR
ordinal number of the current record
TP
B FNR
ordinal number of the current record in the current file
TP
B FILENAME
the name of the current input file
TP
B RS
input record separator (default newline)
TP
B OFS
output field separator (default blank)
TP
B ORS
output record separator (default newline)
TP
B OFMT
output format for numbers (default
BR "%.6g" )
TP
B SUBSEP
separates multiple subscripts (default 034)
TP
B ARGC
argument count, assignable
TP
B ARGV
argument array, assignable;
non-null members are taken as file names
TP
B ENVIRON
array of environment variables; subscripts are names.
PD
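PP
A brief sketch of a few of these variables at work, on an invented input line:
EX
echo 'a b c' | awk '{ OFS = ":"; OFMT = "%.2f"; print NR, NF, $NF, 3.14159 }'
EE
PP
The record number, the field count, and the last field are joined by the new
output field separator, and the non-integral number is printed according to
BR OFMT .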
PP
Functions may be defined (at the position of a pattern-action statement) thus:
IP
L
function foo(a, b, c) { ...; return x }
PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
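PP
The following sketch illustrates both points; the function
I fill
and the array
I sq
are hypothetical names chosen for the example:
EX
awk '
function fill(arr, n,    i) {   # i is an excess parameter, hence local
    for (i = 1; i <= n; i++)
        arr[i] = i * i          # arr is the caller's array
}
BEGIN { i = 99; fill(sq, 3); print sq[3], i }'
EE
PP
The array filled inside the function is visible to the caller, while the
global
I i
is left untouched.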
SH EXAMPLES
TP
L
length($0) > 72
Print lines longer than 72 characters.
TP
L
{ print $2, $1 }
Print first two fields in opposite order.
PP
EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
     { print $2, $1 }
EE
ns
IP
Same, with input fields separated by comma and/or blanks and tabs.
PP
EX
       { s += $1 }
END     { print "sum is", s, " average is", s/NR }
EE
ns
IP
Add up first column, print sum and average.
TP
L
/start/, /stop/
Print all lines between start/stop pairs.
PP
EX
BEGIN   {       # Simulate echo(1)
       for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
       printf "\en"
       exit }
EE
SH SOURCE
B /sys/src/cmd/awk
SH SEE ALSO
IR sed (1),
IR regexp (6),
br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
I
The AWK Programming Language,
Addison-Wesley, 1988.  ISBN 0-201-07981-X
SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\fL""\fP to it.
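PP
A sketch of the difference, with invented values: two string constants
compare as strings until 0 is added to each.
EX
awk 'BEGIN {
    a = "10"; b = "9"
    s = (a < b)             # string comparison
    n = (a+0 < b+0)         # numeric comparison
    print s, n
}'
EE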
br
The scope rules for variables in functions are a botch;
the syntax is worse.