LEX(1) DOMAIN/IX SYS5 LEX(1)
NAME
lex - generate programs for simple lexical tasks
USAGE
lex [ -rctvn ] [ file ] ...
DESCRIPTION
Lex generates programs for simple lexical analysis of text.
The input files contain strings and expressions to be
searched for, and C text to be executed when strings are
found. Lex treats multiple files as a single file. If no
files are specified, it uses standard input.
Lex generates a C source program named lex.yy.c which, when
loaded with the library, copies its input to its output
except where it finds one of the strings specified in the
lex source. On each match, the program executes the
corresponding program text. The matched string is left in
yytext, an external character array. Matching is done in the
order that the strings appear in the file.
RULES
Strings may contain square brackets to indicate character
classes, as in [abx-z] to indicate a, b, x, y, and z.
The operators *, +, and ? signify, respectively, zero or
more, one or more, and zero or one occurrences of the
preceding character or character class.
The period (.) is the class of all ASCII characters except
newline.
Parentheses may be used for grouping and a vertical bar (|)
for alternation.
The notation r{d,e} in a rule indicates between d and e
instances of regular expression r. It has higher precedence
than the vertical bar (|), but lower precedence than the *,
?, and + operators and concatenation.
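The operators described above largely coincide with POSIX extended
regular expressions, so their effect can be tried out in plain C.
The following is a minimal sketch and not part of lex: it assumes a
POSIX system that provides regcomp(3), and full_match() is a
hypothetical helper. Because r{d,e} binds more loosely than
concatenation in lex, the repeated part is written with explicit
parentheses so that both notations agree.

```c
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if `text` matches the extended
 * regular expression `pattern` in full, 0 if it does not, and -1
 * if the pattern is too long or cannot be compiled.
 */
int full_match(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    /* Anchor the pattern so that a partial match does not count. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, full_match("(ab){2,3}", "abab") returns 1, while
full_match("(ab){2,3}", "ab") returns 0.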
The caret (^) at the beginning of an expression permits a
successful match only immediately after a newline. A dollar
sign ($) at the end of an expression requires a trailing
newline.
Printed 12/4/86 LEX-1
The slash (/) in an expression indicates trailing context;
only the part of the expression up to the slash is returned
in yytext, but the remainder of the expression must follow
in the input stream.
An operator character may be used as an ordinary symbol if
it is within double quotes (" ") or preceded by a
backslash (\). Thus, "+" and \+ each match a literal plus
sign, whereas [a-zA-Z]+ matches a string of letters.
Three subroutines, defined as macros, are expected:
     input()     reads a character
     unput(c)    puts back a character that was read
     output(c)   places an output character
These subroutines are defined in terms of the standard
streams, but you can override them. The generated function
is named yylex(), and the library contains a main() that
calls it.
The action REJECT on the right side of a rule causes the
current match to be rejected and the next suitable match to
be executed. The function yymore() accumulates additional
characters into the same yytext. The function yyless(p)
pushes back the portion of the matched string beginning at
p, which should be between yytext and yytext+yyleng.
The macros input and output read from and write to the files
yyin and yyout, which default to stdin and stdout,
respectively.
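As a sketch of overriding the input macros, the fragment below
redirects input() and unput() to an in-memory string. The names
set_source, string_input, and string_unput are hypothetical. The
fragment assumes, as in traditional lex, that input() signals end of
input by returning 0 and that the generated file defines its default
macros before the user's C text is copied in.

```c
#include <stddef.h>

/* Hypothetical in-memory source and a small pushback stack. */
static const char *src_text = "";
static size_t src_pos = 0;
static char pushback[16];
static size_t pushback_len = 0;

/* Select the string to scan and discard any pushed-back characters. */
void set_source(const char *text)
{
    src_text = text;
    src_pos = 0;
    pushback_len = 0;
}

/* Return the next character, or 0 at the end of the string. */
int string_input(void)
{
    if (pushback_len > 0)
        return (unsigned char)pushback[--pushback_len];
    if (src_text[src_pos] == '\0')
        return 0;
    return (unsigned char)src_text[src_pos++];
}

/* Push a character back; it is returned by the next string_input(). */
void string_unput(int c)
{
    if (pushback_len < sizeof pushback)
        pushback[pushback_len++] = (char)c;
}

/* Replace the default macros with the string-based versions. */
#undef input
#undef unput
#define input()  string_input()
#define unput(c) string_unput(c)
```

In a lex source file, these lines would be placed as C text ahead of
the first %%, with set_source() called before yylex().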
Any line beginning with a blank is assumed to contain only C
text and is copied. If it precedes %%, it is copied into
the external definition area of the lex.yy.c file. All
rules should follow a double percent sign (%%), as in
yacc(1). A line that precedes %% and begins with a nonblank
character defines the string on the left to be the remainder
of the line; the definition can be called out later by
surrounding its name with braces ({ }). Note that braces do
not imply parentheses; only string substitution is done.
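Because only string substitution is done, a definition that contains
alternation can change the meaning of a rule. The sketch below
imitates the substitution in plain C, using POSIX regcomp(3) as a
stand-in for the lex matcher (an assumption; both helper names are
hypothetical). With a definition whose text is a|b, a rule that
calls it out and then appends c expands to a|bc, which matches a or
bc but not ac.

```c
#include <regex.h>
#include <stdio.h>

/* Returns 1 if `text` matches the extended regular expression `ere`
 * in full, 0 if it does not, and -1 on error. */
int matches_whole(const char *ere, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    n = snprintf(anchored, sizeof anchored, "^(%s)$", ere);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/*
 * Builds a rule from a definition followed by `suffix`, either by
 * plain substitution (parenthesize == 0, as the text describes) or
 * with the definition wrapped in parentheses, and matches `text`.
 */
int expand_and_match(const char *definition, const char *suffix,
                     int parenthesize, const char *text)
{
    char rule[128];
    int n;

    if (parenthesize)
        n = snprintf(rule, sizeof rule, "(%s)%s", definition, suffix);
    else
        n = snprintf(rule, sizeof rule, "%s%s", definition, suffix);
    if (n < 0 || (size_t)n >= sizeof rule)
        return -1;
    return matches_whole(rule, text);
}
```

Writing the parentheses in the definition itself, as in (a|b), avoids
the surprise.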
The external names generated by lex all begin with the
prefix yy or YY.
Certain table sizes for the resulting finite state machine
can be set in the definitions section:
     %p n    number of positions is n (default 2000)
     %n n    number of states is n (default 500)
     %t n    number of parse tree nodes is n (default 1000)
     %a n    number of transitions is n (default 3000)
Using one or more of the above automatically implies the -v
option, unless the -n option is used.
OPTIONS
     -r   Specify RATFOR actions.
     -c   Specify C actions (default).
     -t   Write the generated program to standard output
          instead of to the file lex.yy.c (the default).
     -v   Provide a one-line summary of statistics of the
          generated analyzer.
     -n   Suppress the one-line summary produced by the -v
          option (default).
EXAMPLE
D       [0-9]
%%
if      printf("IF statement\n");
[a-z]+  printf("tag, value %s\n", yytext);
0{D}+   printf("octal number %s\n", yytext);
{D}+    printf("decimal number %s\n", yytext);
"++"    printf("unary op\n");
"+"     printf("binary op\n");
"/*"    {
        /* Skip a comment: scan to the next '*', then look for '/'. */
        loop:
            while (input() != '*')
                ;
            switch (input()) {
            case '/':
                break;
            case '*':
                unput('*');     /* fall through and rescan this '*' */
            default:
                goto loop;
            }
        }
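The comment-skipping action in the example does not check for the end
of input, so an unterminated comment would keep it looping if input()
returns 0 at end of file, as it does in traditional lex. The
following is a sketch of the same scan with that check added, written
as standalone C so it can be tried without lex. The names next_char
and skip_comment are hypothetical, and next_char() stands in for
input().

```c
#include <stddef.h>

/* Hypothetical stand-in for input(): reads from a string and
 * returns 0 once the end of the string is reached. */
static const char *cursor = "";

static int next_char(void)
{
    return *cursor ? (unsigned char)*cursor++ : 0;
}

/*
 * Scans the text that follows an opening comment delimiter.
 * Returns 1 and sets *rest to the text after the closing
 * delimiter, or returns 0 if the input ends first.
 */
int skip_comment(const char *body, const char **rest)
{
    int c;

    cursor = body;
    for (;;) {
        /* Scan forward to the next '*', giving up at end of input. */
        while ((c = next_char()) != '*')
            if (c == 0)
                return 0;
        /* Collapse a run of '*' characters. */
        while ((c = next_char()) == '*')
            ;
        if (c == '/') {
            *rest = cursor;
            return 1;
        }
        if (c == 0)
            return 0;
    }
}
```

In a lex action, the same structure applies with input() in place of
next_char().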
CAUTIONS
The -r option is not yet fully operational.
RELATED INFORMATION
yacc(1), malloc(3C).